Compare commits


5678 Commits

Author SHA1 Message Date
56b43f4fec Perform appropriate CUDA stream synchronization in distributed autograd. (#53929) (#54358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53929

The local autograd engine performs appropriate stream synchronization
between autograd nodes in the graph to ensure a consumer's stream is
synchronized with the producer's stream before executing the consumer.

However, in the case of distributed autograd, the SendRpcBackward function receives
gradients over the wire, and TensorPipe uses its own pool of streams for this
purpose. As a result, the tensors are received on TensorPipe's stream pool, but
SendRpcBackward runs on a different stream during the backward pass, and there
is no logic to synchronize these streams.

To fix this, I've enhanced DistEngine to synchronize these streams
appropriately when it receives grads over the wire.
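
For illustration, a minimal sketch of the producer/consumer stream synchronization described above (hypothetical tensors and streams, not the actual DistEngine code; requires a CUDA device):

```python
import torch

# Producer stream: e.g. the stream on which gradients arrived.
producer = torch.cuda.Stream()
# Consumer stream: the stream the autograd node runs on.
consumer = torch.cuda.Stream()

with torch.cuda.stream(producer):
    grad = torch.randn(1024, device="cuda")  # tensor "received" on the producer stream

# Record an event on the producer stream and make the consumer wait on it,
# so the consumer never reads `grad` before the producer finishes writing it.
event = torch.cuda.Event()
event.record(producer)
consumer.wait_event(event)

with torch.cuda.stream(consumer):
    out = grad * 2  # safe: consumer is ordered after the producer's work
```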
ghstack-source-id: 124055277

(Note: this ignores all push blocking failures!)

Test Plan:
1) Added unit test which reproduced the issue.
2) waitforbuildbot.

Reviewed By: walterddr, wanchaol

Differential Revision: D27025307

fbshipit-source-id: 2944854e688e001cb3989d2741727b30d9278414

Co-authored-by: Pritam Damania <pritam.damania@fb.com>
2021-03-23 19:28:21 -07:00
6c394614f0 [CI] Install compatible cmath for Win builds (#54556)
* [CI]Install older cmath during Windows build (#54431)

Summary:
Based on peterjc123's analysis, `cmath` after 26bbe2ad50 (diff-3fa97ceb95d524432661f01d4b34509c6d261a2f7f45ddcf26f79f55b3eec88a) causes a lot of CUDA code to fail to compile with:
```
error: calling a __host__ function("__copysignf") from a __host__ __device__ function("c10::guts::detail::apply_impl< ::at::native::AUnaryFunctor< ::>  &,     ::std::tuple<float >  &, (unsigned long long)0ull > ") is not allowed
```
Workaround for https://github.com/pytorch/pytorch/issues/54382

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54431

Reviewed By: anjali411

Differential Revision: D27234299

Pulled By: malfet

fbshipit-source-id: b3f1fef941341222cc10cb27346fcf4a1d522a0c

* [CI] Install compatible cmath for Win binary builds (#54527)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54527

Reviewed By: walterddr

Differential Revision: D27269528

Pulled By: malfet

fbshipit-source-id: 4afdc706598f3a6ad296468dfb77a70433ae7d0f
2021-03-23 19:02:01 -07:00
7c3c293ea7 [1.8] Don't build TensorPipe CMA backend on old glibc versions (#54491)
Some users who are building from source on old glibc versions are hitting an issue where TensorPipe uses the process_vm_readv syscall, which is not wrapped by glibc. This PR checks that condition in CMake and disables that backend in such cases.

This should have no effect on PyTorch's official builds; it should just help people who are building from source.
2021-03-23 15:56:26 -07:00
9d43171746 [1.8.1] Replace thrust with cub in randperm (#54537)
Summary:
Benchmark of
```python
%timeit torch.randperm(100000, device='cuda'); torch.cuda.synchronize()
```
thrust:
```
5.76 ms ± 42.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
cub:
```
3.02 ms ± 32.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

The sync in the thrust sort is removed.

Warning:
Thrust supports 64-bit indexing but cub doesn't, so this is a functional regression. However, `torch.randperm(2**31, device='cuda')` fails with OOM on a 40GB A100, and `torch.randperm(2**32, device='cuda')` fails with OOM on an 80GB A100, so I think this functional regression has low impact and is acceptable.
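
For reference, a plain-Python version of the benchmark above (the `%timeit` form is IPython-only); this is a sketch assuming a CUDA device, not part of the PR:

```python
import torch

def time_randperm(n, iters=100):
    # Warm up so allocator and one-time costs don't skew the measurement.
    torch.randperm(n, device="cuda")
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        torch.randperm(n, device="cuda")
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

print(time_randperm(100000))
```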

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53841

Reviewed By: albanD

Differential Revision: D26993453

Pulled By: ngimel

fbshipit-source-id: 39dd128559d53dbb01cab1585e5462cb5f3cceca

Co-authored-by: Xiang Gao <qasdfgtyuiop@gmail.com>
2021-03-23 15:45:20 -07:00
f3c950e04e various doc building cleanups (#54141) 2021-03-23 11:23:02 -07:00
b6f49807db third_party: Update kineto to fix libtorch builds (#54205)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2021-03-17 13:26:11 -07:00
d84e05be49 [fix] Dimension out of range in pixel_shuffle / pixel_unshuffle (#54178)
Co-authored-by: Joel Benjamin Schlosser <jbschlosser@fb.com>
2021-03-17 12:40:59 -07:00
c6139b7915 Make ideep honor torch.set_num_thread changes (#53871) (#54025)
Summary:
When compiled with OpenMP support, `ideep`'s computational_cache caches the maximum number of OpenMP workers.
This number can be wrong after a `torch.set_num_threads` call, so clear the cache after the call.
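
A minimal sketch of the behavior being fixed (the Conv2d is just an illustrative stand-in for an ideep-backed op):

```python
import torch

torch.set_num_threads(8)
conv = torch.nn.Conv2d(3, 16, 3)
conv(torch.randn(8, 3, 32, 32))  # may populate ideep's computational_cache

torch.set_num_threads(2)
print(torch.get_num_threads())   # 2
# After this fix, the next ideep-backed call should honor the new
# thread count instead of the stale cached maximum.
conv(torch.randn(8, 3, 32, 32))
```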

Fixes https://github.com/pytorch/pytorch/issues/53565

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53871

Reviewed By: albanD

Differential Revision: D27003265

Pulled By: malfet

fbshipit-source-id: 1d84c23070eafb3d444e09590d64f97f99ae9d36
2021-03-16 11:46:19 -07:00
30baaef738 Use int8_t instead of char in [load|store]_scalar` (#52616) (#54022)
Summary:
`char` is not guaranteed to be signed on all platforms (it is unsigned on ARM).
Fixes https://github.com/pytorch/pytorch/issues/52146

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52616

Test Plan: Run ` python3 -c "import torch;a=torch.tensor([-1], dtype=torch.int8);print(a.tolist())"` on arm-linux system

Reviewed By: walterddr

Differential Revision: D26586678

Pulled By: malfet

fbshipit-source-id: 91972189b54f86add516ffb96d579acb0bc13311
2021-03-16 11:45:50 -07:00
264d0ecf83 [nn] nn.Embedding : padding_idx doc update (#53809) (#54026)
Summary:
Follow-up of https://github.com/pytorch/pytorch/pull/53447

Reference: https://github.com/pytorch/pytorch/pull/53447#discussion_r590521051

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53809

Reviewed By: bdhirsh

Differential Revision: D27049643

Pulled By: jbschlosser

fbshipit-source-id: 623a2a254783b86391dc2b0777b688506adb4c0e

Co-authored-by: kshitij12345 <kshitijkalambarkar@gmail.com>
2021-03-16 11:44:37 -07:00
51233ea4b0 Disabling dispatch to OneDNN for group convolutions when groups size = 24 * n (#54015)
* Disabling dispatch to OneDNN for group convolutions when groups size is 24 * n

* Add condition to non-zero grps

Co-authored-by: Vitaly Fedyunin <vitaly.fedyunin@gmail.com>
2021-03-16 07:34:18 -07:00
31a1a00ae8 Update Kineto revision for 1.8.1 (#54044)
Summary:
Updating Kineto to include bugfixes for 1.8.1

Test Plan: CI
2021-03-16 07:31:47 -07:00
bb98a99638 [ONNX] Update embedding export wrt padding_idx (#53931) (#54033)
Summary:
To be in-sync with https://github.com/pytorch/pytorch/issues/53447

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53931

Reviewed By: ngimel

Differential Revision: D27026616

Pulled By: malfet

fbshipit-source-id: 4c50b29fa296c90aeeeb1757bdaada92cbba33d4
2021-03-15 21:38:49 -07:00
295c7cf1de [ONNX] Update assign output shape for nested structure and dict output (#52893) (#53311) (#54019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53311

Fixes dict output & nested tuple.
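
A hedged sketch of the kind of model output this fixes (hypothetical module and file name):

```python
import torch

class Model(torch.nn.Module):
    def forward(self, x):
        # Dict output plus a nested tuple; output shapes for these
        # previously were not assigned correctly in the exported graph.
        return {"out": x + 1}, (x, (x * 2,))

x = torch.randn(2, 3)
torch.onnx.export(Model(), (x,), "model.onnx", opset_version=12)
```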

Test Plan: Imported from OSS

Reviewed By: pbelevich, malfet

Differential Revision: D26922426

Pulled By: SplitInfinity

fbshipit-source-id: c2c6b71c8d978b990181e0b025626dbf6ef2199e
2021-03-15 18:52:11 -07:00
3233861ec4 Fix test to use proper condition. (#52216) (#54028)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52216

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26427506

Pulled By: ailzhang

fbshipit-source-id: ba4f2f66794cb2843926e5566eb4d25582f7fb2b

Co-authored-by: Ailing Zhang <ailzhang@fb.com>
2021-03-15 16:52:29 -07:00
47f4b3f7d4 Cherrypick #53576 into release/1.8 (#53766) 2021-03-15 13:36:09 -07:00
e450f1498f [ONNX] Support torch.isinf, torch.any and torch.all export to ONNX (#53328) (#53529) (#54007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53529

Supported for ONNX export after opset 10.
This is not exportable to opsets < 10 due to
1. onnx::IsInf is introduced in opset 10
2. onnx::Equal does not accept float tensor prior to opset 11
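
A hedged example of an export this enables (hypothetical module and file name):

```python
import torch

class Model(torch.nn.Module):
    def forward(self, x):
        # torch.isinf maps to onnx::IsInf; any/all reduce to booleans.
        return torch.any(torch.isinf(x)), torch.all(x > 0)

x = torch.tensor([1.0, float("inf")])
torch.onnx.export(Model(), (x,), "isinf.onnx", opset_version=11)
```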

Test Plan: Imported from OSS

Reviewed By: pbelevich, malfet

Differential Revision: D26922418

Pulled By: SplitInfinity

fbshipit-source-id: 69bcba50520fa3d69db4bd4c2b9f88c00146fca7

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-03-15 13:05:59 -07:00
6fd01f9440 [ONNX] Update inputs/input_names formatting to avoid ValueError with scriptMethods (#53519) (#53548) (#54005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53548

Fixes the issue faced in #53506.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D26922415

Pulled By: malfet

fbshipit-source-id: b61842827bb14cef8c7a7089b2426fa53e642c90

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-03-15 12:24:20 -07:00
b33e434d55 [v1.8.1] Pick up upstream fixes from TensorPipe (#53804)
- Support transferring >2GB over CMA
- Avoid loading stub version of CUDA driver
- Don't use unsupported mmap option on older kernels
- Don't join non-existing thread if CMA is not viable

The last two manifested as uncaught exceptions (hence crashes) when initializing RPC. The first one caused same-machine RPC requests to fail.
2021-03-15 12:22:10 -07:00
a3e4bf60bb [fix] nn.Embedding: allow changing the padding vector (#53447) (#53986)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53368

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53447

Reviewed By: albanD

Differential Revision: D26946284

Pulled By: jbschlosser

fbshipit-source-id: 54e5eec7da86fa02b1b6e4a235d66976a80764fc

Co-authored-by: kshitij12345 <kshitijkalambarkar@gmail.com>
2021-03-15 12:21:05 -07:00
e991cdaf58 [CherryPick] Fixes for distribution validation checks (#53763)
* Add sample validation for LKJCholesky.log_prob

* Fix distributions which don't properly honor validate_args=False

A number of derived distributions use base distributions in their
implementation.

We add what we hope is a comprehensive test of whether all distributions
actually honor skipping validation of arguments in log_prob, and then
fix the bugs we found. These bugs are particularly cumbersome in
PyTorch 1.8 and master, where validate_args is turned on by default.
In addition, one might argue that validate_args does not behave as
expected when the default is not to validate but validation is turned
on at instantiation (see the sketch after this list).

Arguably, there is another set of bugs, or at least inconsistencies,
where validation of inputs does not prevent invalid indices in
sample validation (with validation enabled, an IndexError is raised
in the test). We would encourage the implementors to be more
ambitious when validation is turned on and amend sample validation
to throw a ValueError for consistency.

* additional fixes to distributions

* address failing tests
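
A hedged sketch of the validate_args behavior referred to above (Bernoulli chosen only as a simple example):

```python
import torch
from torch.distributions import Bernoulli

# With validation off, log_prob must not raise on out-of-support values;
# the bugs fixed here were derived distributions that validated anyway.
d = Bernoulli(probs=torch.tensor([0.5]), validate_args=False)
print(d.log_prob(torch.tensor([0.5])))  # 0.5 is not a valid sample, but allowed

d_strict = Bernoulli(probs=torch.tensor([0.5]), validate_args=True)
try:
    d_strict.log_prob(torch.tensor([0.5]))
except ValueError as e:
    print("validation caught:", e)
```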

Co-authored-by: neerajprad <neerajprad@devvm903.atn0.facebook.com>
Co-authored-by: Thomas Viehmann <tv.code@beamnet.de>
2021-03-15 10:51:50 -07:00
4596a8ec8a Remove MNIST for XLA (#53274) (#53987)
Summary:
Mitigates https://github.com/pytorch/pytorch/issues/53267

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53274

Reviewed By: zhangguanheng66, ailzhang

Differential Revision: D26819702

Pulled By: cpuhrsch

fbshipit-source-id: 5b9b30db6f8fc414aa9f3c841429bf99bc927763

Co-authored-by: cpuhrsch <cpuhrsch@devvm2783.frc0.facebook.com>
2021-03-15 07:53:39 -07:00
512f289884 Example LSTMCell (#51983) (#54003)
Summary:
Fixes #51801.
Updated the LSTMCell example.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51983

Reviewed By: agolynski

Differential Revision: D26467104

Pulled By: zou3519

fbshipit-source-id: 31c8bf89b21cd2f748b2cc28a74169082d81503c

Co-authored-by: CarlosJose126 <43588143+CarlosJose126@users.noreply.github.com>
2021-03-15 07:50:49 -07:00
c439f85b16 Fix set_device_map docs (#53508) (#53822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53508

closes #53501

Differential Revision: D26885263

Test Plan: Imported from OSS

Reviewed By: H-Huang

Pulled By: mrshenli

fbshipit-source-id: dd0493e6f179d93b518af8f082399cacb1c7cba6
2021-03-12 17:31:29 -08:00
30712fca7e ci: Remove special versioning privileges for cu102 (#53133) (#53734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53133

In light of some issues where users were having trouble installing CUDA
specific versions of pytorch we should no longer have special privileges
for CUDA 10.2.

Recently I added scripts/release/promote/prep_binary_for_pypi.sh (https://github.com/pytorch/pytorch/pull/53056) to make
it so that we could theoretically promote any wheel we publish to
download.pytorch.org to pypi

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D26759823

Pulled By: seemethere

fbshipit-source-id: 2d2b29e7fef0f48c23f3c853bdca6144b7c61f22
(cherry picked from commit b8546bde09c7c00581fe4ceb061e5942c7b78b20)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2021-03-10 11:53:14 -08:00
debf62d95c [1.8.1] Explicitly export submodules and variables from torch module (#53675)
Summary:
For https://github.com/pytorch/pytorch/issues/47027.

Some progress has been made in https://github.com/pytorch/pytorch/issues/50665, but in my testing, trying to unwrap the circular dependencies is turning into a never-ending quest.

This PR explicitly exports things in the top-level torch module without any semantic effect, in accordance with this py.typed library guidance: https://github.com/microsoft/pyright/blob/master/docs/typed-libraries.md#library-interface

It may be possible to do some of the other fixes just using `__all__` where needed, but `__all__` has a semantic effect I would like to further review. This PR at least fixes simple completions like `torch.nn` in Pylance/pyright.
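
A sketch of the re-export idiom the pyright guidance describes (an illustrative `__init__.py` fragment, not the actual PR diff):

```python
# In a py.typed package's __init__.py, "import X as X" marks X as part of
# the public interface without changing runtime behavior:
from . import nn as nn
from . import optim as optim
```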

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52339

Reviewed By: smessmer

Differential Revision: D26694909

Pulled By: malfet

fbshipit-source-id: 99f2c6d0bf972afd4036df988e3acae857dde3e1

Co-authored-by: Jake Bailey <5341706+jakebailey@users.noreply.github.com>
2021-03-10 10:10:42 -08:00
e30dc8d21b enable autocast for xla (#48570) (#53671)
Summary:
For enabling amp in torch/xla, see [this](https://github.com/pytorch/xla/pull/2654).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48570

Reviewed By: ezyang

Differential Revision: D26120627

Pulled By: ailzhang

fbshipit-source-id: 32627b17c04bfdad128624676ea9bf6f117bc97d

Co-authored-by: Chengji Yao <yaochengji@hotmail.com>
2021-03-10 10:06:02 -08:00
4e590c9ced Docs cherrypicks 1.8.1 (#53674)
* [FX] Cherrypick docs fixes

* Update code links to point to 1.8
2021-03-09 17:23:28 -08:00
6e9f2c8df0 [1.8 release only] Remove fx graph mode quantization doc from release (#53055) 2021-03-02 12:26:26 -08:00
37c1f4a7fe Fix hipify_python (#52756)
Co-authored-by: rraminen <rraminen@amd.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
2021-02-26 14:13:54 -08:00
49b74a52a4 Catch Flake8 error codes with multiple letters (#52750) (#52801)
Summary:
The Flake8 job has been passing on `master` despite giving warnings for [over a month](https://github.com/pytorch/pytorch/runs/1716124347). This is because it has been using a regex that doesn't recognize error codes starting with multiple letters, such as those used by [flake8-executable](https://pypi.org/project/flake8-executable/). This PR corrects the regex, and also adds another step at the end of the job which asserts that Flake8 actually gave no error output, in case similar regex issues appear in the future.
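
To illustrate the class of bug (the regexes below are hypothetical stand-ins, not the exact patterns from the job):

```python
import re

# The old pattern only matched single-letter codes like "W605":
old = re.compile(r"^(?P<file>.+):(?P<line>\d+):(?P<col>\d+): (?P<code>[A-Z]\d+) ")
# Multi-letter plugins emit codes like "EXE002", so the letter class must repeat:
new = re.compile(r"^(?P<file>.+):(?P<line>\d+):(?P<col>\d+): (?P<code>[A-Z]+\d+) ")

line = "test/distributed/test_c10d.py:1:1: EXE002 ..."
print(bool(old.match(line)), bool(new.match(line)))  # False True
```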

Tagging the following people to ask what to do to fix these `EXE002` warnings:

- https://github.com/pytorch/pytorch/issues/50629 authored by jaglinux, approved by rohan-varma
  - `test/distributed/test_c10d.py`
- https://github.com/pytorch/pytorch/issues/51262 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/__init__.py`
  - `torch/utils/data/datapipes/iter/loadfilesfromdisk.py`
  - `torch/utils/data/datapipes/iter/listdirfiles.py`
  - `torch/utils/data/datapipes/iter/__init__.py`
  - `torch/utils/data/datapipes/utils/__init__.py`
  - `torch/utils/data/datapipes/utils/common.py`
- https://github.com/pytorch/pytorch/issues/51398 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromtar.py`
- https://github.com/pytorch/pytorch/issues/51599 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromzip.py`
- https://github.com/pytorch/pytorch/issues/51704 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/routeddecoder.py`
  - `torch/utils/data/datapipes/utils/decoder.py`
- https://github.com/pytorch/pytorch/issues/51709 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/groupbykey.py`

Specifically, the question is: for each of those files, should we remove the execute permissions, or should we add a shebang? And if the latter, which shebang?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52750

Test Plan:
The **Lint / flake8-py3** job in GitHub Actions:

- [this run](https://github.com/pytorch/pytorch/runs/1972039886) failed, showing that the new regex catches these warnings properly
- [this run](https://github.com/pytorch/pytorch/runs/1972393293) succeeded and gave no output in the "Run flake8" step, showing that this PR fixed all Flake8 warnings
- [this run](https://github.com/pytorch/pytorch/pull/52755/checks?check_run_id=1972414849) (in https://github.com/pytorch/pytorch/issues/52755) failed, showing that the new last step of the job successfully catches Flake8 warnings even without the regex fix

Reviewed By: walterddr, janeyx99

Differential Revision: D26637307

Pulled By: samestep

fbshipit-source-id: 572af6a3bbe57f5e9bd47f19f37c39db90f7b804

Co-authored-by: Sam Estep <sestep@fb.com>
2021-02-26 07:49:51 -08:00
11c78e9cb3 Expose documentation for LKJCholesky distribution (#52904)
This is already added to the master branch in https://github.com/pytorch/pytorch/pull/52763.
2021-02-26 07:47:29 -08:00
d6943ea58d apply diff 52351 (#52649) 2021-02-23 07:51:38 -08:00
02b61b49ea [1.8] Update XNNPACK (#52647)
Cherry-pick 55d53a4e70 into release/1.8 branch
2021-02-23 05:31:57 -08:00
d553478c98 [v1.8] Make TensorPipe work around bug in old versions of libibverbs (#52615)
The bug affects PyTorch users who meet two conditions:
- they have an old version of libibverbs installed (the userspace library), namely older than v25, which dates from Jul 29, 2019;
- but they do _not_ have an InfiniBand kernel module loaded.

In those cases they will experience a crash (uncaught exception) happening when initializing RPC, mentioning an "unknown error -38".

There is a workaround, which is for those users to activate a killswitch (which is private and undocumented) to disable the `ibv` backend of TensorPipe.
2021-02-22 16:55:12 -08:00
63333e2a25 [1.8] Update api doc for enabling TcpStore on Windows (#52601)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51847

Reviewed By: albanD

Differential Revision: D26405678

Pulled By: malfet

fbshipit-source-id: 073b675225b48d1732771583f8f2473e0fdcf35c

Co-authored-by: Joe Zhu <jozh@microsoft.com>
2021-02-22 10:14:09 -08:00
8e7eebfc9a [1.8] Fix onnx mixed precision export for layernorm & fuseLogSoftmaxNllLoss (#52510)
Co-authored-by: Shubham Bhokare <32080845+shubhambhokare1@users.noreply.github.com>
2021-02-19 14:40:53 -08:00
f8afb8bdd0 [v1.8.0] Various CUDA 11.1 with BUILD_SPLIT_CUDA_FIXES (#52518)
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Co-authored-by: peterjc123 <peterghost86@gmail.com>
Co-authored-by: Jane Xu <janeyx@fb.com>
2021-02-19 12:41:21 -08:00
0851cc42b0 Update freezing API - changes from 52337 (#52392)
Co-authored-by: eellison <eellison@fb.com>
2021-02-18 15:36:51 -08:00
804f7b6018 Add arm64 binary build (#52443) (#52469)
Summary:
This is getting tested by https://github.com/pytorch/pytorch/issues/52441.

Adds new config for macos arm64 to our binary builds.
Now stores artifacts for mac builds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52443

Reviewed By: walterddr

Differential Revision: D26517330

Pulled By: janeyx99

fbshipit-source-id: 02774937a827bdd4c08486dc9f8fe63446917f1e
2021-02-18 15:17:27 -08:00
32758d30b3 onnx export of per channel fake quantize functions (#42835) (#52430)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39502

This PR adds support for exporting **fake_quantize_per_channel_affine** to a pair of QuantizeLinear and DequantizeLinear. Per tensor support was added by PR https://github.com/pytorch/pytorch/pull/39738.

`axis` attribute of QuantizeLinear and DequantizeLinear, which is required for per channel support, is added in opset13 added by https://github.com/onnx/onnx/pull/2772.

[update 1/20/2021]: opset 13 is now supported on master, and the added function is properly tested. The code was also rebased onto the new master.

The function is also tested offline with the following code
```python
import torch
from torch import quantization

from torchvision import models
qat_resnet18 = models.resnet18(pretrained=True).eval().cuda()

qat_resnet18.qconfig = quantization.QConfig(
    activation=quantization.default_fake_quant, weight=quantization.default_per_channel_weight_fake_quant)
quantization.prepare_qat(qat_resnet18, inplace=True)
qat_resnet18.apply(quantization.enable_observer)
qat_resnet18.apply(quantization.enable_fake_quant)

dummy_input = torch.randn(16, 3, 224, 224).cuda()
_ = qat_resnet18(dummy_input)
for module in qat_resnet18.modules():
    if isinstance(module, quantization.FakeQuantize):
        module.calculate_qparams()
qat_resnet18.apply(quantization.disable_observer)

qat_resnet18.cuda()

input_names = [ "actual_input_1" ]
output_names = [ "output1" ]

torch.onnx.export(qat_resnet18, dummy_input, "quant_model.onnx", verbose=True, opset_version=13)
```
It can generate the desired graph.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42835

Reviewed By: houseroad

Differential Revision: D26293823

Pulled By: SplitInfinity

fbshipit-source-id: 300498a2e24b7731b12fa2fbdea4e73dde80e7ea

Co-authored-by: Hao Wu <skyw@users.noreply.github.com>
2021-02-18 12:50:40 -08:00
bcb64a8084 Fix upsample bicubic2d batching handling on CPU. (#52389) (#52445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52389

Fixes: https://github.com/pytorch/pytorch/issues/49159

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26496319

Pulled By: gchanan

fbshipit-source-id: d385cd683ef09e0596a9875ce84d03e6e77acc93
2021-02-18 12:46:39 -08:00
f07991d396 update symeig backward note about similar eigenvalues (#52311) (#52446)
Summary:
First part of https://github.com/pytorch/pytorch/issues/49886 to at least properly warn users of the current state

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52311

Reviewed By: soulitzer

Differential Revision: D26495644

Pulled By: albanD

fbshipit-source-id: 72abdfe41cdbcc1ac739a536eb85d1aa4ba90897
2021-02-18 12:45:47 -08:00
c458cd4852 [v1.8.0] .circleci: Downgrade CUDA 11.2 -> 11.1 for binaries (#52151) (#52406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52151

CUDA 11.2 might not be as performant as we thought, so let's downgrade to
something we think is more performant.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D26408314

Pulled By: seemethere

fbshipit-source-id: e2446aa0115e2c2a79718b1fdfd9fccf2072822d
(cherry picked from commit a11650b069729997b002032d70e9793477147851)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2021-02-18 10:59:03 -08:00
f7c4afc0f4 [cmake] Add explicit cublas->cudart dependency (#52243) (#52404)
Summary:
Necessary to ensure correct link order, especially if libraries are
linked statically. Otherwise, one might run into:
```
/usr/bin/ld: /usr/local/cuda/lib64/libcublasLt_static.a(libcublasLt_static.a.o): undefined reference to symbol 'cudaStreamWaitEvent@libcudart.so.11.0'
/usr/local/cuda/lib64/libcudart.so: error adding symbols: DSO missing from command line
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52243

Reviewed By: seemethere, ngimel

Differential Revision: D26437159

Pulled By: malfet

fbshipit-source-id: 33b8bb5040bda10537833f3ad737f535488452ea
2021-02-17 16:07:41 -08:00
20554c00b6 [1.8] Remove torch.vmap (#52397)
torch.vmap is a prototype feature and should not be in the stable
binary. This PR:

- Removes the `torch.vmap` API
- Removes the documentation entry for torch.vmap
- Changes the vmap tests to use an internal API instead of torch.vmap.

Test Plan:
- Tested locally (test_torch, test_autograd, test_type_hints, test_vmap), but also wait
for CI.
2021-02-17 16:05:34 -08:00
3464d64f08 [1.8] Fix libnvrtc discoverability in package patched by auditwheel (#52365) 2021-02-17 16:05:05 -08:00
c6972eb3ac Skip OneDNN Convolution in case of groups = 24 #50042 (#52313)
Co-authored-by: Vitaly Fedyunin <vitaly.fedyunin@gmail.com>
2021-02-17 16:04:26 -08:00
25562d3d41 Use side-stream in CPU to GPU copies in DDP (#50180) (#52270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50180

Resolves the regression in
https://github.com/pytorch/pytorch/issues/49819 by adding copy over background
stream similar to scatter. For internal use cases, this is gated with an env var that maintains the previous behavior when it is off.
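
For illustration, a minimal sketch of copying over a side stream (hypothetical tensors; not the DDP internals, and the env-var gating is omitted):

```python
import torch

copy_stream = torch.cuda.Stream()
cpu_tensor = torch.randn(1024).pin_memory()

# Issue the host-to-device copy on a side stream so it can overlap
# with compute on the current stream.
with torch.cuda.stream(copy_stream):
    gpu_tensor = cpu_tensor.to("cuda", non_blocking=True)

# Order the current stream after the copy before consuming the tensor,
# and tell the caching allocator about the cross-stream use.
torch.cuda.current_stream().wait_stream(copy_stream)
gpu_tensor.record_stream(torch.cuda.current_stream())
out = gpu_tensor * 2
```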

Test Plan: CI

Reviewed By: mrshenli, ngimel

Differential Revision: D25818170

fbshipit-source-id: e50c76c035504b2a44e2be084701cee45c90df75
2021-02-17 09:49:30 -08:00
cd63c37bc6 ports fix (#52242)
Co-authored-by: Mike Ruberry <mruberry@devfair044.maas>
2021-02-13 17:59:51 -08:00
c79decdbba [v1.8 patch] [Resubmission] Add a documentation page for DDP communication hooks (#52215)
Co-authored-by: wayi <wayi@devgpu238.prn2.facebook.com>
2021-02-12 16:37:23 -08:00
c307a3f336 [1.8] Do not print warning if CUDA driver not found (#51806) (#52050)
Summary:
This frequently happens when PyTorch compiled with CUDA support is installed on a machine that does not have NVIDIA GPUs.

Fixes https://github.com/pytorch/pytorch/issues/47038

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51806

Reviewed By: ezyang

Differential Revision: D26285827

Pulled By: malfet

fbshipit-source-id: 9fd5e690d0135a2b219c1afa803fb69de9729f5e
2021-02-12 12:20:46 -08:00
f071020756 Workaround arm64 gcc error in std::copysign (#51900) (#52049)
Summary:
Move definition of copysign template and specialization for
bfloat16/half types before first use of copysign in that file

Add comment explaining why this is necessary

Fixes https://github.com/pytorch/pytorch/issues/51889

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51900

Reviewed By: walterddr

Differential Revision: D26321741

Pulled By: malfet

fbshipit-source-id: 888858b11d9708fa140fe9c0570cc5a24599205b
2021-02-12 08:00:46 -08:00
4f436f8570 fake_quant cachemask: remove Python bindings (#51878) (#52160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51878

`fake_quantize_per_tensor_affine_cachemask` and
`fake_quantize_per_channel_affine_cachemask` are implementation
details of `fake_quantize_per_tensor_affine` and
`fake_quantize_per_channel_affine`, removing the
Python bindings for them since there is no need to
expose them.

Test Plan:
```
python test/test_quantization.py TestFakeQuantize
```

Imported from OSS

Reviewed By: albanD, bugra

Differential Revision: D26314173

fbshipit-source-id: 733c93a3951453e739b6ed46b72fbad2244f6e97
(cherry picked from commit 33afb5f19f4e427f099653139ae45b661b8bc596)
2021-02-12 07:37:00 -08:00
ae11589710 [FX][1.8] Cherrypick three FX fixes to 1.8 (#52021)
* Fix leaf modules in Transformer

[ghstack-poisoned]

* Fix tuple type annotations

[ghstack-poisoned]

* Generalize dict key check in `create_arg` (#51927)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51927

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26329655

Pulled By: jamesr66a

fbshipit-source-id: a15e7d9564551521af12a8fde1c7524856f0cbc2
2021-02-12 07:35:34 -08:00
9e5bcc1020 1.8 cherrypick: Add metacompile of Ternary if (#51789) (#51913)
Summary:
Fixes issue: https://github.com/pytorch/pytorch/issues/49728
========
Ternary if operation fails in Torchscript when the condition variable is annotated as Final.

Tests:
=======
pytest -k test_ternary_static_if test/test_jit.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51789

Reviewed By: gmagogsfm

Differential Revision: D26278969

Pulled By: nikithamalgifb

fbshipit-source-id: 27d1383290211503188428fb2e8b7749f59ba16e

Co-authored-by: nikithamalgi <nikithamalgi@devvm146.prn0.facebook.com>
2021-02-09 21:34:26 -08:00
fa8578241d .jenkins: Release branch specific updates (#51982) 2021-02-09 21:33:29 -08:00
1368809532 [v1.8.0] [wip] doc_fix (#52006)
Summary:
tries to fix doc_test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51825

Reviewed By: bertmaher

Differential Revision: D26295583

Pulled By: ngimel

fbshipit-source-id: 13f6e7f1675d810adfd4abd2d579e2812fe54c80
(cherry picked from commit 6c0bf28da651eb8ff1d2d0dcfe807ea757fb61e5)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Natalia Gimelshein <ngimel@fb.com>
2021-02-09 21:32:32 -08:00
4073248fc2 [FX] Hide experimental folder (#51987) 2021-02-09 15:44:33 -08:00
75153cb730 Disable unaliged-access test from TestVectorizedMemoryAccess.CopyKernel (#51864) (#51890)
Summary:
Test begins to fail after the driver update.

See https://github.com/pytorch/pytorch/issues/51863

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51864

Reviewed By: bertmaher

Differential Revision: D26304018

Pulled By: malfet

fbshipit-source-id: bb7ade2f28d8cf8f847159d4ce92391f0794c258

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2021-02-09 10:17:18 -08:00
5bb69b080c concantenate LICENSE files when building a wheel (#51634) (#51882)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50695

I checked locally that the concatenated license file appears at `torch-<version>.dist-info/LICENSE` in the wheel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51634

Reviewed By: zhangguanheng66

Differential Revision: D26225550

Pulled By: walterddr

fbshipit-source-id: 830c59fb7aea0eb50b99e295edddad9edab6ba3a

Co-authored-by: mattip <matti.picus@gmail.com>
2021-02-09 10:16:12 -08:00
9112f4eded [FX][docs] Indent forward (#51802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51802

lol

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D26284311

Pulled By: jamesr66a

fbshipit-source-id: 0d303d8c99131abb8d97e0acd0ac2d810e1e950c
2021-02-05 18:01:27 -08:00
8c48af822e pytorch docs: add fake_quantize functions documentation (#51748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51748

Adding docs for `fake_quantize_per_tensor_affine` and `fake_quantize_per_channel_affine`
functions.

Note: not documenting `fake_quantize_per_tensor_affine_cachemask` and
`fake_quantize_per_channel_affine_cachemask` since they are implementation details
of `fake_quantize_per_tensor_affine` and `fake_quantize_per_channel_affine`,
and do not need to be exposed to the user at the moment.
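
A hedged usage example of the documented functions (values are arbitrary):

```python
import torch

x = torch.randn(4)
# Simulates quantize->dequantize with the given scale/zero_point,
# keeping the tensor in floating point.
y = torch.fake_quantize_per_tensor_affine(x, scale=0.1, zero_point=0,
                                          quant_min=-128, quant_max=127)
print(y)
```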

Test Plan: Build the docs locally on Mac OS, it looks good

Reviewed By: supriyar

Differential Revision: D26270514

Pulled By: vkuzo

fbshipit-source-id: 8e3c9815a12a3427572cb4d34a779e9f5e4facdd
2021-02-05 17:53:02 -08:00
ececbcfff2 [Conda][Kineto] Define weak acc_get_device_type if kineto is used (#51818)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51818

Reviewed By: ilia-cher

Differential Revision: D26291188

Pulled By: malfet

fbshipit-source-id: 68797e02fe4dd54d8030e67aaf28046a4fae0770
2021-02-05 17:46:30 -08:00
fb07aca7b0 Adding support for CUDA 11.2 in our nightly build matrix (#51611)
Summary:
Replacing 11.0 with 11.2 in our nightlies.

(am slightly uncertain why the manywheel linux tests worked before we added the GPU driver for 11.2)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51611

Reviewed By: malfet, seemethere, zhangguanheng66

Differential Revision: D26282829

Pulled By: janeyx99

fbshipit-source-id: b15380e5c44a957e6a85e4f5fb9691ab9c6103a5
2021-02-05 15:40:31 -08:00
5c3a054b12 Add FLOPS support to the new profiler API. (#51734)
Summary:
The new profiler API was added in PR#48280. This PR is to add FLOPS
support to the new profiler API.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51734

Test Plan:
```python
python test/test_profiler.py -k test_flops
```

Reviewed By: xuzhao9

Differential Revision: D26261851

Pulled By: ilia-cher

fbshipit-source-id: dbeba4c197e6f51a9a8e640e8bb60ec38df87f73
2021-02-05 15:03:35 -08:00
430329e875 Revert D26009829: Optimize relu on cpu using clamp_min
Test Plan: revert-hammer

Differential Revision:
D26009829 (2054cd56c5)

Original commit changeset: 7bb1583ffb3e

fbshipit-source-id: 3e945b438fb8d83f721e400ae69be8848cab9720
2021-02-05 14:48:06 -08:00
50c9c08203 Enable GPU/RE tags for caffe2/caffe2/python/TARGETS
Summary: Moving caffe2_core_gpu_python contbuild to use GPU/RE

Test Plan: CI

Reviewed By: malfet

Differential Revision: D26261826

fbshipit-source-id: a6f8c7bd8368c1cb69499ea0ea7d5add0956a7ad
2021-02-05 13:52:48 -08:00
2054cd56c5 Optimize relu on cpu using clamp_min (#50924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50924

`clamp_min` seems slightly faster than `threshold` (on avx2 cpus)
because it compiles down to vmaxps, rather than vcmpps+vblendv.

I see the biggest perf difference (about 20% faster) with float
tensors at 32k-64k elements.  Bigger tensors are more memory bound
although it looks like it might still be a tiny win (2%).
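
A sketch of how one might reproduce the comparison (sizes and harness are illustrative):

```python
import torch
from torch.utils.benchmark import Timer

x = torch.randn(65536)  # ~64k floats, where the gap was largest
for stmt in ("torch.relu(x)", "torch.clamp_min(x, 0)"):
    print(Timer(stmt=stmt, globals={"torch": torch, "x": x}).timeit(1000))
```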

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D26009829

Pulled By: bertmaher

fbshipit-source-id: 7bb1583ffb3ee242e347f59be82e0712c7631f7e
2021-02-05 13:03:40 -08:00
3cfbf6d3ac [quick-checks] Allow gradlew to be executable (#51796)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51796

Reviewed By: IvanKobzarev

Differential Revision: D26280152

Pulled By: malfet

fbshipit-source-id: ab19ddc8589471002fb330d8d97c81f5a6deeb6f
2021-02-05 12:54:53 -08:00
029f857b22 [Metal] Add hardswish and hardsigmoid to metal, fix broadcasting for binary elementwise ops
Summary:
Add hardswish_ and hardsigmoid_ activations to enable MobileNetV3.

Also fix binary elementwise ops to work when the first input is being broadcasted rather than the second.

Test Plan:
Test on device:
```
arc focus2 pp-ios
```
Test on mac
```
buck test pp-macos
```

Reviewed By: xta0

Differential Revision: D26241385

fbshipit-source-id: 6ce7269d60d63cf909b75a7f4e18fb17ac2f5d31
2021-02-05 12:46:37 -08:00
a930162c69 Revert D26276903: [pytorch][PR] Add LazyBatchNormXd
Test Plan: revert-hammer

Differential Revision:
D26276903 (aa1fd6b45a)

Original commit changeset: 0ac706974178

fbshipit-source-id: bfe01b01cd460f1e2845ea5ef1fc1514e6b6ba54
2021-02-05 12:37:29 -08:00
33973d45a9 Add acc_get_device_type weak symbol to kineto_profler (#51787)
Summary:
Move `kinetoAvailable` to profiler_kineto.h and make it a constexpr
Update kineto submodule
Fixes https://github.com/pytorch/pytorch/issues/51026

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51787

Reviewed By: seemethere

Differential Revision: D26278170

Pulled By: malfet

fbshipit-source-id: 0cdd903cd8e3106c830ccce03b903b787ae33190
2021-02-05 11:52:45 -08:00
59cb693c90 [quant] add docs for embedding/embedding_bag (#51770)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51770

Test Plan:
tested locally on mac

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D26279112

fbshipit-source-id: 8675d3ef712ecbe545bad0d3502181b3ccdd7f89
2021-02-05 11:43:15 -08:00
9c2dd5775a Fixed slight bug in FX docs (#51779)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51779

Reviewed By: ngimel

Differential Revision: D26279623

Pulled By: Chillee

fbshipit-source-id: 0cd2a487ce6b80ce0d3f81e2b2334ade20d816bb
2021-02-05 11:27:39 -08:00
aa1fd6b45a Add LazyBatchNormXd (#51548)
Summary:
This PR implements UninitializedBuffer and LazyBatchnormXd based on https://github.com/pytorch/pytorch/issues/44538. (cc. emcastillo and albanD)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51548

Reviewed By: zhangguanheng66

Differential Revision: D26276903

Pulled By: albanD

fbshipit-source-id: 0ac706974178363f8af075e59b41d5989418922f
2021-02-05 10:27:04 -08:00
5a962369e2 [Gradient Compression] Check if the backend is NCCL when a DDP communication hook is registered (#51759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51759

Currently, DDP communication hooks can only be supported on NCCL, so add a check in the registration methods. However, some unit tests actually register a comm hook on other backends like GLOO (example: `test_ddp_comm_hook_future_passing_cpu`); therefore, only do the check in `register_builtin_comm_hook`.
ghstack-source-id: 121115814

Test Plan: unit tests.

Reviewed By: pritamdamania87

Differential Revision: D26268581

fbshipit-source-id: c739fa4dca6d320202dc6689d790c2761c834c30
2021-02-05 09:59:12 -08:00
105c3d2196 Update CODEOWNERS (#51726)
Summary:
add myself and alban to folders

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51726

Reviewed By: albanD

Differential Revision: D26254528

Pulled By: soulitzer

fbshipit-source-id: 91477dda3ff81014dbadd3a93f5f511ac3da81e0
2021-02-05 09:01:18 -08:00
a7ba051fa6 [QNNPACK, Sparsity] Add dynamic linear sparse kernel for arm64 (#50591)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50591

Adds sparse kernels for arm64. Reg blocking factor of 8x8.

Test Plan:
q8gemm-sparse-test

Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D25925501

fbshipit-source-id: 8d62a8eb638f172ffaadfb1480ade0db35831189
2021-02-05 08:46:01 -08:00
70830b5ac0 [QNNPACK, Sparsity] Sparse kernel with 4x8 blocking (#50590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50590

Larger blocking across the M dim, such as the 8 used in the previous PR, likely
introduces wasted compute on the shapes being benchmarked.
Here we introduce 4x8 (mr x nr) blocking. This helps by 1) packing
less data for small values of M and 2) letting the compute kernel write
the same number of bytes but more contiguously. It is not certain, but it
likely helps.

Test Plan:
q8gemm-sparse-test
fully-connected-sparse-test

Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D25925499

fbshipit-source-id: 01c661ceea38bd6ee8321bb85cf1d5da5de4e984
2021-02-05 08:42:53 -08:00
e8ee35a666 Add script to compare namespace content for release cleanup (#51685)
Summary:
Usage explanation will be in the release note runbook.

This allows to generate diffs like:
```
Processing torch.nn
Things that were added:
{'quantizable', 'ChannelShuffle', 'LazyConvTranspose2d', 'LazyConv2d', 'LazyConvTranspose3d', 'LazyConv1d', 'GaussianNLLLoss', 'LazyConv3d', 'PixelUnshuffle', 'UninitializedParameter', 'LazyLinear', 'LazyConvTranspose1d'}

Things that were removed:
set()
```

This can then be shared with module owners along with the commits to help them validate that the namespace changes for their submodule is as expected.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51685

Reviewed By: zhangguanheng66

Differential Revision: D26260258

Pulled By: albanD

fbshipit-source-id: 40e40f86314e17246899d01ffa4b2631e93b52f7
2021-02-05 07:54:00 -08:00
28c5d90b67 [JIT] Allow implicit boolean conversion of containers (#51683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51683

**Summary**
This commit enables implicit boolean conversion of lists, strings, and
dictionaries in conditional expressions. Like Python, empty lists,
strings and dictionaries evaluate to `False` and their non-empty
counterparts evaluate to `True`. This allows users to write code like

```
import torch
from typing import List

@torch.jit.script
def fn(l: List[int]):
    if l:
        ...
    else:
        ...
```

This has been requested by some users and would be a good usability
improvement.

**Test Plan**
This commit adds unit tests to `TestList`, `TestDict` and
`test_jit_string.py` to test this new feature.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26264410

Pulled By: SplitInfinity

fbshipit-source-id: b764c18fd766cfc128ea98a02b7c6c3fa49f8632
2021-02-05 00:34:35 -08:00
d3023d86ba Revert D26249330: [Gradient Compression] Add a documentation page for DDP communication hooks
Test Plan: revert-hammer

Differential Revision:
D26249330 (e62aabac43)

Original commit changeset: ab973390ddb7

fbshipit-source-id: d508daed76219e7ca588cf7fb38aeaaffc61acfd
2021-02-04 22:38:06 -08:00
1065c2d5b6 Fix clang-tidy warnings in python_sugared_value.{h,cpp} (#51703)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51703

Reviewed By: gchanan

Differential Revision: D26245798

Pulled By: gmagogsfm

fbshipit-source-id: 01620adca820968324687982cc48390ff9336d20
2021-02-04 21:29:40 -08:00
c941730b96 [JIT/Futures] support set_exception api (#50983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50983

There is currently no way to handle/propagate errors with the python-based futures API (they are raised correctly if set with an error, but this is only possible from C++).

This diff allows the Future's `unwrap_func` to be set in python optionally, so users can set futures completed with an exception and the error will throw as expected. This is mostly to support the following use case in the next diff:

```
import torch
import torch.distributed.rpc as rpc

def unwrap(python_result):
    # Re-raise if the future was completed with an exception.
    if isinstance(python_result, Exception):
        raise python_result
    return python_result

ret_fut = torch.futures.Future(unwrap_func=unwrap)

rpc_fut = rpc.rpc_async(...)  # RPC future that times out
# Goal is to propagate the RPC error to this future.
def callback(res):
    # Note that ret_fut.set_result(res.wait()) won't propagate the error.
    try:
        ret_fut.set_result(res.wait())
    except Exception as e:
        ret_fut.set_result(e)

rpc_fut.add_done_callback(callback)
```
ghstack-source-id: 121021434

Test Plan:
unittest
```
buck test mode/dev-nosan mode/no-gpu //caffe2/test:futures -- te
st_unwrap --print-passing-details
```

Reviewed By: mrshenli

Differential Revision: D25950304

fbshipit-source-id: 7ee61e98fcd783b3f515706fa141d538e6d2174d
2021-02-04 20:22:19 -08:00
8e78dd6de8 [torch.futures] Fix doc inconsistency about callback args (#50979)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50979

Noticed that the documentation is inconsistent about the arg needed
in the callback. It appears to require the future, so fix this in the docs.
ghstack-source-id: 121021431

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D25944637

fbshipit-source-id: 0bfcd4040c4a1c245314186d29a0031e634b29c3
2021-02-04 20:22:14 -08:00
21afbba79b [torch.futures] Clarify callback behavior when future is completed (#50978)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50978

Noticed that the documentation is not clear that the callbacks are invoked
inline if the future is already completed. We should probably document this
behavior.
ghstack-source-id: 121021432

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D25944636

fbshipit-source-id: f4ac133d076ba9a5690fecfa56bde6d614a40191
2021-02-04 20:22:09 -08:00
c3f2f3294e [RPC] Add option to make rref.get_type not block. (#50977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50977

Adds a `blocking` flag that can be set to False to make this API return a `Future` to the type. This is to make this function non-blocking, mostly for a future change that will allow `rref.rpc_async()` to be completely non-blocking (it currently calls and waits for this function that issues an RPC in-line).
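
A hedged sketch of the two modes (assumes an initialized RPC framework; the underscore-prefixed method name mirrors the internal API and should be treated as illustrative):

```python
import torch
import torch.distributed.rpc as rpc

# Assumes rpc.init_rpc(...) has been called and "worker1" exists.
rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))

t = rref._get_type()                  # blocking (default): returns the type
fut = rref._get_type(blocking=False)  # non-blocking: returns a Future to the type
print(fut.wait())
```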
ghstack-source-id: 121021433

Test Plan: Modified UT

Reviewed By: mrshenli

Differential Revision: D25944582

fbshipit-source-id: e3b48a52af2d4578551a30ba6838927b489b1c03
2021-02-04 20:18:50 -08:00
716a8c2153 make forward AD API private (#51693)
Summary:
Avoid leaking private functions in `torch.` namespace.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51693

Reviewed By: gchanan

Differential Revision: D26245046

Pulled By: albanD

fbshipit-source-id: 5481b57eb56ba96581848598d32ebf5894a7adf0
2021-02-04 19:02:29 -08:00
e62aabac43 [Gradient Compression] Add a documentation page for DDP communication hooks (#51715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51715

Add a documentation page for DDP communication hooks.

Screenshot:

{F369781049}

Test Plan: View locally

Reviewed By: pritamdamania87

Differential Revision: D26249330

fbshipit-source-id: ab973390ddb785c5191f587a1b2b6de7d229e50e
2021-02-04 18:53:53 -08:00
de7eeb7752 Removes nonzero method warning (#51618)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44284

https://github.com/pytorch/pytorch/pull/45413 incorrectly left this only partially fixed because it did not update the separate list of method signatures that were deprecated. This PR correctly fixes https://github.com/pytorch/pytorch/issues/44284. A test is added for the behavior, but until the WARN_ONCE flag is added it's toothless.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51618

Reviewed By: ngimel

Differential Revision: D26220181

Pulled By: mruberry

fbshipit-source-id: 397b47ac7e962d108d8fde0f3dc6468d6327d1c3
2021-02-04 17:43:43 -08:00
e7ff0854c6 [doc] Fix inconsistencies with torch.linalg.inv and deprecate torch.inverse (#51672)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51672

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D26240535

Pulled By: heitorschueroff

fbshipit-source-id: 16dbd0a8a8c0f851faa12bf092dbedfb7cb0b292
2021-02-04 17:19:45 -08:00
ff4848aaa1 [doc] Fix inconsistencies with linalg.pinv docs and deprecate pinverse (#51671)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51671

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D26240534

Pulled By: heitorschueroff

fbshipit-source-id: 26e2a3cad2105e6e2b7779e785666b38597450c5
2021-02-04 17:19:41 -08:00
e7d7256f2d [doc] Fix inconsistencies with torch.linalg.matrix_rank doc (#51660)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51660

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D26234100

Pulled By: heitorschueroff

fbshipit-source-id: b9c48c0e172461ed2770d52c07a147152d51d4b7
2021-02-04 17:19:37 -08:00
0308261ddc [doc] Fix inconsistencies with torch.linalg.eigvalsh (#51659)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51659

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D26234102

Pulled By: heitorschueroff

fbshipit-source-id: 6a6711c7b129cd29f2c733c635c4192caaf42d22
2021-02-04 17:19:33 -08:00
87504c3265 [doc] Fix inconsistencies with torch.linalg.eigh (#51658)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51658

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D26234101

Pulled By: heitorschueroff

fbshipit-source-id: c1b5cc74ba0b32c49bfd843e97f957971d8be364
2021-02-04 17:19:29 -08:00
4835f203ec [doc] Fix inconsistencies with torch.linalg.det docs (#51651)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51651

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D26234103

Pulled By: heitorschueroff

fbshipit-source-id: 00ec7dae942bda887f57cb76752f8b5ef25d276a
2021-02-04 17:19:25 -08:00
7c12afb5e2 [doc] Fix inconsistencies with torch.linalg.cond doc (#51641)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51641

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D26234104

Pulled By: heitorschueroff

fbshipit-source-id: 5c2c9a206c4051092305d910ed0e808458e5afd9
2021-02-04 17:13:42 -08:00
4d703d040b Linear autodiff revert revert (#51613)
Summary:
Patch PR https://github.com/pytorch/pytorch/issues/50856 and roll back the revert D26105797 (e488e3c443).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51613

Reviewed By: mruberry

Differential Revision: D26253999

Pulled By: ngimel

fbshipit-source-id: a20b1591de06dd277e4cd95542e3291a2f5a252c
2021-02-04 16:32:05 -08:00
6dcbf396aa [QNNPACK, Sparsity] Added prepacking base aarch32 kernels (#50589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50589

Adds 1) an input prepacking kernel and 2) compute kernels that process the
prepacked activation.
The hunch is that input prepacking will help with 1) cache locality and
2) avoiding a lot of address-compute instructions.
The cache-locality benefit mainly comes from the fact that we are doing mr=8
and nr=4: mr being 8 likely results in cache line evictions, as the cache
associativity is likely 4. Laying out transposed activations blocked
by mr=8 puts all of the transposed activation in one contiguous block.
The downside is that we now transpose all the blocks regardless of whether
they participate in compute. However, it is likely that the entire activation
matrix participates in compute for some output block.
Also adds a benchmark.

Test Plan:
q8gemm-sparse-test
fully-connected-test-sparse

Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D25925502

fbshipit-source-id: b2c36419a2c5d23b4a49f25f9ee41cee8397c3be
2021-02-04 16:20:08 -08:00
47a6703bdb [QNNPACK, Sparsity] ARMV7, aarch32, kernels for dynamic linear (#50588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50588

This diff introduces aarch32 asm kernel for sparse dense gemm.

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D25925498

fbshipit-source-id: e9e19ce67157a4bc3cba4656f926e828442f09ad
2021-02-04 16:16:35 -08:00
3fec1e5025 fix hardsigmoid_backward for boundary case (#51454)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51438.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51454

Reviewed By: mruberry

Differential Revision: D26243461

Pulled By: ngimel

fbshipit-source-id: 7d954dc47427f02b7cbf0344e9889db223bfb525
2021-02-04 14:37:58 -08:00
8c737f732b replacing ubuntu-latest with ubuntu-18.04 (#51744)
Summary:
following https://github.com/pytorch/pytorch/pull/51725#pullrequestreview-583703598

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51744

Reviewed By: samestep

Differential Revision: D26262089

Pulled By: janeyx99

fbshipit-source-id: fa24e5c15d24750f2a5ccd5b6a5aad9a4a3ad09f
2021-02-04 14:17:06 -08:00
094d597679 raise windows tol to 30% (#51733)
Summary:
Up the Windows tolerance set by https://github.com/pytorch/pytorch/pull/35818, as CI is still showing some flakes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51733

Test Plan: CI

Reviewed By: zou3519

Differential Revision: D26258005

Pulled By: robieta

fbshipit-source-id: 864c848b7b31a05a2d07d1e683342b3202377c10
2021-02-04 14:09:10 -08:00
ab0cf3b6b5 Add 'repeat' argument to profiler.schedule (#51630)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51630

Reviewed By: gdankel

Differential Revision: D26246317

Pulled By: ilia-cher

fbshipit-source-id: 28b572c837184fe1b2a07dd57e99aa72cb93a9cb
2021-02-04 13:51:04 -08:00
62aea33d7f Revert D26237328: Add compare_set operation and test to TCPStore
Test Plan: revert-hammer

Differential Revision:
D26237328 (7d00aec6bc)

Original commit changeset: c6837a4cc34f

fbshipit-source-id: 662f8067ead9bce0da13b35d393fb781635dd2b9
2021-02-04 13:43:05 -08:00
ecfb73aaca Update docs for torch.profiler.tensorboard_trace_handler (#51636)
Summary:
![image](https://user-images.githubusercontent.com/62738430/106856207-17f8c000-66f9-11eb-80c9-844f79de423e.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51636

Reviewed By: orionr

Differential Revision: D26246309

Pulled By: ilia-cher

fbshipit-source-id: 083868e9231727638238c5f5ca31e3566d5e2e7e
2021-02-04 13:32:59 -08:00
d4d5f8569f [FX] Fix mypy error in FX for rewriter (#51740)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51740

Reviewed By: jamesr66a

Differential Revision: D26261009

Pulled By: Chillee

fbshipit-source-id: ce97316aede5509fc8ed90b4eb6b758e2bc1fa7a
2021-02-04 13:15:51 -08:00
b150f150ba Add division overload with rounding_mode selection (#51706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51706

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50280

As mentioned in gh-43874, this adds a `rounding_mode={'true', 'trunc', 'floor'}`
argument so `torch.div` can be used as a replacement for `floor_divide` during
the transitional period.

I've included dedicated kernels for truncated and floor division which
aren't strictly necessary for float, but do perform significantly better (~2x) than
doing true division followed by a separate rounding kernel.

Note: I introduce new overloads for `aten::div` instead of just adding a default
`rounding_mode` because various JIT passes rely on the exact operator schema.
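
Usage of the new overload (a quick sketch; `'trunc'` rounds toward zero, `'floor'` matches `floor_divide`):

```python
import torch

a = torch.tensor([ 7.0, -7.0])
b = torch.tensor([ 2.0,  2.0])
print(torch.div(a, b))                         # tensor([ 3.5000, -3.5000])
print(torch.div(a, b, rounding_mode="trunc"))  # tensor([ 3., -3.])
print(torch.div(a, b, rounding_mode="floor"))  # tensor([ 3., -4.])
```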

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D26123271

Pulled By: mruberry

fbshipit-source-id: 51a83717602114597ec9c4d946e35a392eb01d46
2021-02-04 13:08:36 -08:00
949ab213dd Revert "Revert D26246231: [FX] Edits after comprehensive pass over docs" (#51728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51728

This reverts commit 6c80fd005f23a55b3e4e655e867e0eed493ee416.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D26254130

Pulled By: jamesr66a

fbshipit-source-id: f301688f85c512076fee9b83a986677ef893d2c5
2021-02-04 13:01:09 -08:00
8c0da1f5e9 [ONNX] Modifications in remove inplace ops passes to better handle binary inplace ops (#51318) (#51572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51572

Modifications in remove_inplace_ops_for_onnx pass and remove_inplace_ops pass to better handle binary inplace ops

* Handles special case of binary inplace ops, where the first input node has a lower type precedence than the second input node.
* When the inplace node is converted to a regular op, this information is lost and the resulting type is based on type precedence, just like regular ops. To avoid this loss of information, we add a cast node before the input node with the higher data type precedence, so that both the input types are the same.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26203117

Pulled By: SplitInfinity

fbshipit-source-id: f018b503701b9067dba053c2764c3b92ef1abc38
2021-02-04 12:44:49 -08:00
c7f1595b19 fix bug (#51222) (#51527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51527

Fix bug in scatter_add symbolic

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26203119

Pulled By: SplitInfinity

fbshipit-source-id: e61f024e2daa7bc396fb264b8823a72ebf94ccdb
2021-02-04 12:44:44 -08:00
25b18bb5d7 [ONNX] Support list remove for onnx export (#51373) (#51526)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51526

* Support aten::Delete
* Refactor prepare_inplace_ops_for_onnx into one pass.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26203114

Pulled By: SplitInfinity

fbshipit-source-id: ce940bca54a30c39f4b0810f62b0e7b497508f59
2021-02-04 12:44:40 -08:00
6d47e2cff8 [ONNX] Fix opset 11 ConstantChunk with negative dim (#51396) (#51525)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51525

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26203115

Pulled By: SplitInfinity

fbshipit-source-id: d76942f7cc5812c8a1cc16891e4956cc658283d8
2021-02-04 12:44:35 -08:00
ba824eb2d6 [ONNX] Update unsafe_chunk() method to support new version 13 of Split operator. (#51415) (#51524)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51524

* def unsafe_chunk() support and test in ops13.

* Use _unsqueeze_helper instead of Unsqueeze operator

* Cast the splits into long.

* Change the test to a fixed dimension.

* Update test_pytorch_onnx_onnxruntime.py

* Disable test_loop_with_list for opset 13.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26203123

Pulled By: SplitInfinity

fbshipit-source-id: b273aeff8339faa0e8e9f1fcfbf877d1b703209f

Co-authored-by: Negin Raoof <neginmr@utexas.edu>
2021-02-04 12:44:31 -08:00
8ae6b0c5f9 [ONNX] Enable Constant Folding for ONNX Opset 13 (#51096) (#51523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51523

* Enable Constant Folding for ONNX Opset 13

* fix CI clang-diagnostic

* fix integers type

* fix comments: sort axes and support negative numbers

* update squeeze op constant folding

* fix format warning

* fix clang-format issue

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26203111

Pulled By: SplitInfinity

fbshipit-source-id: c33637ab39db614207bd442c6ab464bd09339b4a

Co-authored-by: hwangdeyu <deyhuang@qq.com>
2021-02-04 12:44:26 -08:00
1c7d966432 Update error message that displays when encountering an op unsupported for ONNX export. (#51387) (#51522)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51522

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26203121

Pulled By: SplitInfinity

fbshipit-source-id: 5920995b735cecb500b12948b8ad91803e576dcb
2021-02-04 12:44:22 -08:00
586c2e8d62 [ONNX] Fix graph sequence output from loop node (#51305) (#51521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51521

* Add loop & if node to the list of nodes that could produce sequence type output.
* Switch from `[]` to `at()` to avoid segfault of out of range access.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26203112

Pulled By: SplitInfinity

fbshipit-source-id: e990eeed933124b195be0be159271e33fb485063
2021-02-04 12:44:17 -08:00
3cc46002a3 [ONNX] Fix graph position to insert clone node for inplace op removal (#50123) (#51520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51520

The previous insertBefore approach might end up inserting the clone node in inner sub-blocks, even though the node is then used later at other outside call sites.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26203124

Pulled By: SplitInfinity

fbshipit-source-id: 999511e901ad1087f360bb689fcdfc3743c78aa4
2021-02-04 12:44:12 -08:00
0e7e4d4217 [ONNX] Add silu operator support for onnx (#51193) (#51519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51519

Support for yolov5 compound-scaled object detection models export.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26203120

Pulled By: SplitInfinity

fbshipit-source-id: c70bd730ee5d6f8bdebaf8ff764b94ffe7673808
2021-02-04 12:44:08 -08:00
9191b639ba [ONNX] Enable remaining failed tests in opset13 (#50806) (#51518)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51518

* enable remaining test in opset13

* add comments for error version test info

* fix comments: opset12 unbind problem

* add ignore[no-redef]

* fix format

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26203122

Pulled By: SplitInfinity

fbshipit-source-id: e7d95bd2ce13f79f11965be82f640379cd55ff0f

Co-authored-by: hwangdeyu <deyhuang@qq.com>
2021-02-04 12:44:04 -08:00
3f185ac18e [ONNX] Export get/set attribute nodes (#50768) (#51517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51517

Fix get/set attributes when getting/setting a model parameter.
This PR also fixes inplace ops in If blocks.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26203116

Pulled By: SplitInfinity

fbshipit-source-id: bed6ee6dd92b5b43febc8c584a6872290f8fe33f
2021-02-04 12:43:59 -08:00
1829268e7f [ONNX] Improve error message for parse_arg in symbolic functions (#50512) (#51516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51516

The previous error message looked like this:
```
RuntimeError: Unexpected node type: onnx::Gather
```
Now it looks like this:
```
RuntimeError: Expected node type 'onnx::Constant' for argument 'groups' of node 'conv1d', got 'onnx::Gather'.
```

Repro example:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.jit.script
def conv(x, w):
    return F.conv1d(x, w, groups=x.shape[0])

class Net(nn.Module):
    def forward(self, x, w):
        return conv(x, w)

model = Net()

x = torch.randn(8, 8, 512)
w = torch.randn(8, 1, 3)
torch.onnx.export(model,
                  (x, w),
                  "file.onnx",
                  opset_version=12)
```

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26203118

Pulled By: SplitInfinity

fbshipit-source-id: 607b22f4cba4baa24154f197914b6817449ab9f8
2021-02-04 12:43:54 -08:00
8dd9fefacb [ONNX] Fix bug in unfold symbolic (#50504) (#51515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51515

Fix bug in unfold symbolic

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26203113

Pulled By: SplitInfinity

fbshipit-source-id: 3a1b0013624d918de762a88ac6de8c9cafa0f732
2021-02-04 12:43:50 -08:00
7255b3f6b7 [ONNX] Update constant-folding of Gather op (#50554) (#51514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51514

Update constant-folding of the Gather operator so it also includes cases where the rank of the indices input is 0.
Currently it only supports cases where the rank of indices is 1.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26191323

Pulled By: SplitInfinity

fbshipit-source-id: 7edcbd8835b0248fefb908aca394f5cca5eae29e
2021-02-04 12:40:30 -08:00
2d305b97e9 [FX] Added partial concrete values for symbolic tracing (#51609)
Summary:
Currently it's passed in as a dict, but it might be worth considering whether we want to support other methods of passing it in (like a list corresponding to the positional args).
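
A hypothetical usage sketch of the dict-based API (the traced function and its argument names are illustrative, not taken from the diff):

```python
import torch
from torch.fx import symbolic_trace

def f(x, flag):
    # data-dependent control flow normally cannot be traced symbolically
    return x * 2 if flag else x

# fixing 'flag' to a concrete value lets tracing resolve the branch
traced = symbolic_trace(f, concrete_args={'flag': True})
print(traced.graph)
```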

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51609

Reviewed By: zou3519

Differential Revision: D26224464

Pulled By: Chillee

fbshipit-source-id: 305769db1a6e5fdcfb9e7dcacfdf153acd057a5a
2021-02-04 12:06:02 -08:00
2e8e560cdf Fix anomaly mode memory leak (#51610)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51349

The memory leak happens when 1) `create_graph` is True AND 2) detect anomaly mode is on. When a backward node's constructor is called during backward, the current evaluating node is assigned as a "parent" of the created node. The code that assigns the parent encounters the below issue:

`functionToPyObject(parent_node)` returns a new PyObject (with refcount 1) or, if the PyObject already exists, increments its refcount by 1. However, [PyDict_SetItem](1b55b65638/Objects/dictobject.c (L1532)) calls into [insertdict](https://github.com/python/cpython/blob/v3.8.1/Objects/dictobject.c#L1034), which increments the refcount again. This means that when the dict is destroyed, the refcount of the PyObject is still at least one. This keeps `parent_node` (the backward function) alive, which in turn keeps the saved tensors alive.

Similar calls in the codebase to `functionToPyObject` won't require Py_DECREF if it is then passed into a tuple (instead of dict), because the analogous PyTuple_SetItem call does not increment refcount.
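
A minimal sketch of the two conditions under which the leak was reported (per issue #51349):

```python
import torch

with torch.autograd.set_detect_anomaly(True):           # condition 2: anomaly mode on
    x = torch.randn(8, requires_grad=True)
    y = x.exp().sum()
    g, = torch.autograd.grad(y, x, create_graph=True)   # condition 1: create_graph=True
# before this fix, the backward graph (and its saved tensors) stayed alive here
```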

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51610

Reviewed By: albanD

Differential Revision: D26240336

Pulled By: soulitzer

fbshipit-source-id: 2854528f66fab9dbce448f8a7ba732ce386a7310
2021-02-04 11:53:37 -08:00
0222966ecd Fix several minor things in .circleci/README.md (#51724)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51724

Reviewed By: walterddr

Differential Revision: D26252671

Pulled By: samestep

fbshipit-source-id: 53781c391e3b54f3896e88bce07f7ee66a19ac92
2021-02-04 11:43:59 -08:00
14273126d2 Numeric Suite: Swap with shadow modules only for quantized part of model (#51052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51052

Ensure that shadow modules are inserted only for quantized modules in a model. Removes redundant module insertion.
ghstack-source-id: 121041113

Test Plan: buck test caffe2/test:quantization --  'test_compare_model_stub_partial \(quantization\.test_numeric_suite\.TestEagerModeNumericSuite\)'

Reviewed By: vkuzo

Differential Revision: D26054016

fbshipit-source-id: 73fc2fd2f0239b0363f358c80e34566d06a0c7cb
2021-02-04 11:40:30 -08:00
a0137808a7 Note on Modules for 1.8 docs (#51536)
Summary:
A new note on Modules for 1.8 documentation.

Rendered form can be seen here: https://alband.github.io/doc_view/notes/modules.html
(thanks Alban!)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51536

Reviewed By: albanD

Differential Revision: D26254282

Pulled By: jbschlosser

fbshipit-source-id: 09cbd46aa268a29b6f54fd48ffe1d6b98db0ff31
2021-02-04 11:28:11 -08:00
de9364aef2 fixes clang-tidy-11 install by using ubuntu18.04 instead of 20.04 (#51725)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51725

Reviewed By: walterddr

Differential Revision: D26255539

Pulled By: janeyx99

fbshipit-source-id: 1b4459e0c474938c134c529501c6c04106d5b18e
2021-02-04 11:20:23 -08:00
1e2df9e46d [cuda] masked_scatter : static_cast init_value to circumvent cuda 11.2 issue (#51614)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51544

Tested locally as there is no CI for 11.2 as of now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51614

Reviewed By: malfet

Differential Revision: D26253965

Pulled By: ngimel

fbshipit-source-id: 6d666a54871510ad0d00f915e45bbebcebc93015
2021-02-04 10:44:49 -08:00
7d00aec6bc Add compare_set operation and test to TCPStore (#51593)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51593

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D26237328

Pulled By: H-Huang

fbshipit-source-id: c6837a4cc34f8247df6e1c29c1f40fd9e7953313
2021-02-04 10:36:58 -08:00
003a240e68 [package] use WeakValueDictionary for global imported module registry (#51666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51666

This ensures the modules will get properly unloaded when all references
to them die
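
A minimal sketch of the mechanism (illustrative only; `FakeModule` stands in for torch.package's imported module objects):

```python
import weakref

class FakeModule:
    pass

registry = weakref.WeakValueDictionary()
m = FakeModule()
registry["my_package.my_module"] = m
print(len(registry))   # 1
del m                  # last strong reference dies...
print(len(registry))   # 0 -- the entry is dropped, so the module can be unloaded
```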

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Differential Revision: D26232574

Pulled By: suo

fbshipit-source-id: a9889965aa35ba2f6cbbfbdd13e02357cc706cab
2021-02-04 09:42:18 -08:00
6c80fd005f Revert D26246231: [FX] Edits after comprehensive pass over docs
Test Plan: revert-hammer

Differential Revision:
D26246231 (c22bc4821d)

Original commit changeset: 8d6278a9fe1d

fbshipit-source-id: fdc83289f8fe7986bc02181eec55e4e72be2d812
2021-02-04 09:26:21 -08:00
4d85e30133 Support at::cpu on non-structured kernels (#51590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51590

This PR backports a subset of Jiakai's changes from
https://github.com/pytorch/pytorch/pull/51554 that adds support
for at::cpu in non-structured kernels.

The unusual bits:

- Need to add a new forward inference rule for doing conversions
  of const optional<Tensor>& to const Tensor&
- Need to give the wrapper functions a prefix so that the call to
  wrapper is not ambiguous

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D26209871

Pulled By: ezyang

fbshipit-source-id: 8162686039675ab92a2af7a14f6b18941f8944df
2021-02-04 09:19:45 -08:00
668e0f3598 Split anonymous and namespaced definitions in RegisterDispatchKey (#51585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51585

Some payoff from the stack of refactors.  When I initially landed
at::cpu, Brian asked me why I couldn't just separate the anonymous
and namespaced definitions.  Well, it used to be annoying.  Now it's
not annoying anymore, so go ahead and split them up.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D26209873

Pulled By: ezyang

fbshipit-source-id: 63057d22acfaa0c17229947d9e65ec1193e360ec
2021-02-04 09:19:41 -08:00
a626b78467 Factor out structured generation into its own subclass. (#51583)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51583

There are no substantive changes in this PR.  The cluster of structured
helper methods is now split off into its own class.  To make sure all of
the original closure was available, I subclassed RegisterDispatchKey and
passed it all on; the only new thing closed over is the structured
functions group being processed.  I also renamed all the methods to
remove structured_ from their names as it is now redundant.

Most of the benefit is being able to remove a level of indentation
from gen_one.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D26209872

Pulled By: ezyang

fbshipit-source-id: 76c11410a24968d4f3d8a2bbc9392251a7439e6e
2021-02-04 09:19:37 -08:00
93c4f9f972 Split out RegisterDispatchKey to its own file (#51508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51508

No substantive changes.  The codegen for this file was getting a
bit long so I moved it off into tools.codegen.dest submodule (I
wanted to do tools.codegen.gen but that conflicts with the existing
module; oy vey!)  To do this I had to move some other functions around
so that they were more generally accessible.  Otherwise
self-explanatory.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D26187856

Pulled By: ezyang

fbshipit-source-id: fd3784571d03d01c4acb7ca589fcde4492526408
2021-02-04 09:19:32 -08:00
6045663f39 Use Literal to model targets. (#51500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51500

I'm going to add some new Target types shortly, so having tighter
types for the individual unions will make it clearer which ones
are valid.

This is also the first use of typing_extensions in the codegen,
and I want to make sure it works.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D26187854

Pulled By: ezyang

fbshipit-source-id: 6a9842f19b3f243b90b210597934db902b816c21
2021-02-04 09:16:22 -08:00
c22bc4821d [FX] Edits after comprehensive pass over docs (#51705)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51705

Pull Request resolved: #51679

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D26246231

Pulled By: jamesr66a

fbshipit-source-id: 8d6278a9fe1da5e6c34eff4fedc4c7e18533fe0f
2021-02-04 08:11:07 -08:00
9920ae665b Make te a hidden package for now (#51690)
Summary:
As discussed with suo , having it in `torch._C.XX` means that it automatically gets added to `torch.XX` which is unfortunate. Making it `torch._C._XX` means that it won't be added to `torch.`.

Let me know if that approach to hide it is not good and we can update that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51690

Reviewed By: gchanan

Differential Revision: D26243207

Pulled By: albanD

fbshipit-source-id: 3eb91a96635e90a6b98df799e3a732833dd280d5
2021-02-04 07:58:38 -08:00
ecf8166522 Support Union[NoneType, T] as input type (#51605)
Summary:
ghstack-source-id: 32db9661ce0f9441ef7061285bc24967c2808ea6
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51605

Fixes https://github.com/pytorch/pytorch/issues/51582
=========
In Python 3.9+, `Union[T, NoneType]` and `Union[NoneType, T]` are treated as `OptionalType`.
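
A minimal sketch of an annotation this change accepts:

```python
from typing import Union
import torch

@torch.jit.script
def first_or_default(x: Union[None, int]) -> int:  # NoneType-first, treated as Optional[int]
    if x is None:
        return -1
    return x

print(first_or_default(None), first_or_default(7))  # -1 7
```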

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51606

Test Plan:
====
python test/test_jit.py -v TestJit.test_union_to_optional

Reviewed By: pbelevich

Differential Revision: D26242353

Pulled By: nikithamalgifb

fbshipit-source-id: 0ac441fa1bdf2fb1044e3fe131bee47adda90bbb
2021-02-04 06:25:41 -08:00
f1f9b049d8 [profiler] Support top-level memory events (#51421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51421

Mark memory events that did not happen within an operator context
explicitly in the profiler output.

Test Plan: python test/test_profiler.py -k test_memory_profiler

Reviewed By: ngimel

Differential Revision: D26166518

Pulled By: ilia-cher

fbshipit-source-id: 3c14d3ac25a7137733ea7cc65f0eb48693a98f5e
2021-02-04 04:14:15 -08:00
a9584f29c1 Fix attribution of some CUDA events to CPU events (#51632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51632

Some fixes:
 - attribute CUDA Runtime events to proper PyTorch CPU events
 - make sure we don't accidentally attribute some CUDA kernels to the
 CUDA Runtime events that have semantically different ids
 - minor fixes in the output

Test Plan:
CI
https://gist.github.com/ilia-cher/0e78d0440fe02b77ff6721571c14f01c
https://gist.github.com/ilia-cher/8f655cf15beb1b11547fd3564a1c3958

Reviewed By: gdankel

Differential Revision: D26222734

Pulled By: ilia-cher

fbshipit-source-id: 13571dbeea0222ee1a531edacd1f4153f1e38da3
2021-02-04 03:54:02 -08:00
d6452a1a0c [profiler] Default activities value (#51561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51561

Using CPU as a default activities value
https://github.com/pytorch/pytorch/issues/51337

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D26198910

Pulled By: ilia-cher

fbshipit-source-id: 7d7b227059a8eb48dc600a5ec077dd811fd9c8b4
2021-02-04 03:50:29 -08:00
7abba67d8c add dumping callstack to kineto (#51565)
Summary:
In the profiler, pass operators' callstacks to Kineto and dump them into the Chrome tracing file.
The kineto side update is merged [here](66a4cad380)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51565

Reviewed By: malfet

Differential Revision: D26219324

Pulled By: ilia-cher

fbshipit-source-id: 96ac818012336602368647ff7b75048070f63b28
2021-02-04 03:30:32 -08:00
8c3e0ddbc6 [Usability] Tolerate torch.jit.script call to Enum classes (#51624)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51624

Reviewed By: SplitInfinity

Differential Revision: D26244694

Pulled By: gmagogsfm

fbshipit-source-id: c87a068cd11d6f497fa48dc206215300c55d6539
2021-02-04 01:51:49 -08:00
86861095fa Graceful invalidation of Python Node/Value/Block when C++ object is deleted (#50326)
Summary:
Previously this could cause segfaults and the like; now it raises an exception.
Thread safety hasn't been an objective.

I have a followup to expand the Python interface for the API.

Fixes https://github.com/pytorch/pytorch/issues/49969.

wanchaol

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50326

Reviewed By: pbelevich

Differential Revision: D26096234

Pulled By: gmagogsfm

fbshipit-source-id: 5425772002eb4deb3830ed51eaa3964f22505840
2021-02-04 01:34:46 -08:00
c8af338407 Expand benchmark utils docs (#51664)
Summary:
Add some much needed documentation on the Timer callgrind output format, and expand what is shown on the website.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51664

Reviewed By: tugsbayasgalan

Differential Revision: D26246675

Pulled By: robieta

fbshipit-source-id: 7a07ff35cae07bd2da111029242a5dc8de21403c
2021-02-04 00:22:41 -08:00
1518aee639 unbreak bc test (#51702)
Summary:
Caused by the revert of https://github.com/pytorch/pytorch/issues/48223

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51702

Reviewed By: mruberry

Differential Revision: D26245905

Pulled By: ngimel

fbshipit-source-id: 9fd7860ecb5c22b2e568db3347d51e648d6c5d6b
2021-02-03 23:03:26 -08:00
6a945bfb5c Fix memory leak in qnnpack ops (#51612)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51612

Test Plan: `pytest test/quantization/test_quantized_op.py` passes

Reviewed By: kimishpatel, dhruvbird

Differential Revision: D26217925

Pulled By: axitkhurana

fbshipit-source-id: f422a868d34ea5fe122fcdcce8b80c7859bfc415
2021-02-03 22:40:58 -08:00
e60f18c2ad Generate header with version #defines for LibTorch (#50073)
Summary:
Uses cmake's `configure_file()` macro to generate a new `torch/csrc/api/include/torch/version.h` header with `TORCH_VERSION_{MAJOR,MINOR,PATCH}` \#defines from an input file `torch/csrc/api/include/torch/version.h.in`.

For Bazel builds, this is accomplished with `header_template_rule()`.

For Buck builds, this is accomplished with `fb_native.genrule()`.

Fixes https://github.com/pytorch/pytorch/issues/44365

<img width="1229" alt="Screen Shot 2021-01-05 at 3 19 24 PM" src="https://user-images.githubusercontent.com/75754324/103809279-3fd80380-5027-11eb-9039-fd23922cebd5.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50073

Reviewed By: glaringlee

Differential Revision: D25855877

Pulled By: jbschlosser

fbshipit-source-id: 6bb792718c97e2c2dbaa74b7b7b831a4f6938e49
2021-02-03 22:18:53 -08:00
23c50a4a50 [PyTorch Mobile] Support torchbind custom classes in lite interpreter (#51432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51432

ghstack-source-id: 120976584

torchbind is a convenient way to expose a custom class to both Python and TorchScript. CREATE_OBJECT is used to create an object of a custom class.

CREATE_OBJECT was not supported by the lite interpreter. The major reason was that, for custom classes defined directly in Python, there is no language parser in the lite interpreter. That is still the case. However, for torchbind classes that are defined in C++, a Python/TorchScript parser is not needed.

This diff is to support the case of torchbind custom classes.
1. The class type can be resolved at import level.
2. If the class is not the supported torchbind class, an error message is provided at export stage. Workaround is also suggested.
3. Unit tests. C++: ```LiteInterpreterTest::BuiltinClass``` is added as an end-to-end test on supported class. Python: ```test_unsupported_createobject``` is changed to ```test_unsupported_classtype``` to test unsupported classes.

Test Plan: CI

Reviewed By: raziel

Differential Revision: D26168913

fbshipit-source-id: 74e8b6a12682ad8e9c39afdfd2b605c5f8e65427
2021-02-03 21:57:19 -08:00
1ffd26f8d8 [quant] Add reflection padding to conv (#49011)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49011

Differential Revision: D25394384

Test Plan: Imported from OSS

Reviewed By: ayush29feb

Pulled By: z-a-f

fbshipit-source-id: 256aded53c3c6555772aacfc5b0bbd32ef24c972
2021-02-03 21:44:12 -08:00
c41678fd53 Use deterministic impl of index_put and index backward CPU when torch.are_deterministic_algorithms_enabled() == True (#51388)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51366

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51388

Reviewed By: zou3519

Differential Revision: D26235290

Pulled By: ngimel

fbshipit-source-id: 64cce1a5e75d8a9ce9807c28d641da82ede666e2
2021-02-03 21:37:33 -08:00
f1a63b7c10 [FX] Added how to write transformations section (#51278)
Summary:
![image](https://user-images.githubusercontent.com/6355099/106121588-b8614a00-6125-11eb-923f-fcdf575cd6cd.png)

I still need to add links to vmap/grad/decomposition, but those haven't been added to the examples folder yet.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51278

Reviewed By: zou3519

Differential Revision: D26223103

Pulled By: Chillee

fbshipit-source-id: 3ad9bf76cd3438743edecdc17c44f8d1e00e5ea1
2021-02-03 21:32:43 -08:00
bd3ae117fc Fixes cat backward formula to return correct gradient values for R -> C case (#51681)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51681

Fixes https://github.com/pytorch/pytorch/issues/51627

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D26238748

Pulled By: anjali411

fbshipit-source-id: 1dc47f8ddddbf3f2c176f21e5dcee917f84f4c93
2021-02-03 21:29:55 -08:00
d8742eeed0 [quant] Support 2 dim input in quantized batchnorm 1d (#51597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51597

aligning quantized batchnorm behavior with fp batchnorm

Test Plan:
python test/test_quantization.py TestQuantizedOps.test_batch_norm
python test/test_quantization.py TestQuantizedOps.test_batch_norm_relu

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D26212489

fbshipit-source-id: 663d5d70cc82ea5cc68e66452590efe1342998f1
2021-02-03 21:05:03 -08:00
5d123ecf2f Fix caffe2 for LLVM trunk
Summary: LLVM trunk is at 13 now. I'm relaxing the places that only support up to 12.

Test Plan: Try Sandcastle staging builds.

Reviewed By: ayermolo

Differential Revision: D26227448

fbshipit-source-id: 0b69a9c135b34db4de94b82ee38d2fb1b328888b
2021-02-03 20:45:11 -08:00
0c60922fb0 mem-efficient learnable fake quantization (#49315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49315

Update the learnable fake quantization to use C++ and CUDA kernels, and resolve some issues with using it under PyTorch DDP.
The updated quantization operator has a different gradient calculation for scale and zero_point when the output is at the endpoints of the clamp operation: it calculates the gradient according to the gradient of the `clamp` function. This behavior is consistent with the gradient calculation for non-learnable fake-quant ops.
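
A sketch of the endpoint-aware scale gradient in the standard learnable fake-quant (LSQ-style) formulation — assuming scale $s$, zero point $z$, and quantization bounds $q_{\min}$, $q_{\max}$; this is a reading of the behavior described above, not code from the diff:

$$
\mathrm{out}(x) = \big(\mathrm{clamp}(\lfloor x/s \rceil + z,\; q_{\min},\; q_{\max}) - z\big) \cdot s,
\qquad
\frac{\partial\,\mathrm{out}}{\partial s} =
\begin{cases}
\lfloor x/s \rceil - x/s & \text{if } q_{\min} \le \lfloor x/s \rceil + z \le q_{\max} \\
q_{\min} - z & \text{if clamped at the lower endpoint} \\
q_{\max} - z & \text{if clamped at the upper endpoint}
\end{cases}
$$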
ghstack-source-id: 120821868

Test Plan:
# learnable_fake_quantization forward/backward op test
## Unit Test:
`buck test mode/dev-nosan -c fbcode.platform=platform009 //caffe2/test:quantization -- -v TestFakeQuantize`

## Benchmark Test:
`buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:quantization_test -- --operators FakeQuantizePerTensorOpBenchmark`

`buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:quantization_test -- --operators FakeQuantizePerChannelOpBenchmark`

### In **microseconds** (`1e-6` seconds):
References: P171624031
input size: [1, 3, 256, 256]
|                            | C++ Kernel | Non-backprop C++ Kernel |
|----------------------------|------------|-------------------------|
| Per Tensor CPU Forward     | 1372.123   | 1365.981                |
| Per Tensor Cuda Forward    | 84.586     | 27.205                  |
| Per Channel CPU Forward    | 2306.668   | 2299.991                |
| Per Channel Cuda Forward   | 154.742    | 135.219                 |
| Per Tensor CPU Backward    | 2544.617   | 581.268                 |
| Per Tensor Cuda Backward   | 304.529    | 137.335                 |
| Per Channel CPU Backward   | 3328.188   | 582.088                 |
| Per Channel Cuda Backward  | 504.176    | 134.082                 |

input size: [1, 3, 512, 512]

|                            | C++ Kernel | Non-backprop C++ Kernel |
|----------------------------|------------|-------------------------|
| Per Tensor CPU Forward     | 5426.244   | 5726.440                |
| Per Tensor Cuda Forward    | 85.834     | 26.871                  |
| Per Channel CPU Forward    | 9125.913   | 9118.152                |
| Per Channel Cuda Forward   | 159.599    | 145.117                 |
| Per Tensor CPU Backward    | 14020.830  | 2214.864                |
| Per Tensor Cuda Backward   | 285.525    | 131.302                 |
| Per Channel CPU Backward   | 16977.141  | 2104.345                |
| Per Channel Cuda Backward  | 541.511    | 120.222                 |

# use learnable_fake_quantization in AI-denoising QAT:
f229412681

Reviewed By: raghuramank100

Differential Revision: D24479735

fbshipit-source-id: 5275596f3ce8200525f4d9d07d0c913afdf8b43a
2021-02-03 18:57:47 -08:00
7918f37e8c [FX] Move examples to pytorch/examples (#51686)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51686

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D26241146

Pulled By: jamesr66a

fbshipit-source-id: b9cda75997fb98afd0e59ea78074fd7bd26ecebf
2021-02-03 18:41:11 -08:00
f2c4deabeb Extend subgraph_rewriter logic (#51532)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51532

- Change output of `replace_pattern` to `List[Match]` reflecting the
pattern(s) matched in the original graph
- Ensure that all Callables (not just FunctionType objects) work with
the rewriter
- Fix incorrect matching in degenerate case (`test_subgraph_rewriter_correct_output_replacement`)
- Verify that pattern matching works when pattern and original graph are
the same

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D26193082

Pulled By: ansley

fbshipit-source-id: 7f40c3862012a44adb88f403ade7afc37e50417f
2021-02-03 18:14:37 -08:00
627ec8badf Type-annotate tools/generate_torch_version (#51637)
Summary:
And add it to mypy.ini

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51637

Reviewed By: janeyx99

Differential Revision: D26225123

Pulled By: malfet

fbshipit-source-id: d70d539ae58a14321e82f4592aaa44b3ce6b6358
2021-02-03 18:07:01 -08:00
50d903f19f [optim] make functional api be private (#51316) (#51665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51665

This reverts commit 896f82aa92eb7557229053a21da786f5927e64e0.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D26232608

Pulled By: vincentqb

fbshipit-source-id: ca006baf4fb672c11c1bb003c39a29cbadb63dd3
2021-02-03 17:59:05 -08:00
45e5562fcc Beef up {jacobian, hessian} vectorize docs; eliminate a warning (#51638)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51638

This PR makes the following doc changes:
- Makes it clear to users that they should use vectorize "at their own
risk"
- Makes it clear that vectorize uses the "experimental prototype vmap"
so that when users see error messages related to vmap they will know
where it is coming from.

This PR also:
- makes it so that {jacobian, hessian} call a version of vmap that
doesn't warn the user that they are using an "experimental prototype".
The regular torch.vmap API does warn the user about this. This is to
improve UX a little, because the user already knows from discovering
the flag and reading the docs what they are getting themselves into (see the usage sketch below).
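
A usage sketch of the flag:

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    return x.sin()

x = torch.randn(3)
J = jacobian(f, x, vectorize=True)  # batches the grad calls through the prototype vmap
print(J.shape)                      # torch.Size([3, 3]), diagonal = cos(x)
```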

Test Plan:
- Add test that {jacobian, hessian} with vectorize=True don't raise
warnings

Reviewed By: albanD

Differential Revision: D26225402

Pulled By: zou3519

fbshipit-source-id: 1a6db920ecf10597fb2e0c6576f510507d999c34
2021-02-03 17:15:16 -08:00
443a431ac3 Revert D25074763: [WIP] Update foreach APIs to use scalar lists
Test Plan: revert-hammer

Differential Revision:
D25074763 (cce84b5ca5)

Original commit changeset: 155e3d2073a2

fbshipit-source-id: ef0d153e2740b50bd4a95f7a57c370bb5da46355
2021-02-03 17:06:40 -08:00
d1bc1ab8ca Revert D25502940: Refactor ForeachUnaryOps.cu
Test Plan: revert-hammer

Differential Revision:
D25502940 (5cf3278723)

Original commit changeset: fce2f18a4f62

fbshipit-source-id: 2cef82bd3cb34783d9a0c6c16cc4321abab31932
2021-02-03 17:02:11 -08:00
16cfe970e0 Updates linalg documentation per feature review process (#51620)
Summary:
Notes the module is in beta and that the policy for returning optionally computed tensors may change in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51620

Reviewed By: heitorschueroff

Differential Revision: D26220254

Pulled By: mruberry

fbshipit-source-id: edf78fe448d948b43240e138d6d21b780324e41e
2021-02-03 16:11:57 -08:00
1ee0c42d6d move ZipDataset to Zip DataPipe (#51599)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51599

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26212859

Pulled By: glaringlee

fbshipit-source-id: 3fabcf8876d3c9c24339dbf6a12e0bb04b400108
2021-02-03 15:42:59 -08:00
34d4d79966 Autograd doc note fix (#51661)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51661

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D26230912

Pulled By: anjali411

fbshipit-source-id: 94323d7bce631a4c5781020e9650495461119ede
2021-02-03 15:08:35 -08:00
0d9ca21d74 [Static Runtime] Native stack for contiguous inputs (#50863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50863

- Avoid calling unsqueeze on every input tensor by copying data directly
- Model benchmark shows small improvement: -2.3% (b=1), -1.1% (b=20)
- This diff does not yet modify torch::stack implementation, only the static_runtime path. A followup diff will do this.

Test Plan:
# Test
```
buck test //caffe2/aten:native_test
buck run //caffe2/test:torch
```
# Op benchmark
expected no changes here because this diff only touches static runtime
```
Baseline                  |Native                   |Change
6.38                      |6.336                    |-0.69%
6.553                     |6.588                    |0.53%
14.904                    |14.883                   |-0.14%
5.657                     |5.68                     |0.41%
5.612                     |5.795                    |3.26%
6.051                     |6.058                    |0.12%
4.225                     |4.252                    |0.64%
4.24                      |4.294                    |1.27%
6.28                      |4.249                    |-32.34%
6.267                     |4.257                    |-32.07%
418.932                   |404.356                  |-3.48%
417.694                   |404.752                  |-3.10%
1592.455                  |1583.277                 |-0.58%
2919.261                  |2685.636                 |-8.00%
211.458                   |202.838                  |-4.08%
211.518                   |203.229                  |-3.92%
783.953                   |792.198                  |1.05%
1457.823                  |1348.824                 |-7.48%
2032.816                  |1975.961                 |-2.80%
2090.662                  |2000.612                 |-4.31%
6487.098                  |6635.41                  |2.29%
11874.702                 |10853.302                |-8.60%
2123.83                   |2039.272                 |-3.98%
2195.453                  |2221.82                  |1.20%
6435.978                  |6593.363                 |2.45%
11852.205                 |10858.92                 |-8.38%
2036.526                  |1983.042                 |-2.63%
2055.618                  |2072.03                  |0.80%
6417.192                  |6681.064                 |4.11%
12468.744                 |10888.336                |-12.67%
4959.704                  |4954.734                 |-0.10%
5121.823                  |4996.84                  |-2.44%
5082.105                  |5029.652                 |-1.03%
5395.936                  |5438.628                 |0.79%
5162.756                  |5114.147                 |-0.94%
23798.08                  |21884.065                |-8.04%
4957.921                  |4972.01                  |0.28%
4971.234                  |4968.977                 |-0.05%
5005.909                  |5039.95                  |0.68%
5159.614                  |5180.426                 |0.40%
5013.221                  |5202.684                 |3.78%
20238.741                 |20212.581                |-0.13%
7632.439                  |7610.345                 |-0.29%
7589.376                  |7679.148                 |1.18%
7859.937                  |7850.485                 |-0.12%
8214.213                  |8150.846                 |-0.77%
11606.562                 |11724.139                |1.01%
34612.919                 |34817.677                |0.59%
```

# Adindexer model benchmark

```
caffe2=0 batch={1|20} profile=1 ./scripts/bwasti/static_runtime/run.sh
```

## Baseline
```
Batch 1
0.00291311 ms.    3.97139%. aten::stack (1 nodes)
Batch 20
0.00477447 ms.   0.934081%. aten::stack (1 nodes)

```

## Native stack (this change)
```
Batch 1
0.00115161 ms.    1.67388%. aten::stack (1 nodes)
Batch 20
0.00264831 ms.   0.543767%. aten::stack (1 nodes)
```

Reviewed By: hlu1

Differential Revision: D25988638

fbshipit-source-id: 82ce84c88963cae40dc5819004baf03ce9093ecc
2021-02-03 14:59:52 -08:00
fe67438f32 Replace AT_ASSERTM in ATen/core (#51579)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51579

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D26206404

Pulled By: ezyang

fbshipit-source-id: b7e6a530b8ca3ebfa02c87037c37010f9ee0b0db
2021-02-03 14:42:30 -08:00
c60dacd4cf Replace all AT_ASSERTM in ATen/native (#51147)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51147

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D26206403

Pulled By: ezyang

fbshipit-source-id: 6a5c331337bf03b3dc29ceef8f7eeb4539b22c7f
2021-02-03 14:39:21 -08:00
f38e1d2d60 [quant][graphmode][fx] Enable inception_v3 and googlenet static quant test (#51402)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51402

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D26162805

fbshipit-source-id: 28ddc66f0593d28539dd6c6d3f617541e698d3bd
2021-02-03 14:32:00 -08:00
8e53bf010d Use new TensorPipe functions to create channels (#51550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51550

ghstack-source-id: 120931213

Test Plan: Export to CircleCI

Reviewed By: beauby

Differential Revision: D26147946

fbshipit-source-id: edd44b5edf7041efcc9662cc3bfc550663976fc1
2021-02-03 14:20:49 -08:00
56ef24bc0f Use new TensorPipe functions to create transports (#51549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51549

ghstack-source-id: 120931215

Test Plan: Export to CircleCI

Reviewed By: beauby

Differential Revision: D26147369

fbshipit-source-id: 43f58f27edec964c24c0bf4ea76f2a47695ee1ea
2021-02-03 14:17:49 -08:00
47557b95ef Removed typographical error from tech docs (#51286)
Summary:
Duplications removed from the tech docs.

![Screenshot](https://user-images.githubusercontent.com/71665475/106158807-6e5b8100-6184-11eb-9036-bccdf2086c31.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51286

Reviewed By: albanD

Differential Revision: D26227627

Pulled By: ailzhang

fbshipit-source-id: efa0cd90face458673b8530388378d5a7eb0f1cf
2021-02-03 14:09:36 -08:00
333a0c8b6f Add support for generating faithful at::cpu signatures (#51499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51499

I'm going to turn on at::cpu signatures on for all operators; before
I do it I want to make sure I'm at feature parity everywhere.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D26187855

Pulled By: ezyang

fbshipit-source-id: 8fdfd9d843fc98435b1f1df8b475d3184d87dc96
2021-02-03 14:03:50 -08:00
81c7c3bae5 Add api.structured; switch structured kernels to use const Tensor& everywhere (#51490)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51490

Mutable Tensor ref is a source of endless confusion for kernel writers;
if we're going to make everyone rewrite their kernels, might as well
also get rid of mutable Tensor& while we're at it.

This is a refactor-then-small-update double whammy.  The refactor
is to separate tools.codegen.api.structured from api.native for
describing the type signatures of structured kernels (previously,
I was naughtily reusing native for this purpose--now I need it to
behave differently as Tensor).  This started off as a copy paste, but
since there are not that many structured kernels so far I could delete
all of the legacy logic from native that didn't make sense (without
having to go out and fix all the use sites all at once).

One more small addition was teaching translate to convert Tensor& to const Tensor&.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D26182413

Pulled By: ezyang

fbshipit-source-id: ed636866add3581179669cf9283f9835fcaddc06
2021-02-03 14:03:46 -08:00
648cdb7d0a Relax type signature for tools.codegen.api.translate (#51477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51477

Passing in a full binding is still OK, but if you have less
(e.g., an Expr/CType), that will do too. I'll need this for
some codegen patches I'm doing later.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D26179560

Pulled By: ezyang

fbshipit-source-id: 5730dfb2c91bf5325496e57b0c91eb6823c9194d
2021-02-03 14:00:47 -08:00
43df03de13 [Gradient Compression] Replace torch.sqrt(torch.sum(col ** 2)) by torch.norm() (#51629)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51629

Leverage the existing util functions as much as possible for potential performance gain.
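
A minimal sketch of the equivalence being relied on:

```python
import torch

col = torch.randn(1024)
a = torch.sqrt(torch.sum(col ** 2))  # the old formulation: three separate kernels
b = torch.norm(col)                  # the fused util used after this change
print(torch.allclose(a, b))          # True
```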

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 120919883

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

No performance regression:
f248664994 uses `torch.norm()`
```
total:
  32 GPUs -- 32 GPUs: p25:  1.050    30/s  (batch size 32)
p50:  1.230    26/s  (batch size 32)
p75:  1.449    22/s  (batch size 32)
p90:  1.611    19/s  (batch size 32)
p95:  1.702    18/s  (batch size 32)

backward:
  32 GPUs -- 32 GPUs: p25:  0.769    41/s  (batch size 32)
p50:  0.920    34/s  (batch size 32)
p75:  1.139    28/s  (batch size 32)
p90:  1.322    24/s  (batch size 32)
p95:  1.440    22/s  (batch size 32)
```

f248678690 does not use `torch.norm()`
```
total:
  32 GPUs -- 32 GPUs: p25:  1.056    30/s  (batch size 32)
p50:  1.249    25/s  (batch size 32)
p75:  1.443    22/s  (batch size 32)
p90:  1.608    19/s  (batch size 32)
p95:  1.711    18/s  (batch size 32)

backward:
  32 GPUs -- 32 GPUs: p25:  0.777    41/s  (batch size 32)
p50:  0.939    34/s  (batch size 32)
p75:  1.127    28/s  (batch size 32)
p90:  1.322    24/s  (batch size 32)
p95:  1.448    22/s  (batch size 32)
```

Reviewed By: pritamdamania87

Differential Revision: D26219835

fbshipit-source-id: 31d8ad3401d4efced4a6069f4f1e169ea3372697
2021-02-03 13:39:11 -08:00
00675292ca replace silufp16 with cubic interpolation (#51645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51645

added cubic interpolation

Test Plan: increased the input domain, reduced the threshold to 0

Reviewed By: kausv

Differential Revision: D26212239

fbshipit-source-id: e0813d8a4f3f54cfd0bf62e385cd28fa4a1976e8
2021-02-03 12:58:38 -08:00
cae4379826 Enable FLOPS Computation for Experimental Kineto Profiler (#51503)
Summary:
Add the FLOPS metric computation to the experimental Kineto profiler.
This includes saving the necessary extra arguments and computing FLOPS in the C++ code,
and extracting the FLOPS value from the Python frontend.
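
A usage sketch (assuming `with_flops` is the frontend flag wired up by this change):

```python
import torch
from torch.autograd import profiler

x, w = torch.randn(64, 128), torch.randn(128, 128)
with profiler.profile(record_shapes=True, with_flops=True) as prof:
    torch.mm(x, w)
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```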

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51503

Test Plan:
Build PyTorch with USE_KINETO option, then run the unit test:

```python
python test/test_profiler.py -k test_flops
```

Reviewed By: ilia-cher

Differential Revision: D26202711

Pulled By: xuzhao9

fbshipit-source-id: 7dab7c513f454355a220b72859edb3ccbddcb3ff
2021-02-03 12:15:23 -08:00
3361d365bd [Gloo] Use TORCH_CHECK for ensuring tag is nonnegative (#51370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51370

TORCH_CHECK should be used when confirming the correctness of function
arguments like the tag passed to Gloo functions.
ghstack-source-id: 120908449

Test Plan: Sandcastle/CI

Reviewed By: mingzhe09088

Differential Revision: D26152359

fbshipit-source-id: ddffaa6f11393aaedaf0870759dc526d8d4530ee
2021-02-03 11:48:20 -08:00
a3f2fe0d52 Prevent CUDAFuture from using uninitialized device index (#51505)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51505

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D26187380

Pulled By: mrshenli

fbshipit-source-id: 437bb1244a65ee859458d9a87fdaef9f4dd20b59
2021-02-03 11:04:33 -08:00
a651696ab4 fix misspelling in swa_utils.pyi (#51608)
Summary:
Change `avg_fun -> avg_fn` to match the spelling in the `.py` file.
(`swa_utils.pyi` should match `swa_utils.py`)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51608

Reviewed By: glaringlee

Differential Revision: D26224779

Pulled By: zou3519

fbshipit-source-id: 01ff7173ba0a996f1b7a653438acb6b6b4659de6
2021-02-03 10:51:22 -08:00
c639513378 [TensorExpr] Resubmit: Introduce ExternalCall nodes to TE IR. (#51594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51594

ExternalCall nodes represent opaque calls to external functions that fill a
tensor (buffer) with values. They can be used to include nodes that are
otherwise not representable as TE, or whose TE representation is currently too
slow.

To make an external function available in NNC as ExternalCall, one needs to
implement a "bridge" function that would take raw (void*) pointers to the data
along with the arrays containing dimension info. This function would then
internally call the desired external function and make sure the results of the
call are correctly placed in the provided raw data buffers.

The reason the PR was previously reverted was that the LLVM generated
calls to bridge functions were breaking unwind tables. This is now fixed
by requiring bridge functions to never throw and setting the
corresponding attribute in the LLVM generated code.

Differential Revision: D26213882

Test Plan: Imported from OSS

Reviewed By: pbelevich, ngimel

Pulled By: ZolotukhinM

fbshipit-source-id: db954d8338e2d750c2bf0a41e88e38bd494f2945
2021-02-03 10:22:54 -08:00
18a7ec7d7d Update the JIT complex type name to be consistent with Python (#51476)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51476

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D26179237

Pulled By: anjali411

fbshipit-source-id: 6a5c60c8545eb42416583836b8038ceffd3f3244
2021-02-03 09:59:08 -08:00
896f82aa92 [optim] make functional api be private (#51316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51316

Make the optim functional API private until we release it as beta

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26213469

fbshipit-source-id: b0fd001a8362ec1c152250bcd57c7205ed893107
2021-02-03 09:29:33 -08:00
550c965b2e Re-enable test_standalone_load for Windows 11.1 (#51596)
Summary:
This fixes the previous error by adding stricter conditions in cpp_extension.py.

To test, run a split torch_cuda build on Windows with `export BUILD_SPLIT_CUDA=ON && python setup.py develop`, and then run the following test: `python test/test_utils.py TestStandaloneCPPJIT.test_load_standalone`. It should pass.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51596

Reviewed By: malfet

Differential Revision: D26213816

Pulled By: janeyx99

fbshipit-source-id: a752ce7f9ab9d73dcf56f952bed2f2e040614443
2021-02-03 08:58:34 -08:00
727f163bea caffe2 test.sh pip might not need sudo if pip is root (#50223)
Summary:
Update the logic in the MAYBE_SUDO check. The assumption that sudo is needed
whenever pip was installed as a user was incorrect: pip could be installed as
root and run as root. The initial assumption was that pip was installed as
root while the user was non-root.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50223

Reviewed By: H-Huang

Differential Revision: D26212127

Pulled By: walterddr

fbshipit-source-id: 20b316606b6c210dc705a972c13088fa3d9bfddd
2021-02-03 08:13:03 -08:00
5cf3278723 Refactor ForeachUnaryOps.cu (#49248)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49248

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D25502940

Pulled By: izdeby

fbshipit-source-id: fce2f18a4f62f7a5fdd6747707d006c3588530d1
2021-02-03 07:05:27 -08:00
52de407b4b [DataLoader] Rename Functional DataSet to DataPipe (#51488)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51488

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D26209888

Pulled By: ejguan

fbshipit-source-id: cb8bc852b1e4d72be81e0297308a43954cd95332
2021-02-03 07:01:09 -08:00
bea0519b0b [WIP][DataLoader] Implement BucketBatchIterableDataset (#51126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51126

BucketBatch:
Get a chunk of data as a bucket, sort the bucket by the specified key, then batch.
If no sort key is specified, directly use batchIterableDS.

1. Implement BucketBatch for bucket sampler
2. Improve BatchDS tests

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D26209890

Pulled By: ejguan

fbshipit-source-id: 8519e2e49da158b3fe32913c8f3cadfa6f3ff1fc
2021-02-03 07:01:05 -08:00
14ee63f7e6 [WIP][DataLoader] Implement CallableIterableDataset (#50045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50045

Add **CallableIterableDataset**
Modify **CollateIterableDataset** as another callable

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D26209889

Pulled By: ejguan

fbshipit-source-id: d4773026c1269e43b29a3efb16e36e1865fdd024
2021-02-03 06:54:48 -08:00
c311b8961a Revert D26113953: [pytorch][PR] [ZeroRedundancyOptimizer] Elastic and pytorch compatible checkpoints
Test Plan: revert-hammer

Differential Revision:
D26113953 (bbe18e3527)

Original commit changeset: 030bfeee2c34

fbshipit-source-id: 6c1494ad01c2f96a15601329b4fce3fef4b38a01
2021-02-03 06:12:21 -08:00
75ee575671 [Usability] Handle repeated jit.script calls on function gracefully (#51545)
Summary:
Repeated calls on `class` are not handled since `class`'s compilation process will change soon in https://github.com/pytorch/pytorch/issues/44324

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51545

Reviewed By: H-Huang

Differential Revision: D26207010

Pulled By: gmagogsfm

fbshipit-source-id: 5f3f64b0e4b4ab4dbf5c9411d9c143472922a106
2021-02-03 02:09:25 -08:00
7b556db69d [PyTorch Mobile] Skip inferring function schema from the C++ function type (#50457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50457

The code to infer a function schema from a C++ function relies on templates and code expansion. This uses up valuable binary size. We can avoid inferring the schema from the C++ function type (arguments, name, return value) when the function implementation is being added to the dispatcher via `m.impl`; in this case, it is assumed that a schema is registered already. Adding an implementation via `m.def` still triggers schema inference.

In addition, we don't do schema checks on mobile, so the schema is not needed in the first place.
ghstack-source-id: 120915259

Test Plan:
Auto-unit tests succeed.

### Size test: igios

```
D25853094-V1 (https://www.internalfb.com/intern/diff/D25853094/?dest_number=119632217)

igios: Succeeded
Change in Download Size for arm64 + 3x assets variation: -21.8 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -45.5 KiB

Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:261049318687117@base/bsb:261049318687117@diff/
```

### Size test: fbios

```
D25853094-V1 (https://www.internalfb.com/intern/diff/D25853094/?dest_number=119632217)

fbios: Succeeded
Change in Download Size for arm64 + 3x assets variation: -27.2 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -80.1 KiB

Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:454289062251865@base/bsb:454289062251865@diff/
```

Reviewed By: smessmer

Differential Revision: D25853094

fbshipit-source-id: e138d9dff7561d424bfb732f3a5898466f018f60
2021-02-03 00:37:35 -08:00
62f6e55439 Fix the missing parameter in get_sha function (#51290)
Summary:
The get_sha() function didn't pass in the pytorch_root argument, so subprocess.check_output always raised an exception since pytorch_root was not defined, and thus always returned 'Unknown'.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51290

Reviewed By: soumith

Differential Revision: D26219051

Pulled By: malfet

fbshipit-source-id: fee2c4f5fdfc61983559eec1600b9accb344c527
2021-02-02 23:25:57 -08:00
ab4623da16 Document FX debugging (#51530)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51530

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D26192641

Pulled By: ansley

fbshipit-source-id: c69ab1bb2451d8ee5a729445f52bccc66e6f431b
2021-02-02 23:17:51 -08:00
f7313b3105 Fix Python.h discovery logic on some MacOS platforms (#51586)
Summary:
On all non-Windows platforms we should use the 'posix_prefix' scheme to discover the location of the Python.h header

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51586

Reviewed By: ezyang

Differential Revision: D26208684

Pulled By: malfet

fbshipit-source-id: bafa6d79de42231629960c642d535f1fcf7a427f
2021-02-02 21:38:37 -08:00
7360ce36e4 [QNNPACK:Sparsity] Add A matrix pretransformed based sparse kernels for FC (#50587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50587

This diff introduces two kernels. One pretransforms A to do block-wise
transforms.
The other kernel works directly on top of the pretransformed weights.

Test Plan:
./build/local/q8gemm-sparse-test
./build/local/fully-connected-sparse-test

Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D25925504

fbshipit-source-id: 9b02819405ce587f20e675b154895dc39ecd1bad
2021-02-02 21:33:02 -08:00
eb571b33fe [QNNPACK Sparse] Create fc sparse operator (#50586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50586

Creates sparse operator for fully connected layer.

Test Plan:
./build/local/fully-connected-sparse-test

Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D25925503

fbshipit-source-id: 49042158ba3bf26a716a6d68258fc7ead85ce9d8
2021-02-02 21:32:58 -08:00
520f96b8c7 [QNNPACK] Block Sparse kernel. First commit. (#50585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50585

This diff introduces a sparse kernel for SSE, using a 1x4 block-sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D25925500

fbshipit-source-id: e112cafd3226f8c11487c139cd414fa53a58fd0d
2021-02-02 21:30:24 -08:00
444203c52f Fix torch.cdist backward CUDA error due to illegal gridDim setting (#51569)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49928

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51569

Reviewed By: mruberry

Differential Revision: D26215694

Pulled By: ngimel

fbshipit-source-id: 0710417e6a802424e2dcada325f27452c95d042f
2021-02-02 20:41:24 -08:00
b48ee75507 Fix quantization doc issue (#50187)
Summary:
There was a description error in quantization.rst; this fixes it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50187

Reviewed By: mrshenli

Differential Revision: D25895294

Pulled By: soumith

fbshipit-source-id: c0b2e7ba3fadfc0977ab2d4d4e9ed4f93694cedd
2021-02-02 20:33:21 -08:00
b18eeaa80a Implement np.diff for single order differences (#50569)
Summary:
Implements `np.diff` for single order differences only:
 - method and function variants for `diff` and function variant for `diff_out`
 - supports out variant, but not in-place since shape changes
 - adds OpInfo entry, and test in `test_torch`
 - automatic autograd because we are using the `Math` dispatch

_Update: we only support Tensors for prepend and append in this PR. See discussion below and comments for more details._
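
A usage sketch of the new op (tensor-only prepend/append, per the update above):

```python
import torch

t = torch.tensor([1, 3, 6, 10])
print(torch.diff(t))                             # tensor([2, 3, 4])
print(t.diff(prepend=torch.tensor([0])))         # method variant: tensor([1, 2, 3, 4])
print(torch.diff(t, append=torch.tensor([15])))  # tensor([2, 3, 4, 5])
```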

Currently there is a quirk in the c++ API based on how this is implemented: it is not possible to specify scalar prepend and appends without also specifying all 4 arguments.

That is because the goal is to match NumPy's diff signature of `diff(int n=1, int dim=-1, Union[Scalar, Tensor] prepend=None, Union[Scalar, Tensor] append)=None` where all arguments are optional, positional and in the correct order.
There are a couple blockers. One is c++ ambiguity. This prevents us from simply doing `diff(int n=1, int dim=-1, Scalar? prepend=None, Tensor? append=None)` etc for all combinations of {Tensor, Scalar} x {Tensor, Scalar}.

Why not have append, prepend not have default args and then write out the whole power set of {Tensor, Scalar, omitted} x {Tensor, Scalar, omitted} you might ask. Aside from having to write 18 overloads, this is actually illegal because arguments with defaults must come after arguments without defaults. This would mean having to write `diff(prepend, append, n, dim)` which is not desired. Finally writing out the entire power set of all arguments n, dim, prepend, append is out of the question because that would actually involve 2 * 2 * 3 * 3 = 36 combinations. And if we include the out variant, that would be 72 overloads!

With this in mind, the current way this is implemented is actually to still do `diff(int n=1, int dim=-1, Scalar? prepend=None, Tensor? append=None)`. But also make use of `cpp_no_default_args`. The idea is to only have one of the 4 {Tensor, Scalar} x {Tensor, Scalar} provide default arguments for the c++ api, and add `cpp_no_default_args` for the remaining 3 overloads. With this, Python api works as expected, but some calls such as `diff(prepend=1)` won't work on c++ api.

We can optionally add 18 more overloads that cover the {dim, n, no-args} x {scalar-tensor, tensor-scalar, scalar-scalar} x {out, non-out} cases for c++ api. _[edit: counting is hard - just realized this number is still wrong. We should try to count the cases we do cover instead and subtract that from the total: (2 * 2 * 3 * 3) - (3 + 2^4) = 17. 3 comes from the 3 of 4 combinations of {tensor, scalar}^2 that we declare to be `cpp_no_default_args`, and the one remaining case that has default arguments has covers 2^4 cases. So actual count is 34 additional overloads to support all possible calls]_

_[edit: thanks to https://github.com/pytorch/pytorch/issues/50767 hacky_wrapper is no longer necessary; it is removed in the latest commit]_
 hacky_wrapper was also necessary here because `Tensor?` will cause dispatch to look for the `const optional<Tensor>&` schema but also generate a `const Tensor&` declaration in Functions.h. hacky_wrapper allows us to define our function as `const Tensor&` but wraps it in optional for us, so this avoids both the errors while linking and loading.

_[edit: rewrote the above to improve clarity and correct the fact that we actually need 18 more overloads (26 total), not 18 in total to complete the c++ api]_
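
As a quick usage sketch of the resulting Python API (input values chosen for illustration):

```python
import torch

x = torch.tensor([1, 3, 6, 10])

torch.diff(x)                             # tensor([2, 3, 4])
torch.diff(x, prepend=torch.tensor([0]))  # tensor([1, 2, 3, 4])
torch.diff(x, append=torch.tensor([15]))  # tensor([2, 3, 4, 5])
```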

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50569

Reviewed By: H-Huang

Differential Revision: D26176105

Pulled By: soulitzer

fbshipit-source-id: cd8e77cc2de1117c876cd71c29b312887daca33f
2021-02-02 20:25:16 -08:00
e54cbb8250 Create PyTorch DDP logging APIs for applications to use (#50637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50637

Add APIs for logging PyTorch DDP logging data in applications.

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D25933411

fbshipit-source-id: 57c248a2f002da06a386fc7406d3e5533ebb9124
2021-02-02 18:24:21 -08:00
26f9ac98e5 Revert D26105797: [pytorch][PR] Exposing linear layer to fuser
Test Plan: revert-hammer

Differential Revision:
D26105797 (e488e3c443)

Original commit changeset: 6f7cedb9f6e3

fbshipit-source-id: f0858cefed76d726e9dba61e51e1eaf2af4c99c5
2021-02-02 17:39:17 -08:00
5a402274d4 [ROCm] add 4.0.1 to nightly builds (#51257)
Summary:
Depends on https://github.com/pytorch/builder/pull/628.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51257

Reviewed By: H-Huang, seemethere

Differential Revision: D26208135

Pulled By: malfet

fbshipit-source-id: 8a4386b5661c6f71df28d98279e2771c4044f06c
2021-02-02 16:52:38 -08:00
b283ac6da4 "whitelist" -> "allowlist" (#51375)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51375

Test Plan: Sandcastle tests

Reviewed By: iseeyuan

Differential Revision: D26150609

fbshipit-source-id: 1ca17bc8943598a42f028005d1f6d3f362fe2659
2021-02-02 16:20:34 -08:00
c791a30484 Fix warnings in "ForeachOpsKernels" with c10::irange (#50783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50783

Compiling currently shows:
```
Jan 13 16:46:28 In file included from ../aten/src/ATen/native/ForeachOpsKernels.cpp:2:
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachUtils.h:28:21: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachUtils.h:44:21: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachUtils.h:149:25: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28   for (int64_t i = 0; i < tensors1.size(); i++) {
Jan 13 16:46:28                       ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachUtils.h:164:25: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28   for (int64_t i = 0; i < tensors1.size(); i++) {
Jan 13 16:46:28                       ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachUtils.h:183:25: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28   for (int64_t i = 0; i < tensors1.size(); i++) {
Jan 13 16:46:28                       ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachUtils.h:198:25: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28   for (int64_t i = 0; i < tensors1.size(); i++) {
Jan 13 16:46:28                       ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:150:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_LIST_ALPHA(add);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:74:21: note: expanded from macro 'FOREACH_BINARY_OP_LIST_ALPHA'
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {                                                                           \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:150:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_LIST_ALPHA(add);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:84:21: note: expanded from macro 'FOREACH_BINARY_OP_LIST_ALPHA'
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {                                                                           \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:151:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_LIST_ALPHA(sub);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:74:21: note: expanded from macro 'FOREACH_BINARY_OP_LIST_ALPHA'
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {                                                                           \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:151:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_LIST_ALPHA(sub);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:84:21: note: expanded from macro 'FOREACH_BINARY_OP_LIST_ALPHA'
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {                                                                           \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:158:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_SCALARLIST(add);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:31:21: note: expanded from macro 'FOREACH_BINARY_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < tensors.size(); i++) {                                                                            \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:158:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_SCALARLIST(add);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:40:21: note: expanded from macro 'FOREACH_BINARY_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < tensors.size(); i++) {                                                                            \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:159:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_SCALARLIST(sub);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:31:21: note: expanded from macro 'FOREACH_BINARY_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < tensors.size(); i++) {                                                                            \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:159:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_SCALARLIST(sub);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:40:21: note: expanded from macro 'FOREACH_BINARY_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < tensors.size(); i++) {                                                                            \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:160:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_SCALARLIST(mul);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:31:21: note: expanded from macro 'FOREACH_BINARY_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < tensors.size(); i++) {                                                                            \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:160:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_SCALARLIST(mul);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:40:21: note: expanded from macro 'FOREACH_BINARY_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < tensors.size(); i++) {                                                                            \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:161:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_SCALARLIST(div);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:31:21: note: expanded from macro 'FOREACH_BINARY_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < tensors.size(); i++) {                                                                            \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:161:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_SCALARLIST(div);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:40:21: note: expanded from macro 'FOREACH_BINARY_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < tensors.size(); i++) {                                                                            \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:163:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_LIST(mul);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:53:21: note: expanded from macro 'FOREACH_BINARY_OP_LIST'
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {                                                             \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:163:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_LIST(mul);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:63:21: note: expanded from macro 'FOREACH_BINARY_OP_LIST'
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {                                                             \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:164:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_LIST(div);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:53:21: note: expanded from macro 'FOREACH_BINARY_OP_LIST'
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {                                                             \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:164:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_LIST(div);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:63:21: note: expanded from macro 'FOREACH_BINARY_OP_LIST'
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {                                                             \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:195:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_POINTWISE_OP_SCALAR(addcdiv);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:115:21: note: expanded from macro 'FOREACH_POINTWISE_OP_SCALAR'
Jan 13 16:46:28   for (int i = 0; i < input.size(); i++) {                                                                                           \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:195:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_POINTWISE_OP_SCALAR(addcdiv);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:125:21: note: expanded from macro 'FOREACH_POINTWISE_OP_SCALAR'
Jan 13 16:46:28   for (int i = 0; i < input.size(); i++) {                                                                                           \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:196:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_POINTWISE_OP_SCALAR(addcmul);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:115:21: note: expanded from macro 'FOREACH_POINTWISE_OP_SCALAR'
Jan 13 16:46:28   for (int i = 0; i < input.size(); i++) {                                                                                           \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:196:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_POINTWISE_OP_SCALAR(addcmul);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:125:21: note: expanded from macro 'FOREACH_POINTWISE_OP_SCALAR'
Jan 13 16:46:28   for (int i = 0; i < input.size(); i++) {                                                                                           \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:198:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_POINTWISE_OP_SCALARLIST(addcdiv);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:135:21: note: expanded from macro 'FOREACH_POINTWISE_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < input.size(); i++) {                                                                                                              \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:198:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_POINTWISE_OP_SCALARLIST(addcdiv);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:145:21: note: expanded from macro 'FOREACH_POINTWISE_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < input.size(); i++) {                                                                                                              \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:199:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_POINTWISE_OP_SCALARLIST(addcmul);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:135:21: note: expanded from macro 'FOREACH_POINTWISE_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < input.size(); i++) {                                                                                                              \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:199:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_POINTWISE_OP_SCALARLIST(addcmul);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:145:21: note: expanded from macro 'FOREACH_POINTWISE_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < input.size(); i++) {
```
This diff fixes these warnings by switching the loops to `c10::irange`.
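
For reference, a minimal sketch of the idiom (the function and loop body are invented for illustration):

```cpp
#include <vector>
#include <ATen/ATen.h>
#include <c10/util/irange.h>

void scale_all(std::vector<at::Tensor>& tensors) {
  // Before: `for (int i = 0; i < tensors.size(); i++)` compares a signed int
  // against an unsigned size_t, which triggers -Wsign-compare.
  // After: the index takes the type of tensors.size(), so the comparison is
  // sign-consistent and the warning goes away.
  for (const auto i : c10::irange(tensors.size())) {
    tensors[i].mul_(2);
  }
}
```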

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25935046

fbshipit-source-id: 9a042367410b3c1ffd27d9f957a623f1bae07d20
2021-02-02 16:13:03 -08:00
e488e3c443 Exposing linear layer to fuser (#50856)
Summary:
1. Enable linear in autodiff;
2. Remove control flow in Python for linear;

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50856

Reviewed By: pbelevich

Differential Revision: D26105797

Pulled By: eellison

fbshipit-source-id: 6f7cedb9f6e3e46daa24223d2a6080880498deb4
2021-02-02 15:39:01 -08:00
5499e839f1 [Fuser] Do not attempt to use OpenMP if build without OpenMP support (#51504)
Summary:
Clang from Xcode does not support the `-fopenmp` option, so there is no need to try to compile with it.
Infer whether OpenMP is supported by checking the _OPENMP define.
Also, use the clang compiler if the host app was compiled with clang rather than gcc.
Fix a few range-loop warnings and add static_asserts that range-loop variables are raw pointers.

This change makes fuser tests on OS X a bit faster.

Before:
```
% python3 test_jit.py -v  TestScript.test_batchnorm_fuser_cpu
Fail to import hypothesis in common_utils, tests are not derandomized
CUDA not available, skipping tests
test_batchnorm_fuser_cpu (__main__.TestScript) ... clang: error: unsupported option '-fopenmp'
clang: error: unsupported option '-fopenmp'
warning: pytorch jit fuser failed to compile with openmp, trying without it...
ok

----------------------------------------------------------------------
Ran 1 test in 0.468s

OK
```

After:
```
% python3 test_jit.py -v  TestScript.test_batchnorm_fuser_cpu
Fail to import hypothesis in common_utils, tests are not derandomized
CUDA not available, skipping tests
test_batchnorm_fuser_cpu (__main__.TestScript) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.435s

OK
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51504

Reviewed By: smessmer

Differential Revision: D26186875

Pulled By: malfet

fbshipit-source-id: 930b3bcf543fdfad0f493d687072aaaf5f9e2bfc
2021-02-02 15:31:59 -08:00
38eb836387 [complex] Enable complex autograd and jit tests for trace (#51537)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50381

Now that `index_fill_` supports complex, we can enable complex support for `trace`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51537

Reviewed By: H-Huang

Differential Revision: D26198904

Pulled By: anjali411

fbshipit-source-id: d62bb02549919fe35b0bac44f77af964ebd0e92e
2021-02-02 15:24:38 -08:00
209e27eaff [FX] Add note about more use cases of FX (#51576)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51576

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D26203610

Pulled By: jamesr66a

fbshipit-source-id: d33a3e7e0f3a959349ed0e29a1aba0592022606d
2021-02-02 14:57:48 -08:00
37f1412965 [Pytorch Mobile] Preserved all functions generated by bundled inputs (#51496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51496

A previous change added the possibility of more functions being generated when bundled inputs are attached. We want to preserve those here in optimize_for_mobile.
ghstack-source-id: 120862718

Test Plan:
Created a dummy model, augmented several methods with bundled inputs, called optimize_for_mobile, and verified the functions are still there.

Discovered a weird interaction between freeze_module and bundled inputs. If the user does something like
   inputs =[<inputs>]
   augment_many_model_functions_with_bundled_inputs(
             model,
             inputs={
                 model.forward : inputs,
                 model.foo : inputs,
             }
  )
to attach their bundled inputs, freeze_module within optimize_for_mobile will error out. Instead the user would need to do something like
   inputs =[<inputs>]
   inputs2 =[<inputs>]  # Nominally the same as the inputs above
   augment_many_model_functions_with_bundled_inputs(
             model,
             inputs={
                 model.forward : inputs,
                 model.foo : inputs2,
             }
  )

Reviewed By: dhruvbird

Differential Revision: D26005708

fbshipit-source-id: 3e908c0f7092a57da9039fbc395aee6bf9dd2b20
2021-02-02 14:57:44 -08:00
cce84b5ca5 [WIP] Update foreach APIs to use scalar lists (#48223)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48223

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D25074763

Pulled By: izdeby

fbshipit-source-id: 155e3d2073a20d16bdbe358820170bf53f93c7a5
2021-02-02 14:54:28 -08:00
506fdf9abf [ROCm] disable tests for ROCm 4.0.1 (#51510)
Summary:
These tests are failing for the ROCm 4.0/4.0.1 release. Disable the tests until they are fixed.

- TestCuda.test_cudnn_multiple_threads_same_device
- TestCudaFuser.test_reduction

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51510

Reviewed By: H-Huang

Differential Revision: D26205179

Pulled By: seemethere

fbshipit-source-id: 0c3d29989d711deab8b5046b458c772a1543d8ed
2021-02-02 14:39:08 -08:00
bbe18e3527 [ZeroRedundancyOptimizer] Elastic and pytorch compatible checkpoints (#50956)
Summary:
- Makes it possible to use non-sharded optimizer checkpoints (as long as the model/param groups are the same, of course)
- Makes it possible to save with a given world size, and load with another world size
- Use the Torch Distributed built-in broadcast object list instead of an ad-hoc version

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50956

Reviewed By: malfet

Differential Revision: D26113953

Pulled By: blefaudeux

fbshipit-source-id: 030bfeee2c34c2d987590d45dc8efe05515f2e5c
2021-02-02 14:32:13 -08:00
a990ff7001 [SobolEngine] Fix edge case of dtype of first sample (#51578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51578

https://github.com/pytorch/pytorch/pull/49710 introduced an edge case in which
drawing a single sample resulted in ignoring the `dtype` arg to `draw`. This
fixes this and adds a unit test to cover this behavior.
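
A minimal sketch of the fixed behavior (dimension and dtype chosen for illustration):

```python
import torch

engine = torch.quasirandom.SobolEngine(dimension=3)
sample = engine.draw(1, dtype=torch.float64)
assert sample.dtype == torch.float64  # previously ignored when drawing one sample
```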

Test Plan: Unit tests

Reviewed By: danielrjiang

Differential Revision: D26204393

fbshipit-source-id: 441a44dc035002e7bbe6b662bf6d1af0e2cd88f4
2021-02-02 14:24:56 -08:00
4746b3d1fb Added missing VSX dispatch for cholesky_inverse (#51562)
Summary:
It was overlooked that a VSX dispatch is also needed for the cholesky_inverse CPU dispatch.
See https://github.com/pytorch/pytorch/pull/50269#issuecomment-771688180

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51562

Reviewed By: H-Huang

Differential Revision: D26199581

Pulled By: anjali411

fbshipit-source-id: 5d02c6da52ce1d2e9e26001f5d4648a71dd0e829
2021-02-02 13:45:35 -08:00
2565a33c98 [Vulkan] Remove redundant qualifiers on writeonly images. (#51425)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51425

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D26179605

Pulled By: AshkanAliabadi

fbshipit-source-id: 26358cd4fd23922fed21120e120774eea0b728df
2021-02-02 13:37:59 -08:00
0402df5427 [Vulkan] Improve error handling in a few places. (#51423)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51423

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D26179604

Pulled By: AshkanAliabadi

fbshipit-source-id: 2e270423bf7e960e9303b17e0ca1a1530b760ad3
2021-02-02 13:34:43 -08:00
365986cfe0 Add tensorboard_trace_handler for profiler (#50875)
Summary:
Add a tensorboard_trace_handler to output tracing files and formalize the file name for the TensorBoard plugin.
As discussed in https://github.com/pytorch/pytorch/pull/49231
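
A minimal usage sketch (the log directory and schedule values are illustrative):

```python
import torch.profiler

with torch.profiler.profile(
    schedule=torch.profiler.schedule(wait=1, warmup=1, active=3),
    on_trace_ready=torch.profiler.tensorboard_trace_handler("./log/worker0"),
) as prof:
    for _ in range(5):
        # ... one training step ...
        prof.step()  # tell the profiler a step boundary was reached
```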

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50875

Reviewed By: H-Huang

Differential Revision: D26098493

Pulled By: ilia-cher

fbshipit-source-id: 906ea118682f8bff412e76ca3f391bebab23b0ff
2021-02-02 13:28:00 -08:00
cde7fa6e3c update kineto submodule (#51566)
Summary:
To make the call-stack dumping feature work, I have to make PyTorch refer to the latest Kineto version.
Callstack feature's Kineto side: [Link](66a4cad380);
Callstack feature's pytorch side: [Link](https://github.com/pytorch/pytorch/pull/51565)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51566

Reviewed By: ilia-cher

Differential Revision: D26205782

Pulled By: gdankel

fbshipit-source-id: 52d835e45a87ab4630fd22ea024cb41b82c96ebc
2021-02-02 13:17:05 -08:00
a38a648cb7 Test if allocator is set only in DEBUG mode. (#51360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51360

Invariant should be satisfied by call sites of allocator
ensuring that the device type makes sense.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: swolchok

Differential Revision: D26170202

Pulled By: ezyang

fbshipit-source-id: f23681f34187c0d3da794f7a8c869ea8da88365d
2021-02-02 12:51:15 -08:00
0ff855efea Make empty_cpu sanity test CPU only in DEBUG mode (#51358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51358

BackendSelect is expected to enforce this invariant.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: swolchok

Differential Revision: D26149502

Pulled By: ezyang

fbshipit-source-id: f53ab66e8324b729a4057b376fe3d60b14daf2fb
2021-02-02 12:47:56 -08:00
351ee1ece7 Remove duplicate check for THPLayout in toSugaredValue (#51543)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51543

Reviewed By: Lilyjjo

Differential Revision: D26202297

Pulled By: gmagogsfm

fbshipit-source-id: f0d40c9d73b579a68e34c54b004d329fd3b76ff3
2021-02-02 12:34:29 -08:00
ec378055c3 add OneDNN linear backward (#49453)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49453

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26006889

Pulled By: VitalyFedyunin

fbshipit-source-id: 06e2a02b6e01d847395521a31fe84d844f2ee9ae
2021-02-02 12:18:59 -08:00
4fdebdc0c9 Improve PyTorch profiler flop computation formulas (#51377)
Summary:
Improve the FLOPs computation formula of the aten::conv2d operator to support the stride, pad, dilation, and groups arguments.

This diff also fixes the following issues:
- Apply a factor of 2 to aten::mm because each output element accounts for a multiplication and an addition.
- Fix the incorrect names of scalar operators, renaming them to aten::mul and aten::add.
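
For reference, a sketch of the generalized conv2d FLOP count; this is an assumed reconstruction of the formula, not the literal code in this diff:

```python
def conv2d_flops(n, c_in, h, w, c_out, kh, kw,
                 stride=1, pad=0, dilation=1, groups=1):
    # Standard conv2d output spatial dimensions.
    h_out = (h + 2 * pad - dilation * (kh - 1) - 1) // stride + 1
    w_out = (w + 2 * pad - dilation * (kw - 1) - 1) // stride + 1
    # Factor of 2: each output element needs a multiply and an add.
    return 2 * n * c_out * h_out * w_out * (c_in // groups) * kh * kw
```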

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51377

Test Plan:
```python
python test/test_profiler.py
```

Reviewed By: jspark1105

Differential Revision: D26165223

Pulled By: xuzhao9

fbshipit-source-id: 2c5f0155c47af2e6a19332fd6ed73ace47fa072a
2021-02-02 11:49:04 -08:00
55a4aa79aa [package] patch inspect.getfile to work with PackageImporter (#51568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51568

The default behavior of inspect.getfile doesn't work on classes imported
from PackageImporter, because it returns the following.

    sys.modules[kls.__module__].__file__

Looking in `sys.modules` is hard-coded behavior. So, patch it to first
check a similar registry of PackageImported modules we maintain.
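
A rough sketch of the patching approach; the registry name and lookup below are assumptions for illustration, not the actual implementation:

```python
import inspect

_orig_getfile = inspect.getfile
_package_registry = {}  # hypothetical: module name -> module from PackageImporter

def _patched_getfile(obj):
    # Consult the PackageImporter registry before falling back to sys.modules.
    if inspect.isclass(obj) and obj.__module__ in _package_registry:
        return _package_registry[obj.__module__].__file__
    return _orig_getfile(obj)

inspect.getfile = _patched_getfile
```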

Test Plan: Imported from OSS

Reviewed By: yf225

Differential Revision: D26201236

Pulled By: suo

fbshipit-source-id: aaf5d7ee8ca0155619c8185e64f70a30152ac567
2021-02-02 11:29:29 -08:00
b6c6fb7252 fix windows 11.1 test2 by disabling test (#51573)
Summary:
`TestStandaloneCPPJIT.test_load_standalone` fails with the split torch_cuda build, but the error seems unrelated (cannot find `nvToolsExt64_1.dll`). Temporarily disabling while I investigate why that dependency is even there.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51573

Reviewed By: malfet, H-Huang

Differential Revision: D26203084

Pulled By: janeyx99

fbshipit-source-id: 373aeae8165506384e433bc256b80eea4a7a5048
2021-02-02 11:01:26 -08:00
751c30038f [JIT] Properly convert Python strings implictly to device (#51340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51340

**Summary**
`toIValue` assumes that any value passed for an argument of type
`torch.device` is a valid device object, even when it is not. This can
lead to device type arguments of functions being assigned incorrect
values (see #51098).

This commit adds an explicit check, using `THPDevice_Check`, that the passed-in
object is indeed a `torch.device`, and only then converts it to an `IValue`.
Since implicit conversion from strings to devices is generally allowed, if
`THPDevice_Check` fails the object is assumed to be a string, and an `IValue`
holding a `c10::Device` constructed from that string is returned.
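
A sketch of the resulting behavior, assuming a scripted function with a device-typed argument:

```python
import torch

@torch.jit.script
def to_device(x: torch.Tensor, device: torch.device) -> torch.Tensor:
    return x.to(device)

to_device(torch.ones(2), torch.device("cpu"))  # a real device object
to_device(torch.ones(2), "cpu")                # implicit str -> device still works
to_device(torch.ones(2), "not-a-device")       # now raises instead of being accepted
```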

**Test Plan**
This commit adds a unit test to `test_jit.py` to test that invalid
strings passed as devices are no longer silently accepted.

**Fixes**
This commit fixes #51098.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26187190

Pulled By: SplitInfinity

fbshipit-source-id: 48c990203431da30f9f09381cbec8218d763325b
2021-02-02 10:57:56 -08:00
74ec9e7ccf compare_model_outputs_fx API implementation (#49266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49266

compare_model_outputs_fx API implementation
ghstack-source-id: 120828880

Test Plan:
buck test mode/dev caffe2/test:quantization_fx -- 'test_compare_model_outputs_linear_static_fx'
buck test mode/dev caffe2/test:quantization_fx -- 'test_compare_model_outputs_conv_static_fx'
buck test mode/dev caffe2/test:quantization_fx -- 'test_compare_model_stub_linear_static_fx'
buck test mode/dev caffe2/test:quantization_fx -- 'test_compare_model_stub_conv_static_fx'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_linear_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_conv_static'

Reviewed By: vkuzo

Differential Revision: D25507933

fbshipit-source-id: 1b502b5eadb0fafbe9e8c2e843410bca03c63fd6
2021-02-02 10:43:25 -08:00
0118dec2e3 [Pytorch] Expanded Bundled Inputs To Any Public Function (#51153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51153

Enabled bundled inputs for all public functions that the user wants in a TorchScript module. An important caveat here is that you can't add bundled inputs to functions that were in the nn.Module but weren't caught in the scripting/tracing process that brought the model to TorchScript.

The old API is exactly the same: it still only works on forward, with the same return types, etc.

-----------New API-------------

Attachment of inputs:

***augment_model_with_bundled_inputs*** : works the same as before, but adds the option to specify an info dictionary.

***augment_many_model_functions_with_bundled_inputs*** : Similar to the above function but allows the user to specify a Dict[Callable, List[<inputs>]] (mapping function references to the bundled inputs for that function) to attach bundled inputs to many functions

Consumption of inputs:

***get_all_bundled_inputs_for_<function_name>()*** : Works exactly like get_all_bundled_inputs does, but can be used for functions other than forward if you know ahead of time what they are called, and if they have bundled inputs.

***get_bundled_inputs_functions_and_info()*** : This is easily the hackiest function. Returns a Dict['str', 'str'] mapping function_names to get_all_bundled_inputs_for_<function_name>. A user can then execute the functions specified in the values with something like
    all_info = model.get_bundled_inputs_functions_and_info()
    for func_name in all_info.keys():
        input_func_name = all_info[func_name]['get_inputs_function_name'][0]
        func_to_run = getattr(loaded, input_func_name)
The reason it's done this way is that TorchScript doesn't support the 'Any' type yet, meaning I can't return the bundled inputs directly because they could be different types for each function. TorchScript also doesn't support callables, so I can't return a function reference directly either.
ghstack-source-id: 120768561

Test Plan:
Got a model into TorchScript using the available methods that I'm aware of (tracing, scripting, old scripting method). Not really sure how tracing brings in functions that aren't in the forward call path, though. Attached bundled inputs and info to them successfully. Changes to TorchTest.py on all but the last version of this diff (where it will be/is removed for land) illustrate what I did to test.

Created and ran unit test

Reviewed By: dreiss

Differential Revision: D25931961

fbshipit-source-id: 36e87c9a585554a83a932e4dcf07d1f91a32f046
2021-02-02 10:33:59 -08:00
6465793011 Fix Dirichlet.arg_constraints event_dim (#51369)
Summary:
This fix ensures
```py
Dirichlet.arg_constraints["concentration"].event_dim == 1
```
which was missed in https://github.com/pytorch/pytorch/issues/50547

## Tested
- [x] added a regression test, covering all distributions

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51369

Reviewed By: H-Huang

Differential Revision: D26160644

Pulled By: neerajprad

fbshipit-source-id: 1bb44c79480a1f0052b0ef9d4605e750ab07bea1
2021-02-02 10:26:45 -08:00
a5b65ae40a Fix small typo (#51542)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51541

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51542

Reviewed By: albanD

Differential Revision: D26199174

Pulled By: H-Huang

fbshipit-source-id: 919fc4a70d901916eae123672d010e9eb8e8b977
2021-02-02 10:14:17 -08:00
8f0968f899 Fix: Bad autograd side effects from printing (#51364)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49756

## Background
The fix applied here is to remove the grad-enabled check from `collect_next_edges`, unconditionally returning the actual collected edges. This pushes the responsibility for determining whether the function should be called without grad mode to its call sites. With this update, `collect_next_edges` will no longer incorrectly return an empty list, which caused the problem described in the issue. Three call sites depended on this behavior and have been updated.

Beyond bad printing side effects, this fix addresses the more general issue of accessing `grad_fn` with grad mode disabled after an in-place operation on a view. The included test verifies this without the use of print.
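
A minimal repro sketch in the spirit of the new test (tensor shapes are illustrative):

```python
import torch

base = torch.randn(2, 2, requires_grad=True).clone()
view = base[0]
view.mul_(2)  # in-place op on a view; its grad_fn is rebased lazily

with torch.no_grad():
    fn = view.grad_fn  # previously this could leave autograd state incorrect
```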

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51364

Test Plan:
```
python test/test_autograd.py TestAutogradDeviceTypeCPU.test_inplace_view_then_no_grad_cpu
```

Reviewed By: zou3519

Differential Revision: D26190451

Pulled By: jbschlosser

fbshipit-source-id: 9b004a393463f8bd4ac0690e5e53c07a609f87f0
2021-02-02 09:30:27 -08:00
c39fb9771d [complex] Enable complex autograd tests for diag (#51268)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51268

Reviewed By: pbelevich

Differential Revision: D26179236

Pulled By: anjali411

fbshipit-source-id: e9756136eaaced5a8692228a158965f77505e7b9
2021-02-02 09:10:28 -08:00
43084d7aab add type annotations to conv_fused/blas_compare/blas_compare_setup (#51235)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51234

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51235

Reviewed By: malfet

Differential Revision: D26147184

Pulled By: walterddr

fbshipit-source-id: 1ca1a1260785c8b7f4c3c24d7763ccbdaa0bfefb
2021-02-02 08:50:49 -08:00
c6f37e50f2 [doc] Add deprecation message to torch.slogdet in favor of torch.linalg.slogdet (#51354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51354

Re-created from https://github.com/pytorch/pytorch/pull/51301 because of issues with ghstack.

This PR is part of a larger effort to ensure torch.linalg documentation is consistent (see #50287).

Updated torch.slogdet documentation to add a deprecation message in favor of torch.linalg.slogdet.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D26148679

Pulled By: heitorschueroff

fbshipit-source-id: 4d9f3386d9ba6dc735a4d1e86cfcd88eaba3cbcc
2021-02-02 07:58:01 -08:00
1caed167fb [doc] Fix linalg.slogdet doc consistency issues (#51353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51353

Re-created from https://github.com/pytorch/pytorch/pull/51300 because of issues with ghstack.

This PR is part of a larger effort to ensure torch.linalg documentation is consistent (see https://github.com/pytorch/pytorch/issues/50287).

Updated torch.linalg.slogdet to include notes about cross-device synchronization, backend routines used and fix signature missing out argument.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D26148678

Pulled By: heitorschueroff

fbshipit-source-id: 40f6340226ecb72e4ec347c5606012f31f5877fb
2021-02-02 07:54:29 -08:00
c0d58bce0d move Tar Dataset to Tar DataPipe (#51398)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51398

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26162319

Pulled By: glaringlee

fbshipit-source-id: a84879fe4ca044e34238d5e1d31a245d4b80ae8e
2021-02-02 07:46:53 -08:00
a07a37e4fb reenable BUILD_SPLIT_CUDA for windows and fixes Linux 11_1 tests (#51538)
Summary:
Reenabling split build for windows and also fixes linux 11_1 tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51538

Reviewed By: lw

Differential Revision: D26198269

Pulled By: janeyx99

fbshipit-source-id: 363b2eed6631d75592120834d1543b438cfd2d8f
2021-02-02 05:38:21 -08:00
4f37150f40 Revert D26179083: [TensorExpr] Introduce ExternalCall nodes to TE IR.
Test Plan: revert-hammer

Differential Revision:
D26179083 (f4fc3e3920)

Original commit changeset: 9e44de098ae9

fbshipit-source-id: d15684e04c65c395b4102d4f98a4488482822d1b
2021-02-02 05:29:41 -08:00
41e4c55379 Correct subgraph rewriter pattern containment rules (#51529)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51529

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D26192470

Pulled By: ansley

fbshipit-source-id: 6e44f7df1e245835365ec868ae9cc539ecc873f2
2021-02-02 05:13:03 -08:00
8bb0dff7e2 Write FX Subgraph Rewriter tutorial (#51531)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51531

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D26192992

Pulled By: ansley

fbshipit-source-id: 769901b418d4580cdf8aed2451dd8ef3d8ddf0d1
2021-02-02 05:02:51 -08:00
5c5db25cd5 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D26195387

fbshipit-source-id: 009860c4237048125e31e8abea44e8222e13715c
2021-02-02 04:54:15 -08:00
79e7544cb4 [Gradient Compression] Check start_PowerSGD_iter > 1 and add guidance on tuning PowerSGD configs. (#51427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51427

A user reported that `start_PowerSGD_iter` failed when it's set as 1. This is because allocating memory for error tensors somehow overlaps with the bucket-rebuilding process at iteration 1.

Check `start_PowerSGD_iter > 1` instead of `start_PowerSGD_iter >= 1`.

Also add a unit test of `test_invalid_powerSGD_state` and some guidance on tuning PowerSGD configs.
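
For illustration, a valid configuration sketch (`ddp_model` is assumed to be a DistributedDataParallel instance):

```python
from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD

state = powerSGD.PowerSGDState(
    process_group=None,          # use the default process group
    matrix_approximation_rank=1,
    start_powerSGD_iter=2,       # must be > 1, per the check added here
)
ddp_model.register_comm_hook(state, powerSGD.powerSGD_hook)
```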

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 120834126

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_invalid_powerSGD_state

Reviewed By: rohan-varma

Differential Revision: D26166897

fbshipit-source-id: 34d5b64bb3dd43acb61d792626c70e6c8bb44a5d
2021-02-02 04:30:24 -08:00
d555768e8f [FX] Added invert example (#51478)
Summary:
Added an inverse example.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51478

Reviewed By: pbelevich

Differential Revision: D26190544

Pulled By: Chillee

fbshipit-source-id: 4324ea8b917557f4c49f3b9aecd35c4e9ab36bf3
2021-02-02 02:38:22 -08:00
96a22123f4 Automated submodule update: tensorpipe (#51469)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51469

This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: b4098ad5de

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51346

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: mrshenli

Differential Revision: D26177172

Pulled By: lw

fbshipit-source-id: 4ec508ce78cf521b11fed52ffdfc6f788ca6a6d0
2021-02-02 01:13:38 -08:00
f4fc3e3920 [TensorExpr] Introduce ExternalCall nodes to TE IR. (#51475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51475

ExternalCall nodes represent opaque calls to external functions to fill a
tensor (buffer) with values. They can be used to include nodes that are
otherwise not representable as TE, or whose TE representation is currently too
slow.

To make an external function available in NNC as ExternalCall, one needs to
implement a "bridge" function that would take raw (void*) pointers to the data
along with the arrays containing dimension info. This function would then
internally call the desired external function and make sure the results of the
call are correctly placed in the provided raw data buffers.
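
A hypothetical bridge sketch; the name and parameter list are assumed for illustration, not the exact interface introduced in this diff:

```cpp
#include <cstdint>

// Fills the output buffer (index 0) by running an external op on the inputs,
// given raw data pointers plus per-buffer rank/dimension metadata.
void nnc_external_relu(
    int64_t bufs_num,    // number of buffers, output first
    void** buf_data,     // raw data pointer for each buffer
    int64_t* buf_ranks,  // rank of each buffer
    int64_t* buf_dims,   // dimensions of all buffers, concatenated
    int64_t args_num,    // number of extra scalar arguments
    int64_t* extra_args) {
  // ... wrap the raw pointers as tensors, call the external function,
  // and write the result into buf_data[0] ...
}
```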

Test Plan: Imported from OSS

Reviewed By: pbelevich, Chillee

Differential Revision: D26179083

Pulled By: ZolotukhinM

fbshipit-source-id: 9e44de098ae94d25772cf5e2659d539fa6f3f659
2021-02-02 00:50:46 -08:00
b106250047 Introduced AliasInfo for OpInfo (#50368)
Summary:
Introduced AliasInfo for OpInfo.

Context: Split of https://github.com/pytorch/pytorch/issues/49158

cc mruberry , please let me know if you'd like to see here more code to cover

> [ ] fold test_op_aliases.py into OpInfo-based testing in test_ops.py

from https://github.com/pytorch/pytorch/issues/50006

and/or add `UnaryUfuncInfo('abs')` as discussed https://github.com/pytorch/pytorch/pull/49158/files#r548774221

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50368

Reviewed By: ngimel

Differential Revision: D26177261

Pulled By: mruberry

fbshipit-source-id: 2e3884a387e8d5365fe05945375f0a9d1b5f5d82
2021-02-02 00:10:09 -08:00
7328710cbc [PyTorch][codemod] Replace immediately-dereferenced cast calls w/castRaw (#50229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50229

`fastmod -m 'cast(<((at|c10)::)?\w+Type>\(\)\s*)->' 'castRaw${1}->'` Presuming it builds, this is a safe change: the
result of `cast()` wasn't being saved anywhere, so we didn't need
it, so we can use a raw pointer instead of a new `shared_ptr`.
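
A before/after sketch of the rewritten pattern (the call site here is invented for illustration):

```cpp
#include <torch/csrc/jit/ir/ir.h>

using namespace torch::jit;

c10::optional<at::ScalarType> scalarTypeOf(Value* value) {
  // Before: cast() materializes a shared_ptr that is immediately dropped.
  //   return value->type()->cast<TensorType>()->scalarType();
  // After: castRaw() hands back a raw pointer, skipping the refcount churn.
  return value->type()->castRaw<TensorType>()->scalarType();
}
```
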
ghstack-source-id: 120769170

Test Plan: CI

Reviewed By: SplitInfinity

Differential Revision: D25837494

fbshipit-source-id: 46319100dc0dfc78f6d2b45148207f83481f2ada
2021-02-01 23:12:07 -08:00
f0006315a9 Add support for complex valued keys for dict in TS (#51472)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51472

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D26177963

Pulled By: anjali411

fbshipit-source-id: 5841159c36b07290b1d88d4df27a0bf8c17d9df8
2021-02-01 22:40:01 -08:00
9c474c97b7 Disable BUILD_SPLIT_CUDA for now (#51533)
Summary:
BUILD_SPLIT_CUDA needs to be exported as true in order for cpp_extensions.py to work properly; disabling it for now to keep the tree green.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51533

Reviewed By: pbelevich

Differential Revision: D26194055

Pulled By: janeyx99

fbshipit-source-id: 08d3cc7e6ba57011dddbf27f96ef5acb648b6b9a
2021-02-01 22:25:24 -08:00
c354888e5d compare_model_stub_fx API implementation (#48951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48951

compare_model_stub_fx API implementation
ghstack-source-id: 120817825

Test Plan:
buck test mode/dev caffe2/test:quantization_fx -- 'test_compare_model_stub_conv_static_fx'
buck test mode/dev caffe2/test:quantization_fx -- 'test_compare_model_stub_linear_static_fx'

Reviewed By: vkuzo

Differential Revision: D25379000

fbshipit-source-id: f1321d37b60b56b202e7d227e370ce13addb10cc
2021-02-01 22:16:14 -08:00
d02ea9a141 [ROCm] add hipMAGMA support (#51238)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48831.

- CI image is updated to build hipMAGMA from source and set env MAGMA_HOME.
- CMake is updated to separate different requirements for CUDA versus ROCm MAGMA.
- Some unit tests that become enabled with MAGMA are currently skipped for ROCm due to failures.  Fixing these failures will be follow-on work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51238

Reviewed By: ngimel

Differential Revision: D26184918

Pulled By: malfet

fbshipit-source-id: ada632f1ae7b413e8cae6543fe931dcd46985821
2021-02-01 22:09:33 -08:00
5e09ec6518 Fixed SVD ignoring "some/full_matrices" flag for empty inputs (#51109)
Summary:
For empty inputs `torch.svd` (and `torch.linalg.svd`) was returning incorrect results for `some=True` (`full_matrices=False`).
Behaviour on master branch:
```python
In [1]: import torch
In [2]: a = torch.randn(0, 7)
In [3]: a.svd()
Out[3]:
torch.return_types.svd(
U=tensor([], size=(0, 0)),
S=tensor([]),
V=tensor([[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]]))
In [4]: a.svd(some=False)
Out[4]:
torch.return_types.svd(
U=tensor([], size=(0, 0)),
S=tensor([]),
V=tensor([[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]]))
```
The `some` flag is ignored and a 7x7 `V` matrix is returned in both cases. `V` should have shape 7x0 when `some=True`.

This PR fixes that.
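
The fixed behavior, sketched with the same input as above:

```python
import torch

a = torch.randn(0, 7)
U, S, V = a.svd(some=True)
print(V.shape)  # torch.Size([7, 0]) with this fix, instead of a 7x7 zero matrix
```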

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51109

Reviewed By: ngimel

Differential Revision: D26170897

Pulled By: mruberry

fbshipit-source-id: 664c09ca27bb375fabef2a046d0a09ca57b01aac
2021-02-01 21:51:58 -08:00
4b65a27a35 [testing] Add OpInfo for round and logit (#51272)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50006

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51272

Reviewed By: ngimel

Differential Revision: D26177020

Pulled By: mruberry

fbshipit-source-id: 4728b14c7a42980c7ca231ca1946430e0e38ed5b
2021-02-01 21:15:40 -08:00
205c971431 [PyTorch] Remove always-empty string args to inferFunctionSchemaFromFunctor (#51307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51307

Using -ftime-trace shows that this roughly halves the time spent compiling/optimizing this function in an optimized build of RegisterCPU.cpp (savings of about 1 second).
ghstack-source-id: 120697493

Test Plan: manual build with -ftime-trace as above; sketch of directions at https://fb.workplace.com/groups/894363187646754/permalink/1153321361750934/ , except that I extracted a compiler invocation for RegisterCPU.cpp by injecting a syntax error and running buck build with -v 3 so that I could rebuild and measure just the one file quickly.

Reviewed By: ezyang

Differential Revision: D26135978

fbshipit-source-id: 756499fbcc8d3b169bae5a463f63caecb79f7fcd
2021-02-01 19:21:17 -08:00
1416fb9877 [PyTorch] IWYU in torch/csrc/utils/future.h (#51293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51293

It looks like this header did not need ivalue.h at all.
ghstack-source-id: 120697488

Test Plan: CI to ensure correctness

Reviewed By: ezyang

Differential Revision: D26128288

fbshipit-source-id: a24a7e49b9d623fb182bdfaf286972739497e770
2021-02-01 19:18:12 -08:00
a1c5eba4bd [FX] Move some heavily used passes out of experimental (#51392)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51392

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D26161172

Pulled By: jamesr66a

fbshipit-source-id: 04bfe606555bdf1988f527231d4de2e0196e6b37
2021-02-01 19:02:26 -08:00
a3353d1ec0 [FX] Support ellipsis as arg (#51502)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51502

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D26186578

Pulled By: jamesr66a

fbshipit-source-id: 91943af38412bafc1766398dfaebdf50b64ccd74
2021-02-01 18:54:14 -08:00
88af2149e1 Add build option to split torch_cuda library into torch_cuda_cu and torch_cuda_cpp (#49050)
Summary:
Because of the size of our `libtorch_cuda.so`, linking with other hefty binaries presents a problem where 32-bit relocation markers are too small and end up overflowing. This PR attempts to break up `torch_cuda` into `torch_cuda_cu` and `torch_cuda_cpp`.

`torch_cuda_cu`: all the files previously in `Caffe2_GPU_SRCS` that are
* pure `.cu` files in `aten`
* all the BLAS files
* all the THC files, except for THCAllocator.cpp, THCCachingHostAllocator.cpp and THCGeneral.cpp
* all files in `detail`
* LegacyDefinitions.cpp and LegacyTHFunctionsCUDA.cpp
* Register*CUDA.cpp
* CUDAHooks.cpp
* CUDASolver.cpp
* TensorShapeCUDA.cpp

`torch_cuda_cpp`: all other files in `Caffe2_GPU_SRCS`

Accordingly, TORCH_CUDA_API and TORCH_CUDA_BUILD_MAIN_LIB usages are getting split as well to TORCH_CUDA_CU_API and TORCH_CUDA_CPP_API.

To test this locally, you can run `export BUILD_SPLIT_CUDA=ON && python setup.py develop`. In your `build/lib` folder, you should find binaries for both `torch_cuda_cpp` and `torch_cuda_cu`. To see that the SPLIT_CUDA option was toggled, you can grep the Summary of running cmake and make sure `Split CUDA` is ON.

This build option is tested on CI for CUDA 11.1 builds (linux for now, but windows soon).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49050

Reviewed By: walterddr

Differential Revision: D26114310

Pulled By: janeyx99

fbshipit-source-id: 0180f2519abb5a9cdde16a6fb7dd3171cff687a6
2021-02-01 18:42:35 -08:00
87ad77eb4e T66557700 Support default argument values of a method (#48863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48863

Support default arguments when invoking a module via PyTorch Lite (`mobile::Module`).

Test Plan:
buck test mode/dbg //caffe2/test/cpp/jit:jit -- LiteInterpreterTest.MethodInvocation

buck test mode/dbg caffe2/test:mobile -- test_method_calls_with_optional_arg

Reviewed By: iseeyuan

Differential Revision: D25896212

fbshipit-source-id: 6d7e7fd5f3244a88bd44889024d81ad2e678ffa5
2021-02-01 18:35:13 -08:00
ec3aae8cdb [JIT] Enable saving modules with hooks in FBCODE (#51241)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51241

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D26111488

Pulled By: Lilyjjo

fbshipit-source-id: 3315068ac9adef8aa23670a4a5f86c5a54fdd1f7
2021-02-01 17:01:44 -08:00
630ee57bc2 [PyTorch] Provide overload of torchCheckFail taking const char* (#51389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51389

This should reduce code size when STRIP_ERROR_MESSAGES is defined by allowing callers of TORCH_CHECK to avoid creating `std::string`s.
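
A sketch of the overload pair; the exact signatures and namespace are assumed from the summary, for illustration:

```cpp
#include <cstdint>
#include <string>

namespace c10 { namespace detail {

// Existing overload: every TORCH_CHECK call site must build a std::string.
[[noreturn]] void torchCheckFail(
    const char* func, const char* file, uint32_t line, const std::string& msg);

// New overload: string-literal messages pass through without allocation,
// shrinking call sites when STRIP_ERROR_MESSAGES is defined.
[[noreturn]] void torchCheckFail(
    const char* func, const char* file, uint32_t line, const char* msg);

}} // namespace c10::detail
```
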
ghstack-source-id: 120692772

Test Plan: Measure code size of STRIP_ERROR_MESSAGES builds

Reviewed By: ezyang

Differential Revision: D25891476

fbshipit-source-id: 34eef5af7464da6534989443859e2765887c243c
2021-02-01 16:48:46 -08:00
c77fc2ee06 [nnc] Vectorize bitwise ops (#51492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51492

We missed these originally.  This helps vectorize log_fast.
ghstack-source-id: 120783427

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench
```

This might have made bench_approx faster but it could be noise.

Before:
```
----------------------------------------------------------------------------
Benchmark                     Time           CPU Iterations UserCounters...
----------------------------------------------------------------------------
log_nnc_fast/64             108 ns        108 ns    5576102 log/s=590.91M/s
log_nnc_fast/512            569 ns        569 ns    1230258 log/s=899.961M/s
log_nnc_fast/8192          8047 ns       8046 ns      89715 log/s=1018.08M/s
log_nnc_fast/32768        31066 ns      31065 ns      22368 log/s=1054.81M/s
logit_nnc_fast/64           149 ns        149 ns    4851520 logit/s=428.646M/s
logit_nnc_fast/512          980 ns        979 ns     712033 logit/s=522.742M/s
logit_nnc_fast/8192       13326 ns      13325 ns      51916 logit/s=614.805M/s
logit_nnc_fast/32768      54743 ns      54739 ns      12844 logit/s=598.624M/s
```

After:
```
----------------------------------------------------------------------------
Benchmark                     Time           CPU Iterations UserCounters...
----------------------------------------------------------------------------
log_nnc_fast/64             100 ns        100 ns    7012963 log/s=640.588M/s
log_nnc_fast/512            496 ns        496 ns    1415357 log/s=1032.26M/s
log_nnc_fast/8192          7600 ns       7595 ns      88258 log/s=1078.62M/s
log_nnc_fast/32768        30300 ns      30298 ns      22442 log/s=1081.52M/s
logit_nnc_fast/64           152 ns        152 ns    4505712 logit/s=420.279M/s
logit_nnc_fast/512          816 ns        816 ns     873834 logit/s=627.267M/s
logit_nnc_fast/8192       12090 ns      12088 ns      58234 logit/s=677.675M/s
logit_nnc_fast/32768      51576 ns      51531 ns      14645 logit/s=635.888M/s
```

Reviewed By: bwasti

Differential Revision: D26155792

fbshipit-source-id: 16724b419c944aa7d4389ae85838018455a5605f
2021-02-01 16:38:57 -08:00
a23e82df10 [nnc] Tweak log_nnc_sleef so vectorization kicks in (#51491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51491

The vectorizer heuristic is pretty dumb and only kicks in if the
unroll factor is exactly 8 or 4.

It's still slower than the direct implementation, which isn't surprising.
ghstack-source-id: 120783426

Test Plan:
`buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench`

Before:
```
---------------------------------------------------------------------------
Benchmark                    Time           CPU Iterations UserCounters...
---------------------------------------------------------------------------
log_nnc_sleef/64           438 ns        438 ns    1795511 log/s=146.259M/s
log_nnc_sleef/512         3196 ns       3195 ns     210032 log/s=160.235M/s
log_nnc_sleef/8192       77467 ns      77466 ns       8859 log/s=105.749M/s
log_nnc_sleef/32768     310206 ns     310202 ns       2170 log/s=105.634M/s
log_nnc_fast/64            100 ns        100 ns    7281074 log/s=637.144M/s
log_nnc_fast/512           546 ns        546 ns    1335816 log/s=938.361M/s
log_nnc_fast/8192         7360 ns       7359 ns      91971 log/s=1.11316G/s
log_nnc_fast/32768       30793 ns      30792 ns      22633 log/s=1064.17M/s
log_aten/64           427 ns        427 ns    1634897 log/s=150.021M/s
log_aten/512          796 ns        796 ns     877318 log/s=643.566M/s
log_aten/8192        6690 ns       6690 ns     102649 log/s=1.22452G/s
log_aten/32768      25357 ns      25350 ns      27808 log/s=1.29263G/s
```

After:
```
---------------------------------------------------------------------------
Benchmark                    Time           CPU Iterations UserCounters...
---------------------------------------------------------------------------
log_nnc_sleef/64           189 ns        188 ns    3872475 log/s=340.585M/s
log_nnc_sleef/512         1307 ns       1307 ns     557770 log/s=391.709M/s
log_nnc_sleef/8192       20259 ns      20257 ns      34240 log/s=404.404M/s
log_nnc_sleef/32768      81556 ns      81470 ns       8767 log/s=402.209M/s
log_nnc_fast/64            110 ns        110 ns    6564558 log/s=581.116M/s
log_nnc_fast/512           554 ns        554 ns    1279304 log/s=923.376M/s
log_nnc_fast/8192         7774 ns       7774 ns      91421 log/s=1053.75M/s
log_nnc_fast/32768       31008 ns      31006 ns      21279 log/s=1056.83M/s
```

Reviewed By: bwasti

Differential Revision: D26139067

fbshipit-source-id: db31897ee9922695ff9dff4ff46e3d3fbd61f4c2
2021-02-01 16:35:37 -08:00
5b0a6482c1 Out variant for embedding_bag_4bit_rowwise_offsets (#51324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51324

Add out variant for embedding_bag_4bit_rowwise_offsets and add it to static runtime registry

Test Plan:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 1 buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=$INLINE_CVR_DIR/210494966_0.predictor.disagg.remote_request_only_remote_cast.pt --pt_inputs=$INLINE_CVR_DIR/remote_ro_wrapped_input_data.pt --pt_enable_static_runtime=true --pt_cleanup_activations=true --pt_enable_out_variant=true --compare_results=true --iters=5000 --warmup_iters=5000 --num_threads=1 --do_profile=true
```

before:
```
0.789023 ms.    54.8408%. quantized::embedding_bag_4bit_rowwise_offsets (82 nodes)
```

after:
```
0.620817 ms.    49.7136%. quantized::embedding_bag_4bit_rowwise_offsets (82 nodes)
```

Reviewed By: ajyu

Differential Revision: D26138322

fbshipit-source-id: 44d3f15d04636404ebd4c1e9eecf73c7ad972944
2021-02-01 16:15:57 -08:00
b198cf4f1c port index_fill_ from TH to ATen. (#50578)
Summary:
As per title. The port is based on TensorIterator.
Supports complex input.

Resolves https://github.com/pytorch/pytorch/issues/24714.
Resolves https://github.com/pytorch/pytorch/issues/24577.
Resolves https://github.com/pytorch/pytorch/issues/36328.
Possibly resolves https://github.com/pytorch/pytorch/issues/48230
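
As a rough sketch of the newly supported complex path (values chosen arbitrarily for illustration):

```python
import torch

x = torch.zeros(3, 4, dtype=torch.complex64)
idx = torch.tensor([0, 2])
# Fill columns 0 and 2 with a complex scalar; complex dtypes are now supported.
x.index_fill_(1, idx, 1 + 2j)
print(x)
```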

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50578

Reviewed By: ngimel

Differential Revision: D26049539

Pulled By: anjali411

fbshipit-source-id: 2be4e78f7a01700c593a9e893e01f69191e51ab1
2021-02-01 16:08:37 -08:00
09bc58796e Hashing logic for c10::complex (#51441)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51441

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D26170195

Pulled By: anjali411

fbshipit-source-id: 9247c1329229405426cfbd8463cabcdbe5bdb740
2021-02-01 15:56:44 -08:00
8fa328f88e [doc] Deprecate torch.cholesky in favor of torch.linalg.cholesky (#51460)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51460

This PR is part of a larger effort to ensure torch.linalg documentation is consistent (see #50287).

* #51459 [doc] Fix linalg.cholesky doc consistency issues
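
For reference, a minimal sketch of the migration the deprecation suggests:

```python
import torch

a = torch.randn(3, 3, dtype=torch.float64)
a = a @ a.T + 3 * torch.eye(3, dtype=torch.float64)  # symmetric positive-definite

l_old = torch.cholesky(a)          # deprecated spelling
l_new = torch.linalg.cholesky(a)   # preferred replacement
assert torch.allclose(l_old, l_new)
```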

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D26176130

Pulled By: heitorschueroff

fbshipit-source-id: cc89575db69cbfd5f87d970a2e71deb6522a35b1
2021-02-01 15:47:08 -08:00
8583f7cbe2 [doc] Fix linalg.cholesky doc consistency issues (#51459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51459

This PR is part of a larger effort to ensure torch.linalg documentation is consistent (see #50287).

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D26176131

Pulled By: heitorschueroff

fbshipit-source-id: 2ad88a339e6dff044965e8bf29dd8c852afecb34
2021-02-01 15:43:47 -08:00
c08078031f [Gradient Compression] Allow BatchedPowerSGD to run vanilla allreduce for the first K iterations (#51270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51270

Similar to #50973, allow the batched version to run vanilla allreduce for the first K iterations.

This may be useful if the batched version can be applied to some use cases where the accuracy requirement is not very strict.
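
A minimal sketch of how a user would opt into this, assuming the comm-hook API of the time (`ddp_model` is a hypothetical DistributedDataParallel instance):

```python
import torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook as powerSGD

state = powerSGD.PowerSGDState(
    process_group=None,              # use the default process group
    matrix_approximation_rank=1,
    start_powerSGD_iter=1000,        # vanilla allreduce for the first K=1000 iterations
)
ddp_model.register_comm_hook(state, powerSGD.batched_powerSGD_hook)
```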

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 120725858

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

baseline: f248001754
batched PowerSGD: f246960752

The training time was reduced from 54m48s to 30m33s, and the accuracy is approximately the same: 44.21 vs 44.35

Reviewed By: rohan-varma

Differential Revision: D26077709

fbshipit-source-id: 6afeefad7a3fbdd7da2cbffb56dfbad855a96cb5
2021-02-01 15:26:29 -08:00
718e4b110b add git submodule troubleshoot to CONTRIBUTING.md (#51458)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51355.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51458

Reviewed By: janeyx99

Differential Revision: D26176233

Pulled By: walterddr

fbshipit-source-id: 758e4203e11c81489234bbca812d1a3738504148
2021-02-01 14:30:00 -08:00
109bc1047e [NNC] Generate C++ code for Allocate and Free (#51070)
Summary:
This is the initial skeleton for C++ codegen, it includes generations for Allocate and Free.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51070

Test Plan: New unit tests are added to `test_cpp_codegen.cpp`.

Reviewed By: ZolotukhinM

Differential Revision: D26061818

Pulled By: cheng-chang

fbshipit-source-id: b5256b2dcee6b2583ba73b6c9684994dbe7cdc1f
2021-02-01 13:06:51 -08:00
642afcb168 Add sgn to torch.rst so that it appears in the built docs (#51479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51479

Fixes https://github.com/pytorch/pytorch/issues/50146

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D26179734

Pulled By: anjali411

fbshipit-source-id: 1cda9a3dc9ce600e585900eea70fbecac0635d5c
2021-02-01 12:43:06 -08:00
d1ddc5d65d [PyTorch] Outline OperatorEntry::assertSignatureIsCorrect fail path (#51269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51269

This saves about 10% of the compile time of Functions.cpp. Found using clang-9's `-ftime-trace` feature + ClangBuildAnalyzer.

Test Plan:
Compared -ftime-trace + ClangBuildAnalyzer output.

Before: P167884397

After: P167888502

Note that time spent generating assertSignatureIsCorrect is way down, though it's still kind of slow.

Reviewed By: ezyang

Differential Revision: D26121814

fbshipit-source-id: 949a85d8939c02e4fb5ac1adc35905ed34414724
2021-02-01 12:40:19 -08:00
9877777fee [PyTorch] check isValidUnboxed() in the dispatcher (#51247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51247

See code comment for explanation.

This measures as neutral compared to the previous diff with `perf stat` when
running a benchmark that calls empty in a loop. I think we should commit it
anyway because:
1) I have previously seen it make a difference when applied earlier in
the stack.
2) This makes sense both in principle and from inspecting the output
assembly: we avoid having to touch the boxed kernel at all (usually)
and instead use the unboxed kernel for both the validity check in
`OperatorEntry::lookup` and the actual `KernelFunction::call`.
ghstack-source-id: 120697497

Test Plan: Aforementioned perf measurement

Reviewed By: ezyang

Differential Revision: D26113650

fbshipit-source-id: 8448c4ed764d477f63eb7c0f6dd87b1fc0228b73
2021-02-01 12:40:14 -08:00
4495b49ffa [PyTorch] Pass TensorOptions by value (#51165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51165

`TensorOptions` does not have a non-trivial copy, move, or
destroy operation and is small enough to fit in a register, so it
seems like we should pass it by value.
ghstack-source-id: 120697498

Test Plan:
Measured timing for empty framework overhead benchmark before & after this change:

Before:
```
I0126 16:02:50.662864 2137574 bench.cpp:139] Mean 0.268645
I0126 16:02:50.662891 2137574 bench.cpp:140] Median 0.267485
I0126 16:02:50.662896 2137574 bench.cpp:141] Min 0.266485
I0126 16:02:50.662901 2137574 bench.cpp:142] stddev 0.00219359
I0126 16:02:50.662915 2137574 bench.cpp:143] stddev / mean 0.00816537

          2,968.37 msec task-clock                #    0.997 CPUs utilized            ( +-  0.03% )
               250      context-switches          #    0.084 K/sec                    ( +-  2.21% )
                 1      cpu-migrations            #    0.000 K/sec
            11,403      page-faults               #    0.004 M/sec                    ( +-  0.28% )
     5,898,481,882      cycles                    #    1.987 GHz                      ( +-  0.03% )  (50.05%)
    16,169,242,938      instructions              #    2.74  insn per cycle           ( +-  0.03% )  (50.06%)
     3,076,546,626      branches                  # 1036.443 M/sec                    ( +-  0.05% )  (50.05%)
         2,531,859      branch-misses             #    0.08% of all branches          ( +-  0.89% )  (50.03%)
```

After:
```
I0126 16:23:20.010062 2244624 bench.cpp:139] Mean 0.266814
I0126 16:23:20.010092 2244624 bench.cpp:140] Median 0.265759
I0126 16:23:20.010099 2244624 bench.cpp:141] Min 0.260291
I0126 16:23:20.010107 2244624 bench.cpp:142] stddev 0.00548279
I0126 16:23:20.010118 2244624 bench.cpp:143] stddev / mean 0.0205491

          2,983.75 msec task-clock                #    0.995 CPUs utilized            ( +-  0.36% )
               243      context-switches          #    0.082 K/sec                    ( +-  1.26% )
                 1      cpu-migrations            #    0.000 K/sec
            11,422      page-faults               #    0.004 M/sec                    ( +-  0.18% )
     5,928,639,486      cycles                    #    1.987 GHz                      ( +-  0.36% )  (50.02%)
    16,105,928,210      instructions              #    2.72  insn per cycle           ( +-  0.05% )  (50.02%)
     3,150,273,453      branches                  # 1055.809 M/sec                    ( +-  0.03% )  (50.05%)
         3,713,617      branch-misses             #    0.12% of all branches          ( +-  0.83% )  (50.07%)

```

It looked close to neutral, so I used `perf stat` to confirm it's about a 1% instruction count win.

For deciding whether this stack is worth it, I went back and ran `perf stat` on the baseline diff before I started touching the dispatcher:

```
          2,968.37 msec task-clock                #    0.997 CPUs utilized            ( +-  0.03% )
               250      context-switches          #    0.084 K/sec                    ( +-  2.21% )
                 1      cpu-migrations            #    0.000 K/sec
            11,403      page-faults               #    0.004 M/sec                    ( +-  0.28% )
     5,898,481,882      cycles                    #    1.987 GHz                      ( +-  0.03% )  (50.05%)
    16,169,242,938      instructions              #    2.74  insn per cycle           ( +-  0.03% )  (50.06%)
     3,076,546,626      branches                  # 1036.443 M/sec                    ( +-  0.05% )  (50.05%)
         2,531,859      branch-misses             #    0.08% of all branches          ( +-  0.89% )  (50.03%)
```

If I've done the arithmetic correctly, we have a 0.39% instruction count win.

Reviewed By: ezyang

Differential Revision: D25983863

fbshipit-source-id: 87d1451a01ead25738ea6b80db270d344bc583b2
2021-02-01 12:40:08 -08:00
341c76dcc1 [PyTorch] Add C10_ALWAYS_INLINE to critical dispatcher paths (#51245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51245

Splitting this out from #51164 (D26069629) to allow it to
land separately; I'm sure this is a good idea but I'm less sure about
#51164.
ghstack-source-id: 120697499

Test Plan:
double-check effect on empty benchmark with perf stat;
didn't move

Reviewers: ezyang, messmer

Reviewed By: ezyang

Differential Revision: D26112627

fbshipit-source-id: 50d4418d351527bcedd5ccdc49106bc642699870
2021-02-01 12:39:58 -08:00
673687e764 [PyTorch] Refactor Dispatcher to inline less code in fast path (#51163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51163

The Dispatcher seems to have been in a precarious local
maximum: I tried to make several different changes to parameter
passing and ended up with regressions due to reduced inlining that
swamped any gains I might have gotten from the parameter passing
changes.

This diff reduces the amount of inline code on the fast path. It
should both reduce code size and provide a platform for making further
improvements to the dispatcher code.

It is a slight performance regression, but it unblocked the following
two diffs (which seem to get us back where we were) from landing.
ghstack-source-id: 120693163

Test Plan:
CI, framework overhead benchmarks to check the size of the
regression

Compared timing for empty framework overhead benchmark before/after.

Build command: `buck build mode/no-gpu //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark mode/opt-clang --show-output`
Run with `numactl -m  0 -C 3 path/to/cpp_benchmark -op empty -niter 100`

Before:
```
I0126 16:02:04.373075 2135872 bench.cpp:139] Mean 0.266272
I0126 16:02:04.373106 2135872 bench.cpp:140] Median 0.266347
I0126 16:02:04.373111 2135872 bench.cpp:141] Min 0.263585
I0126 16:02:04.373117 2135872 bench.cpp:142] stddev 0.0021264
I0126 16:02:04.373131 2135872 bench.cpp:143] stddev / mean 0.00798581
```

After:
```
I0126 16:02:30.377992 2137048 bench.cpp:139] Mean 0.27579
I0126 16:02:30.378023 2137048 bench.cpp:140] Median 0.275281
I0126 16:02:30.378029 2137048 bench.cpp:141] Min 0.270617
I0126 16:02:30.378034 2137048 bench.cpp:142] stddev 0.00308287
I0126 16:02:30.378044 2137048 bench.cpp:143] stddev / mean 0.0111783
```

Yes, it's a regression, but I compared D26069629 stacked on this diff vs not:

With this diff:

```
I0126 16:02:50.662864 2137574 bench.cpp:139] Mean 0.268645
I0126 16:02:50.662891 2137574 bench.cpp:140] Median 0.267485
I0126 16:02:50.662896 2137574 bench.cpp:141] Min 0.266485
I0126 16:02:50.662901 2137574 bench.cpp:142] stddev 0.00219359
I0126 16:02:50.662915 2137574 bench.cpp:143] stddev / mean 0.00816537
```

Without:
```
I0126 20:40:27.815824 3240699 bench.cpp:139] Mean 0.270755
I0126 20:40:27.815860 3240699 bench.cpp:140] Median 0.268998
I0126 20:40:27.815866 3240699 bench.cpp:141] Min 0.268306
I0126 20:40:27.815873 3240699 bench.cpp:142] stddev 0.00260365
I0126 20:40:27.815886 3240699 bench.cpp:143] stddev / mean 0.00961624
```

So we do seem to have accomplished something w.r.t. not overwhelming the inliner.

Reviewed By: ezyang

Differential Revision: D26091377

fbshipit-source-id: c9b7f4e187059fa15452b7c75fc29816022b92b1
2021-02-01 12:36:48 -08:00
ec611aca88 [Pytorch Mobile] Expose _export_operator_list to python (#51312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51312

Follow up to D24690094 (4a870f6518) exposing the api in python. Created matching unit test.
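
A sketch of the intended Python usage; the `torch.jit.mobile` module path and the model file name are assumptions, not verified against this exact revision:

```python
from torch.jit.mobile import _load_for_lite_interpreter, _export_operator_list

m = _load_for_lite_interpreter("model.ptl")  # hypothetical lite-interpreter model
print(_export_operator_list(m))              # set of root operator names in the model
```
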
ghstack-source-id: 120611452

Test Plan: Ran unit test

Reviewed By: dhruvbird

Differential Revision: D26112765

fbshipit-source-id: ffe3bb97de0a4f08b31719b4b47dcebd7d2fd42a
2021-02-01 12:09:02 -08:00
609f76f27a [WIP][FX] Add Interpreter and Transformer (#50420)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50420
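
A minimal sketch of what `Interpreter` does, executing a traced `GraphModule` node by node (details of the final API may differ from this WIP state):

```python
import torch
import torch.fx

def f(x):
    return torch.relu(x) + 1.0

gm = torch.fx.symbolic_trace(f)
# Interpreter runs the graph node by node; subclasses can override run_node
# to instrument or transform execution.
out = torch.fx.Interpreter(gm).run(torch.randn(3))
```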

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D25880330

Pulled By: jamesr66a

fbshipit-source-id: 27d34888e36e39924821fed891d79f969237a104
2021-02-01 11:40:12 -08:00
0831984ed5 [Resubmission][Gradient Compression] Refactor default_hooks.py and powerSGD_hook.py by creating a util function that make a vanilla allreduce future (#51400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51400

Resubmission of #51094

Address https://github.com/pytorch/pytorch/pull/50973#discussion_r564229818

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 120725690

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl

Reviewed By: rohan-varma

Differential Revision: D26162333

fbshipit-source-id: ccc2eae5383a23673e00d61cb5570fb8bf749cd0
2021-02-01 11:34:41 -08:00
6c24296795 [PyTorch] Devirtualize TensorImpl::has_storage (#51049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51049

This diff makes it OK to query has_storage() on all TensorImpls. I added debug assertions that storage_ is indeed never set on them, which is required for this to be correct.
ghstack-source-id: 120714380

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D26008498

fbshipit-source-id: b3f55f0b57b04636d13b09aa55bb720c6529542c
2021-02-01 11:30:23 -08:00
765062c085 [PyTorch] Devirtualize TensorImpl::storage_offset (#51048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51048

There doesn't seem to be any reason to prohibit accessing the always-zero storage_offset of those TensorImpls that prohibit set_storage_offset.
ghstack-source-id: 120714379

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D26008499

fbshipit-source-id: cd92ac0afdebbd5cf8f04df141843635113b6444
2021-02-01 11:27:13 -08:00
50fa415a4d [testing] Add OpInfo for ceil and floor (#51198)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50006

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51198

Reviewed By: malfet

Differential Revision: D26105099

Pulled By: mruberry

fbshipit-source-id: 6cfa89f42b87cca66dbc5bf474d17a6cad7eb45a
2021-02-01 10:10:36 -08:00
449098c2d2 [SobolEngine] Update direction numbers to 21201 dims (#49710)
Summary:
Performs the update that was suggested in https://github.com/pytorch/pytorch/issues/41489

Adjust the functionality to largely match that of the scipy companion PR https://github.com/scipy/scipy/pull/10844/, including
- a new `draw_base2` method
- include zero as the first point in the (unscrambled) Sobol sequence

The scipy PR is also quite opinionated when the `draw` method isn't called with a base-2 number of points (for which the resulting sequence has nice properties; see the scipy PR for a comprehensive discussion of this).

Note that this update is a **breaking change** in the sense that sequences generated with the same parameters after as before will not be identical! They will have the same (better, arguably) distributional properties, but calling the engine with the same seed will result in different numbers in the sequence.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49710

Test Plan:
```
from torch.quasirandom import SobolEngine

sobol = SobolEngine(3)
sobol.draw(4)

sobol = SobolEngine(4, scramble=True)
sobol.draw(5)

sobol = SobolEngine(4, scramble=True)
sobol.draw_base2(2)
```

Reviewed By: malfet

Differential Revision: D25657233

Pulled By: Balandat

fbshipit-source-id: 9df50a14631092b176cc692b6024aa62a639ef61
2021-02-01 08:44:31 -08:00
b1907f5ebc Fix pickling for Tensor subclasses (redo) (#47732)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47051
Redo of https://github.com/pytorch/pytorch/issues/47115
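
A rough sketch of the behavior being fixed (the subclass type is preserved through pickling):

```python
import io
import pickle
import torch

class MyTensor(torch.Tensor):
    pass

t = torch.randn(2).as_subclass(MyTensor)
buf = io.BytesIO()
pickle.dump(t, buf)
buf.seek(0)
loaded = pickle.load(buf)
assert type(loaded) is MyTensor  # subclass survives the round trip after the fix
```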

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47732

Reviewed By: izdeby

Differential Revision: D25465382

Pulled By: ezyang

fbshipit-source-id: 3a8d57281a2d6f57415d5735d34ad307f3526638
2021-02-01 07:32:52 -08:00
508bab43e7 Support complex number list in JIT (#51145)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51145
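
A sketch of the kind of TorchScript program this enables, assuming complex scalars are already scriptable at this point:

```python
import torch
from typing import List

@torch.jit.script
def total(xs: List[complex]) -> complex:
    out = 0j
    for x in xs:
        out = out + x
    return out

print(total([1 + 2j, 3 - 1j]))  # (4+1j)
```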

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D26154025

Pulled By: anjali411

fbshipit-source-id: 74645f9b6467757ddb9d75846e778222109848f0
2021-01-31 23:54:14 -08:00
40c0fffb4b Fixes docs (#51439)
Summary:
pytorch_python_doc_build is failing with:

```
Jan 31 04:30:45 /var/lib/jenkins/workspace/docs/source/notes/broadcasting.rst:6: WARNING: 'any' reference target not found: numpy.doc.broadcasting
```

this removes the incorrect reference and adds an updated link.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51439

Reviewed By: ngimel

Differential Revision: D26170232

Pulled By: mruberry

fbshipit-source-id: 829999db52e1e860d36d626d0d9f26e31283d14b
2021-01-31 22:00:26 -08:00
d1dcd5f287 [fbgemm_gpu] Use the latest philox_cuda_state API for stochastic rounding (#51004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51004

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/493

Follow-up on the failure case for FP16 stochastic rounding:
- https://github.com/pytorch/pytorch/pull/50148
- D26006041

From Natalia:
- https://github.com/pytorch/pytorch/pull/50916 is the fix, philox_engine_inputs is deprecated btw so if you could refactor it to use philox_cuda_state that would be great.
- instructions to change the call https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/CUDAGeneratorImpl.h#L48-L83, it will be important to use philox_cuda_state with graph capture.

Benchmark:
- Before this Diff:
```
(base) [jianyuhuang@devgpu017.atn5.facebook.com: ~/fbsource/fbcode/hpc/ops/benchmarks] $  buck run mode/opt //hpc/ops/benchmarks:split_table_batched_embeddings_benchmark device -- --fp16 --stoc 2>&1 | tee before_diff.log
PARSING BUCK FILES: FINISHED IN 0.4s
CREATING ACTION GRAPH: FINISHED IN 0.0s
DOWNLOADED 0 ARTIFACTS, 0.00 BYTES, 0.0% CACHE MISS
BUILDING: FINISHED IN 5.3s (100%) 6474/6474 JOBS, 0 UPDATED
BUILD SUCCEEDED
DEBUG:root:Using fused exact_row_wise_adagrad with optimizer_args=OptimizerArgs(stochastic_rounding=True, gradient_clipping=False, max_gradient=1.0, learning_rate=0.1, eps=0.1, beta1=0.9, beta2=0.999, weight_decay=0.0, eta=0.001, momentum=0.9)
INFO:root:Embedding parameters:  0.41 GParam,  0.82GB
INFO:root:Accessed weights per batch:  83.89MB
INFO:root:Forward, B: 512, E: 100000, T: 32, D: 128, L: 20, W: False, BW:  607.48GB/s, T: 138us
INFO:root:ForwardBackward, B: 512, E: 100000, T: 32, D: 128, L: 20, BW:  220.85GB/s, T: 1139us
```

- After this Diff:
```
(base) [jianyuhuang@devgpu017.atn5.facebook.com: ~/fbsource/fbcode/hpc/ops/benchmarks] $  buck run mode/opt //hpc/ops/benchmarks:split_table_batched_embeddings_benchmark device -- --fp16 --stoc 2>&1 | tee after_diff.log
PARSING BUCK FILES: FINISHED IN 1.1s
CREATING ACTION GRAPH: FINISHED IN 0.0s
DEBUG:root:Using fused exact_row_wise_adagrad with optimizer_args=OptimizerArgs(stochastic_rounding=True, gradient_clipping=False, max_gradient=1.0, learning_rate=0.1, eps=0.1, beta1=0.9, beta2=0.999, weight_decay=0.0, eta=0.001, momentum=0.9)
INFO:root:Embedding parameters:  0.41 GParam,  0.82GB
INFO:root:Accessed weights per batch:  83.89MB
INFO:root:Forward, B: 512, E: 100000, T: 32, D: 128, L: 20, W: False, BW:  608.80GB/s, T: 138us
INFO:root:ForwardBackward, B: 512, E: 100000, T: 32, D: 128, L: 20, BW:  229.17GB/s, T: 1098us
```

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D26038596

fbshipit-source-id: 5360395c1c3b1a062b38e5695239258e892c63c4
2021-01-31 20:42:43 -08:00
0e1c5cb354 fixing index clamping for upsample nearest kernel backward (#51240)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51036

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51240

Reviewed By: ailzhang

Differential Revision: D26139221

Pulled By: ngimel

fbshipit-source-id: 0591ac6d1f988b54c1b1ee50d34fb7c2a3f97c4e
2021-01-31 15:22:58 -08:00
9cf62a4b5d [1.8] Add additional tests for object-based APIs (#51341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51341

Adds tests for objects that contain CPU/GPU tensors to ensure that
they can also be serialized/deserialized appropriately.
ghstack-source-id: 120718120

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D26144100

fbshipit-source-id: f1a8ccb9741bb5372cb7809cb43cbe43bf47d517
2021-01-30 19:50:08 -08:00
c255628134 [Collective APIs] Make python object collective API args consistent (#50625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50625

Make the API signatures consistent and provide default arguments similar to
the tensor collectives.
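
A sketch of the resulting call pattern, assuming an initialized process group:

```python
import torch.distributed as dist

# dist.init_process_group(...) is assumed to have been called already.
obj = {"rank": dist.get_rank()}
gathered = [None] * dist.get_world_size()
dist.all_gather_object(gathered, obj)      # mirrors all_gather(tensor_list, tensor)
dist.broadcast_object_list([obj], src=0)   # group argument defaults like the tensor APIs
```
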
ghstack-source-id: 120718121

Test Plan: CI

Reviewed By: wanchaol

Differential Revision: D25932012

fbshipit-source-id: d16267e236a65ac9d55e19e2178f9d9267b08a20
2021-01-30 19:47:16 -08:00
721ba97eb6 Create op benchmark for stack (#51263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51263

- Add benchmark for stack op

Test Plan:
```
buck build mode/opt //caffe2/benchmarks/operator_benchmark/pt:stack_test --show-output
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 buck-out/gen/caffe2/benchmarks/operator_benchmark/pt/stack_test.par --tag_filter=static_runtime | grep Execution

Forward Execution Time (us) : 6.380
Forward Execution Time (us) : 6.553
Forward Execution Time (us) : 14.904
Forward Execution Time (us) : 5.657
Forward Execution Time (us) : 5.612
Forward Execution Time (us) : 6.051
Forward Execution Time (us) : 4.225
Forward Execution Time (us) : 4.240
Forward Execution Time (us) : 6.280
Forward Execution Time (us) : 6.267
Forward Execution Time (us) : 418.932
Forward Execution Time (us) : 417.694
Forward Execution Time (us) : 1592.455
Forward Execution Time (us) : 2919.261
Forward Execution Time (us) : 211.458
Forward Execution Time (us) : 211.518
Forward Execution Time (us) : 783.953
Forward Execution Time (us) : 1457.823
Forward Execution Time (us) : 2032.816
Forward Execution Time (us) : 2090.662
Forward Execution Time (us) : 6487.098
Forward Execution Time (us) : 11874.702
Forward Execution Time (us) : 2123.830
Forward Execution Time (us) : 2195.453
Forward Execution Time (us) : 6435.978
Forward Execution Time (us) : 11852.205
Forward Execution Time (us) : 2036.526
Forward Execution Time (us) : 2055.618
Forward Execution Time (us) : 6417.192
Forward Execution Time (us) : 12468.744
Forward Execution Time (us) : 4959.704
Forward Execution Time (us) : 5121.823
Forward Execution Time (us) : 5082.105
Forward Execution Time (us) : 5395.936
Forward Execution Time (us) : 5162.756
Forward Execution Time (us) : 23798.080
Forward Execution Time (us) : 4957.921
Forward Execution Time (us) : 4971.234
Forward Execution Time (us) : 5005.909
Forward Execution Time (us) : 5159.614
Forward Execution Time (us) : 5013.221
Forward Execution Time (us) : 20238.741
Forward Execution Time (us) : 7632.439
Forward Execution Time (us) : 7589.376
Forward Execution Time (us) : 7859.937
Forward Execution Time (us) : 8214.213
Forward Execution Time (us) : 11606.562
Forward Execution Time (us) : 34612.919
```

Reviewed By: hlu1

Differential Revision: D25859143

fbshipit-source-id: a1b735ce87f57b5eb67e223e549248a2cd7663c1
2021-01-30 10:32:14 -08:00
e26fccc22b update profiler doc strings (#51395)
Summary:
Fixes formatting for the autograd.profiler docstring (it was broken) and slightly expands the profiler.profile documentation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51395

Reviewed By: ilia-cher

Differential Revision: D26162349

Pulled By: ngimel

fbshipit-source-id: ac7af8e0f3dbae2aa899ad815d2311c2758ee57c
2021-01-29 23:37:06 -08:00
17b5683156 Multi-GPU Kineto profiler test (#51391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51391

Adding a test to check the kineto profiler on multiple gpus

Test Plan: python test/test_profiler.py

Reviewed By: ngimel

Differential Revision: D26160788

Pulled By: ilia-cher

fbshipit-source-id: f3554f52176cc26e7f331d205f1a514eb03aa758
2021-01-29 23:26:12 -08:00
11cda929fb [StaticRuntime] Fix bug in MemoryPlanner (#51342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51342

There is a subtle bug in the MemoryPlanner with regard to view ops with out variants.

```
  def forward(self, a: Tensor, shape: List[int]):
      b = a.reshape(shape)
      return b + b
```
In this case, if we replace reshape with its out variant, b is managed by the MemoryPlanner, and its storage is set to nullptr right after inference when opts.cleanup_activations is true. Because b is a view of a, the storage of a is also set to nullptr, which violates the API's promise that a is const.

To fix this bug, I changed the MemoryPlanner so that it puts b in the unmanaged part.

Test Plan:
Add unit test to enforce the constness of inputs

```
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
```

Reviewed By: ajyu

Differential Revision: D26144203

fbshipit-source-id: 2dbacccf7685d0fe0f0b1195166e0510b2069fe3
2021-01-29 21:16:02 -08:00
09e48dbd33 Handle error during dict expansion (#51374)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51374

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D26155995

Pulled By: ansley

fbshipit-source-id: 04e924cb641565341c570c6cf5e5eec42e4f9c8b
2021-01-29 18:46:10 -08:00
7ab89f58be expose memory_fraction and gpu_process docs (#51372)
Summary:
Per title
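
For context, a small sketch of the APIs whose docs are being exposed (argument values are arbitrary; requires a CUDA-enabled build):

```python
import torch

# Cap this process at 50% of device 0's total GPU memory.
torch.cuda.set_per_process_memory_fraction(0.5, device=0)
# Human-readable summary of processes currently using device 0.
print(torch.cuda.list_gpu_processes(device=0))
```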

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51372

Reviewed By: mruberry

Differential Revision: D26157787

Pulled By: ngimel

fbshipit-source-id: 97eac5f12881a2bf62c251f6f7eaf65fdbe34056
2021-01-29 18:22:34 -08:00
7d30f67659 remove LegacyDefinitions as it is empty now (#51251)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51251

Reviewed By: mruberry

Differential Revision: D26120574

Pulled By: ngimel

fbshipit-source-id: 223b4f358932f47e0af7413752c7db7c35402260
2021-01-29 18:15:11 -08:00
d5541c50a3 add a c++ interface in processGroup to get its backend name (#51066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51066

The backend name of a process group created via the distributed_c10d Python API is tracked, but there is no good way to get the name of a process group created via the ProcessGroup C++ API. In some cases, knowing the backend name of a process group is useful, e.g., to log it, or to write code that depends on a particular backend.
ghstack-source-id: 120628432

Test Plan: unit tests

Reviewed By: pritamdamania87

Differential Revision: D26059769

fbshipit-source-id: 6584c6695c5c3570137dc98c16e06cbe4b7f5503
2021-01-29 17:28:42 -08:00
662b6d2115 [dist_optim] update the doc of DistributedOptimizer (#51314)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51314

updating the doc of DistributedOptimizer to include TorchScript enablement information

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26156032

Pulled By: wanchaol

fbshipit-source-id: 1f3841f55918a5c2ed531cf6aeeb3f6e3a09a6a8
2021-01-29 17:12:52 -08:00
a88e1d3ddf [complex] Complex support for masked_scatter and autograd support for masked_scatter and masked_select (#51281)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/33152

Changes
* Enable complex support for masked_scatter
* Enable half support for masked_scatter CPU
* Enable complex autograd support for masked_scatter CPU and masked_select (both CPU and CUDA).

**Note**:
Complex support for masked_scatter CUDA is disabled as it depends on `masked_fill`, which is yet to be ported to ATen.
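
A small sketch of the newly supported complex autograd path:

```python
import torch

x = torch.randn(4, dtype=torch.complex128, requires_grad=True)
mask = torch.tensor([True, False, True, False])
y = torch.masked_select(x, mask)
# Reduce to a real scalar so backward() can be called directly.
y.abs().sum().backward()
print(x.grad)
```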

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51281

Reviewed By: ailzhang

Differential Revision: D26127561

Pulled By: anjali411

fbshipit-source-id: 6284926b934942213c5dfc24b5bcc8538d0231af
2021-01-29 13:49:31 -08:00
fe645fdfc7 Update _torch_docs.py (#51212)
Summary:
Fix the `torch.linalg.qr` reference so that the fully-qualified name renders in the docs.

Suggested fix for https://github.com/pytorch/pytorch/pull/47764/files#r565368195

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51212

Reviewed By: ezyang

Differential Revision: D26142496

Pulled By: ailzhang

fbshipit-source-id: 052b2085099baa372e3b515b403f25d23cf50785
2021-01-29 13:03:09 -08:00
da920fa141 Enable rocm tests in common nn (#51227)
Summary:
Fixes #{issue number}
Resubmitting a new PR as the older one got reverted due to problems in test_optim.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51227

Reviewed By: ezyang

Differential Revision: D26142505

Pulled By: ailzhang

fbshipit-source-id: a2ab5d85630aac2d2ce17652ba19c11ea668a6a9
2021-01-29 12:54:04 -08:00
52609c8c65 .github: Up frequency of stale checks (#51365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51365

We have a pretty big backlog of PRs when it comes to checking for staleness, and the action only supports processing 30 PRs at a time.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D26153785

Pulled By: seemethere

fbshipit-source-id: 585b36068683e04cf4e2cc59013482f143ec30a3
2021-01-29 12:50:40 -08:00
dbfaf966b0 [android] turn on USE_VULKAN for android builds by default (#51291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51291

Turning on USE_VULKAN for Android builds.
Removing the standalone Android Vulkan build.

Testing all ci jobs (for master): https://github.com/pytorch/pytorch/pull/51292

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D26141891

Pulled By: IvanKobzarev

fbshipit-source-id: e8e1a4ab612c0786ce09217ab9370fd75a71eb00
2021-01-29 11:58:21 -08:00
ebd2a82559 Replace all AT_ASSERTM in RNN_miopen.cpp (#51072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51072

AT_ASSERTM is deprecated and should be replaced by either TORCH_CHECK or
TORCH_INTERNAL_ASSERT, depending on the situation.

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D26074364

Pulled By: ezyang

fbshipit-source-id: 742e28afe49e0a546c252a0fad487f93410d0cb5
2021-01-29 11:40:38 -08:00
dfca1e48d3 Replace all AT_ASSERTM under c10/ (except Exception.h) (#50843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50843

AT_ASSERTM is deprecated and should be replaced by either TORCH_CHECK or
TORCH_INTERNAL_ASSERT, depending on the situation.

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D26074365

Pulled By: ezyang

fbshipit-source-id: 46e13588fad4e24828f3cc99635e9cb2223a6c2c
2021-01-29 11:37:07 -08:00
c41ca4ae5b [doc]Fix autograd.detect_anomaly docs incorrectly formatted (#51335)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51141

Two bullet points don't render as bullet points.

Before
<img width="657" alt="screenshot before" src="https://user-images.githubusercontent.com/19372617/106240701-125a3080-6248-11eb-9572-f915aa9b72e1.png">

After
<img width="888" alt="screenshot after" src="https://user-images.githubusercontent.com/19372617/106240714-17b77b00-6248-11eb-8e54-51be103639e9.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51335

Reviewed By: izdeby

Differential Revision: D26148582

Pulled By: ezyang

fbshipit-source-id: 5aff6f9bd7affdf13bec965e9bf1a417e5caa88d
2021-01-29 11:18:51 -08:00
5021582fe6 Fix benchmarks/distributed/ddp/benchmark.py (#51095)
Summary:
Fixes the issue reported in https://github.com/pytorch/pytorch/issues/50679 by using built-in object-based collectives. The user has verified that this patch works.

Test with:
RANK=0 python3 pytorch-dist-benchmark.py --world-size 2 --master-addr 127.0.0.1 --master-port 23456
RANK=1 python3 pytorch-dist-benchmark.py --world-size 2 --master-addr 127.0.0.1 --master-port 23456

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51095

Reviewed By: SciPioneer

Differential Revision: D26070275

Pulled By: rohan-varma

fbshipit-source-id: 59abcaac9e395bcdd8a018bf6ba07521d94b2fdf
2021-01-29 11:10:13 -08:00
1b089c1257 Modernize for-loops (#50899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50899

Test Plan: Sandcastle tests + OSS CI

Reviewed By: ezyang

Differential Revision: D26001931

fbshipit-source-id: d829d520f647aacd178e1c7a9faa6196cc5af54e
2021-01-29 10:52:31 -08:00
edaa23c8ab extend init_group_test timeout to 5s (#51330)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50662

![image](https://user-images.githubusercontent.com/16190118/106225549-58030300-6220-11eb-948d-1998bdafc245.png)

From: https://circleci.com/api/v1.1/project/github/pytorch/pytorch/10203733/output/105/0?file=true&allocation-id=60022ee190b8596d279f4531-0-build%2F195A7D58 (e86f941395)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51330

Reviewed By: izdeby

Differential Revision: D26148618

Pulled By: ezyang

fbshipit-source-id: 708d7522843da2f5c919cf41919e6819f89903e2
2021-01-29 10:44:28 -08:00
30675d0921 Added OpInfo-based testing of triangular_solve (#50948)
Summary:
Added OpInfo-based testing of `torch.triangular_solve`.

These tests helped discover that CPU `triangular_solve` wasn't working for empty matrices, and that for CUDA inputs a warning was printed to the terminal. Both issues are now fixed.
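
For orientation, a minimal usage sketch of the op under test:

```python
import torch

A = torch.triu(torch.randn(3, 3, dtype=torch.float64)) + 3 * torch.eye(3, dtype=torch.float64)
b = torch.randn(3, 2, dtype=torch.float64)
x, _ = torch.triangular_solve(b, A, upper=True)  # returns (solution, copy of A)
assert torch.allclose(A @ x, b)
```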

CUDA gradgrad checks are skipped.
```
11.44s call     test/test_ops.py::TestGradientsCUDA::test_fn_gradgrad_triangular_solve_cuda_complex128
2.97s call     test/test_ops.py::TestGradientsCUDA::test_fn_gradgrad_triangular_solve_cuda_float64
1.60s call     test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_complex128
1.36s call     test/test_ops.py::TestOpInfoCUDA::test_supported_dtypes_triangular_solve_cuda_complex128
1.20s call     test/test_ops.py::TestGradientsCUDA::test_fn_grad_triangular_solve_cuda_complex128
0.86s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_triangular_solve_cuda_complex64
0.85s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_triangular_solve_cuda_complex128
0.81s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_triangular_solve_cuda_float64
0.77s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_triangular_solve_cuda_float32
0.46s call     test/test_ops.py::TestCommonCPU::test_variant_consistency_jit_triangular_solve_cpu_complex128
0.44s call     test/test_ops.py::TestCommonCPU::test_variant_consistency_jit_triangular_solve_cpu_complex64
0.44s call     test/test_ops.py::TestGradientsCUDA::test_fn_grad_triangular_solve_cuda_float64
0.42s call     test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_float64
0.40s call     test/test_ops.py::TestCommonCPU::test_variant_consistency_jit_triangular_solve_cpu_float32
0.40s call     test/test_ops.py::TestCommonCPU::test_variant_consistency_jit_triangular_solve_cpu_float64
0.17s call     test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_complex128
```

Ref. https://github.com/pytorch/pytorch/issues/50006

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50948

Reviewed By: ailzhang

Differential Revision: D26123998

Pulled By: mruberry

fbshipit-source-id: 54136e8fc8a71f107dddb692c5be298c6d5ed168
2021-01-29 10:31:07 -08:00
1b479416b7 Clarify logic in ir_emitter (#51299)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51299

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D26131245

Pulled By: ansley

fbshipit-source-id: ecd69275214775804f5aa92f9b4c0b19be19b596
2021-01-29 10:05:01 -08:00
c0966914bc Internal gradcheck wrapper in testing._internal that sets certain flags to True (#51133)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49409

There are many call sites where gradcheck/gradgradcheck is now implicitly invoked with `check_batched_grad=True` where it was previously False. Cases fall into two basic categories:
1) the call site was previously using `torch.autograd.gradcheck` but is now changed to use the globally imported function instead
2) the call site was already using the globally imported function, but does not explicitly pass the `check_batched_grad` flag

Only in the _assertGradAndGradgradChecks cases, which are infrequent, I assumed that the author is aware that omitting the flag means not applying check_batched_grad=True. (But maybe that is not the case?)

Overall, this PR in its current state assumes that unless the author explicitly specified `check_batched_grad=False`, they were probably not aware of this flag and did not mean for it to be False.

So far exceptions to the above (as discovered by CI) include:
 - Mkldnn (opaque tensors do not have strides) https://app.circleci.com/pipelines/github/pytorch/pytorch/264416/workflows/e4d87886-6247-4305-8526-2696130aa9a4/jobs/10401882/tests
 - all cases in test_sparse (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407103)
 - all cases in test_overrides (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407236)
 - test_autograd (test_LSTM_grad_and_gradgrad) - (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407235)
 - test_data_parallel (test_data_parallel_buffers_requiring_grad) - *SIGSEGV* (https://app.circleci.com/pipelines/github/pytorch/pytorch/264820/workflows/14d89503-040d-4e3d-9f7b-0bc04833589b/jobs/10422697)
 - test_nn (https://app.circleci.com/pipelines/github/pytorch/pytorch/264919/workflows/df79e3ed-8a31-4a8e-b584-858ee99686ff/jobs/10427315)

Possible TODO is to prevent new tests from invoking external gradcheck.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51133

Reviewed By: ezyang

Differential Revision: D26147919

Pulled By: soulitzer

fbshipit-source-id: dff883b50f337510a89f391ea2fd87de2d531432
2021-01-29 09:13:37 -08:00
5a406c023e Revert D26070147: [Gradient Compression] Refactor default_hooks.py and powerSGD_hook.py by creating a util function that make a vanilla allreduce future
Test Plan: revert-hammer

Differential Revision:
D26070147 (e7b3496232)

Original commit changeset: 8c9339f1511e

fbshipit-source-id: fa1e9582baec9759a73b3004be9bb19bdeb6cd34
2021-01-29 09:06:24 -08:00
270111b7b6 split quantization jit op (#51329)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51329

Currently, test_qbatch_norm_relu contains too many examples and causes a timeout. Splitting them for now to fix the timeout issue.

Test Plan: buck test caffe2/test:quantization

Reviewed By: supriyar

Differential Revision: D26141037

fbshipit-source-id: da877efa78924a252a35c2b83407869ebb8c48b7
2021-01-29 07:49:53 -08:00
3397919dcf Rowwise Prune op (Add the test to OSS run_test), Make the op private. (#46131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46131

Refer to the title.

Test Plan: `buck test caffe2/test:pruning`

Reviewed By: raghuramank100

Differential Revision: D24230472

fbshipit-source-id: 8f0a83446c23fdf30d0313b8c3f5ff1a463b50c7
2021-01-29 06:08:18 -08:00
ebe26b81d2 [PyTorch Mobile] Enable partial loading of GPU models on linux CPU machines (#51236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51236

The problem we currently have with tracing is that GPU models can't load on devvm CPU machines. Here's why:

1. The Metal GPU ops don't exist so the validation that checks for missing ops kicks in and prevents loading
2. Even if the check for missing ops is commented out, the actual model contents can't be successfully loaded (see t83364623 for details)

Hence, to work around these problems and allow tracing to detect GPU models, and skip actual tracing for these (as discussed in the meeting 2 weeks ago and based on recommendations from raziel, iseeyuan, and xta0), we're adding code to detect these GPU models based on the set of operators that show up in the file `extra/mobile_info.json`.

The code then skips tracing, and picks up the root operators from the model itself.

The diff below this one will be removed before landing since we don't want to check in the model - I've kept it here in case anyone wants to patch this diff in and run the command on their devvm locally.
ghstack-source-id: 120638092

Test Plan:
See {P168657729} for a successful run of tracing on a GPU model (person segmentation tier-0, v1001) provided by xta0

Also ran `buck test //xplat/pytorch_models/build/...` successfully.

Reviewed By: ljk53

Differential Revision: D26109526

fbshipit-source-id: 6119b0b59af8aae8b1feca0b8bc29f47a57a1a67
2021-01-29 01:00:08 -08:00
534aabce14 [nnc] Don't use sleef where it's slower (#51246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51246

Using sleef is sometimes slower than libm (I haven't debugged why).
The easy solution is to not use sleef in those cases.  With this diff, plus the
prior one to use sleef period, we've sped up every unary op:
ghstack-source-id: 120614087

Test Plan: `buck run mode/opt -c python.package_style=inplace //caffe2/benchmarks/cpp/tensorexpr:bench_ops`

Reviewed By: ZolotukhinM

Differential Revision: D26113672

fbshipit-source-id: 6b731ac935b3652c8b3e3f4a5d2baa39ff31323a
2021-01-28 22:35:11 -08:00
0a9764ecc1 [nnc] Expose vectorized math functions to jit fuser. (#51190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51190

We want to be able to call fast vectorized functions from sleef inside
the jit fuser, but only when they're supported by the host processor.  Enabling
this feature has two parts:

1. Record the addresses of the symbols, assuming they're defined.  Sleef only
defines vectorized functions if AVX is enabled, so we need to define __AVX__ to
get access to those symbols.  We don't actually need to compile anything with
AVX; the symbols just have to be present.

2. Before emitting a call to sleef, check if the host processor actually has
AVX.  LLVM makes this easy since we can just check the target feature string
for "+avx".
ghstack-source-id: 120614086

Test Plan:
```
buck run mode -c python.package_style=inplace //caffe2/benchmarks/cpp/tensorexpr:bench_ops
```

shows a significant speedup on most math functions (esp sigmoid, which goes
from 13% of ATen speed to parity).

Reviewed By: navahgar

Differential Revision: D26096170

fbshipit-source-id: b7268a50d73f8dc03b4db11cc38b8402387eed2d
2021-01-28 22:35:07 -08:00
d74a226daa [nnc] Use sleef if its symbols are available (#51187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51187

Instead of relying on #ifdefs, we want to use sleef if its symbols are
available.  This diff adds the mechanism to do that check using LLVM's symbol
lookup.

This diff by itself is a no-op, because sleef isn't properly being exposed to
LLVM yet (the `#ifdef __AVX__` checks are always false, because torch/jit isn't
built with `-mavx`).  The next diff will properly expose the symbols, and
perform run time checks.
ghstack-source-id: 120614091

Test Plan: `buck test //caffe2/test/cpp/tensorexpr:`

Reviewed By: Krovatkin

Differential Revision: D26096206

fbshipit-source-id: 3f2b37500276e8bf50a167ecf8aeeb295d7ec232
2021-01-28 22:35:03 -08:00
0a065ebe86 [nnc][trivial] Refactor llvm_jit so the wrapper class doesn't depend on ifdefs (#51186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51186

Just a bit of drive-by cleanup; the wrapper class should be the same
for all builds so let's not conditionally compile it for no reason.
ghstack-source-id: 120614088

Test Plan: buck build

Reviewed By: navahgar

Differential Revision: D26096205

fbshipit-source-id: 1e4cb682614fae0e889ba35fb1edb489fb99158e
2021-01-28 22:34:59 -08:00
1114fd6b3a [nnc] Refactor generation of intrinsics to reduce the amount of macro-hell (#51125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51125

The big pile of X-macros used for emitting (possibly vectorized)
intrinsics makes it **really** difficult to change that code in any systematic
way (which I'm about to do in a later diff).

We can factor most of what the macro does into a fairly simple function.  There
are still macros but they're just a bunch of case/call helper/break
boilerplate.
ghstack-source-id: 120614089

Test Plan: `buck test mode/opt -c python.package_style=inplace //caffe2/benchmarks/cpp/tensorexpr:bench_ops`

Reviewed By: ZolotukhinM

Differential Revision: D26078384

fbshipit-source-id: 843548033f73d88c5d9a031c285b92f73be21390
2021-01-28 22:31:49 -08:00
43f0ccd1ec torch.cuda.memory_allocated to return {} if not initialized (#51179)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49952

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51179

Reviewed By: ngimel

Differential Revision: D26094932

Pulled By: malfet

fbshipit-source-id: 0ec28ef9b0604245753d3f2b0e3536286700668d
2021-01-28 20:38:17 -08:00
916af892b3 [quant][fx] Update name of packed weight attributes (#51259)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51259

Store the FQN of the module that is using the packed weights (the quantized op)

In the case of fusion we update the scope mapping to store the module path of the fused node.

Test Plan:
python test/test_quantization.py test_packed_weight_fused_op

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D26117964

fbshipit-source-id: 9d929997baafb1c91063dd9786a451b0040ae461
2021-01-28 20:31:08 -08:00
05c8cd748d memory efficient per-channel fq: use it everywhere, delete old version (#51265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51265

This PR is the cleanup after #51159. At a high level, we make the new
definition of per-channel fake_quant the definition used by autograd, but keep the old
function around as a thin wrapper to keep the user-facing API the same.

In detail:

1. point fake_quantize_per_channel_affine's implementation to be fake_quantize_per_channel_affine_cachemask
2. delete the fake_quantize_per_channel_affine backward, autograd will automatically use the cachemask backward
3. delete all the fake_quantize_per_channel_affine kernels, since they are no longer used by anything

Test Plan:
```
python test/test_quantization.py TestFakeQuantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D26120957

fbshipit-source-id: 264426435fabd925decf6d1f0aa79275977ea29b
2021-01-28 19:42:25 -08:00
267e243064 fake_quant: more memory efficient per-channel backward (#51255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51255

This is the same as #50561, but for per-channel fake_quant.

TODO before land write up better

Memory and performance impact (MobileNetV2): TODO

Performance impact (microbenchmarks): https://gist.github.com/vkuzo/fbe1968d2bbb79b3f6dd776309fbcffc
* forward pass on cpu: 512ms -> 750ms (+46%)
* forward pass on cuda: 99ms -> 128ms (+30%)
* note: the overall performance impact to training jobs should be minimal, because this is used for weights, and relative importance of fq is dominated by fq'ing the activations
* note: we can optimize the perf in a future PR by reading once and writing twice

Test Plan:
```
python test/test_quantization.py TestFakeQuantize.test_forward_per_channel_cachemask_cpu
python test/test_quantization.py TestFakeQuantize.test_forward_per_channel_cachemask_cuda
python test/test_quantization.py TestFakeQuantize.test_backward_per_channel_cachemask_cpu
python test/test_quantization.py TestFakeQuantize.test_backward_per_channel_cachemask_cuda
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D26117721

fbshipit-source-id: 798b59316dff8188a1d0948e69adf9e5509e414c
2021-01-28 19:39:35 -08:00
f2e41257e4 Back out "Revert D26077905: Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter"" (#51267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51267

Original commit changeset: b70185916502

Test Plan: test locally, oss ci-all, fbcode incl deferred

Reviewed By: suo

Differential Revision: D26121251

fbshipit-source-id: 4315b7fd5476914c8e5d6f547e1cfbcf0c227781
2021-01-28 19:30:45 -08:00
e7b3496232 [Gradient Compression] Refactor default_hooks.py and powerSGD_hook.py by creating a util function that make a vanilla allreduce future (#51094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51094

Address https://github.com/pytorch/pytorch/pull/50973#discussion_r564229818

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 120619680

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl

Reviewed By: rohan-varma

Differential Revision: D26070147

fbshipit-source-id: 8c9339f1511e8f24cc906b9411cfe4850a5a6d81
2021-01-28 19:03:18 -08:00
9d731e87de [Gradient Compression] Explicitly specify the dtype of the error tensor (#50985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50985

Explicitly specify the dtype of the error tensor when it is initialized with zeros.

Previously, if the dtype of the input tensor was FP16, the error tensor was still created in FP32, although it would later be assigned another FP16 tensor (`input_tensor_cp` - `input_tensor`).

This change makes the dtype of the error tensor clearer.

Additionally, also explicitly specify the dtype if rank-1 tensor buffer is empty.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 120377786

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook

Reviewed By: rohan-varma

Differential Revision: D26034988

fbshipit-source-id: e0d323d0b77c6a2478cdbe8b31a1946ffd1a07da
2021-01-28 19:03:14 -08:00
b619d37bb4 [Gradient Compression] Simplify the implementation of error feedback and warm-start (#50981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50981

Since vanilla allreduce will be applied in the first few iterations, the bucket rebuilding process will not affect the caching of per-variable tensors.

Previously the cached tensors used for error feedback and warm-up need to be rebuilt later, because their corresponding input tensors' shape will be changed after the bucket rebuild process.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 120617971

Test Plan: real run

Reviewed By: rohan-varma

Differential Revision: D26034418

fbshipit-source-id: e8744431c7f3142d75b77b60110e6861c2ff5c14
2021-01-28 18:59:40 -08:00
00d4ec840e clone pytorch.github.io with depth 1 (#48115)
Summary:
Speeds up clone of pytorch.github.io in CI/CD - currently takes ~7 minutes each run.

Locally these are the results: 3.73 seconds vs 611.87 seconds.

With depth 1:

```
$ time git clone https://github.com/pytorch/pytorch.github.io -b site --depth 1
...
3.73s user 2.97s system 23% cpu 28.679 total
```

Without:

```
$ time git clone https://github.com/pytorch/pytorch.github.io -b site
...
611.87s user 66.16s system 96% cpu 11:41.99 total
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48115

Reviewed By: mrshenli

Differential Revision: D25107867

Pulled By: ranman

fbshipit-source-id: b6131b51df53b7f71d9b4905181182699c0c6c09
2021-01-28 18:40:10 -08:00
8a8fac6681 Remove debug-only assertion from vulkan::api::Command::Command as the buffer can legitimately be null. (#51160)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51160

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D26131252

Pulled By: AshkanAliabadi

fbshipit-source-id: 69f324ceed711753d77ab7c6b6a20a29cdbdf5f9
2021-01-28 18:33:50 -08:00
592a8ad1c8 Define static constexpr variable in at::native::vulkan::api::Handle. (#51006)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51006

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D26131253

Pulled By: AshkanAliabadi

fbshipit-source-id: 950bf57b348726fe5da4fed6a8b1e108c7a52e11
2021-01-28 18:30:18 -08:00
5ed0ad4b6a DataPipe naming convention update (#51262)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51262

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26120628

Pulled By: glaringlee

fbshipit-source-id: 6855a0dd6d4a93ff93adce1039960ffd7057a827
2021-01-28 17:44:36 -08:00
f9f22c8b5c Add serialization logic for complex numbers (#51287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51287

This reverts commit dfdb1547b9c1934904bfd137b4007d6a46a6f597.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D26131165

Pulled By: anjali411

fbshipit-source-id: 047167fac594ddb670c5e169446e90e74991679a
2021-01-28 17:25:35 -08:00
6e4746c1ac Port cholesky_inverse to ATen (#50269)
Summary:
Now we can remove `_th_potri`!

Compared to the original TH-based `cholesky_inverse`, complex (https://github.com/pytorch/pytorch/issues/33152) and batched inputs (https://github.com/pytorch/pytorch/issues/7500) are now supported both on CPU and CUDA.
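
A quick illustration of the newly supported batched case (a sketch, not taken from the PR's tests):

```python
import torch

A = torch.randn(4, 3, 3)
A = A @ A.transpose(-2, -1) + 3 * torch.eye(3)  # batch of positive-definite matrices
L = torch.cholesky(A)                           # batched Cholesky factors
A_inv = torch.cholesky_inverse(L)               # batched inputs now work on CPU and CUDA
print(torch.allclose(A_inv, torch.inverse(A), atol=1e-4))  # True
```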

Closes https://github.com/pytorch/pytorch/issues/24685.
Closes https://github.com/pytorch/pytorch/issues/24543.

Ref. https://github.com/pytorch/pytorch/issues/49421, https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50269

Reviewed By: bdhirsh

Differential Revision: D26047548

Pulled By: anjali411

fbshipit-source-id: e4f191e39c684f241b7cb0f4b4c025de082cccef
2021-01-28 16:24:41 -08:00
9f6e0de548 Update third_party/build_bundled.py (#51161)
Summary:
Follow up https://github.com/pytorch/pytorch/issues/50695. my bad for merging with one missing commit.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51161

Reviewed By: ailzhang

Differential Revision: D26134761

Pulled By: walterddr

fbshipit-source-id: a606f6cfbb5c48b3c6f3859896522f294e1b077e

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
2021-01-28 14:41:09 -08:00
7097c0d4f3 [quant][graphmode][fx] Add support for functional conv1d and conv3d (#51155) (#51254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51254

This PR added support for quantizing functional conv1d, conv3d, conv1d_relu, and conv3d_relu

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_functional_conv

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D26116172

fbshipit-source-id: 56e7d799c11963fe59ee3a1b6eb23f52007b91dc
2021-01-28 14:32:32 -08:00
35990b5f56 .github: Remove title from stale alert (#51306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51306

Title wasn't rendering correctly, so let's just remove it altogether; it
shouldn't matter that much in the long run.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D26134907

Pulled By: seemethere

fbshipit-source-id: 54485cb66fb57f549255f9e7bcfb39b51fe69776
2021-01-28 14:23:21 -08:00
1379842f4a Add private mechanism to toggle vmap fallback warnings (#51218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51218

Fixes #51144.

Context
=======

Users have complained about warning spam from batched gradient
computation. This warning spam happens because warnings in C++ don't
correctly get turned into Python warnings when those warnings arise from
the autograd engine.

To work around that, this PR adds a mechanism to toggle vmap warnings.
By default, the vmap fallback will not warn when it is invoked. However,
by using `torch._C._debug_only_display_vmap_fallback_warnings(enabled)`,
one can toggle the existence of vmap fallback warnings.

This API is meant to be a private, debug-only API. The goal is to be
able to non-intrusively collect feedback from users to improve
performance on their workloads.
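
Usage is a simple toggle (a sketch around the API named above):

```python
import torch

torch._C._debug_only_display_vmap_fallback_warnings(True)   # surface fallback warnings
# ... run code that may hit the vmap fallback ...
torch._C._debug_only_display_vmap_fallback_warnings(False)  # back to the silent default
```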

What this PR does
=================

This PR adds an option to toggle vmap warnings. The mechanism is
toggling a bool in ATen's global context.

There are some other minor changes:
- This PR adds a more detailed explanation of performance cliffs to the
autograd.functional.{jacobian, hessian} documentation
- A lot of the vmap tests in `test_vmap.py` rely on the fallback warning
to test the presence of the fallback. In test_vmap, I added a context
manager to toggle on the fallback warning while testing.

Alternatives
============

I listed a number of alternatives in #51144. My favorite one is having a new
"performance warnings mode" (this is currently a WIP by some folks on
the team). This PR is to mitigate the problem of warning spam before
a "performance warnings mode" gets shipped into PyTorch

Concerns
========

I am concerned that we are advertising a private API
(`torch._C._debug_only_display_vmap_fallback_warnings(enabled)`) in the
PyTorch documentation. However, I hope the naming makes it clear to
users that they should not rely on this API (and I don't think they have
any reason to rely on the API).

Test Plan
=========

Added tests in `test_vmap.py` to check:
- by default, the fallback does not warn
- we can toggle whether the fallback warns or not

Test Plan: Imported from OSS

Reviewed By: pbelevich, anjali411

Differential Revision: D26126419

Pulled By: zou3519

fbshipit-source-id: 95a97f9b40dc7334f6335a112fcdc85dc03dcc73
2021-01-28 13:05:00 -08:00
f68e5f1dbf .github: Update stale messaging add newlines (#51298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51298

newlines weren't being respected, so just add them through the `<br>` HTML
tag.

Also changes the wording for open source ones to designate that a
maintainer may be needed to unstale a particular PR.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D26131126

Pulled By: seemethere

fbshipit-source-id: 465bfc0ba4dc16a7a90e0c03c33d551184e35f5b
2021-01-28 12:39:29 -08:00
b028653670 Add missing -inf order for linalg.norm OpInfo (#51233)
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/50746

I accidentally missed the `ord=-inf` case in the OpInfo for `torch.linalg.norm` when I wrote it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51233

Reviewed By: malfet

Differential Revision: D26117160

Pulled By: anjali411

fbshipit-source-id: af921c1d8004783612b3a477ae2025a82860ff4e
2021-01-28 12:33:00 -08:00
8b27c2ccca add missing VSX dispatches (#51217)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51132

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51217

Reviewed By: malfet

Differential Revision: D26120485

Pulled By: ezyang

fbshipit-source-id: d83384964f9980c9a921d0c7159f07e88025ea92
2021-01-28 12:17:50 -08:00
96cedefd8e [Pipe] Refactor convert_to_balance under non-test package. (#50860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50860

Since fairscale.nn.Pipe still uses the 'balance' and 'devices' parameters,
other frameworks like fairseq still rely on them. As a result, the
`convert_to_balance` method is a convenient utility for migrating to PyTorch
Pipe without changing a lot of code in those frameworks.

In addition, I've renamed the method to better describe what it
does and added an optional devices parameter.
ghstack-source-id: 120430775

Test Plan:
1) waitforbuildbot
2) Tested with fairseq

Reviewed By: SciPioneer

Differential Revision: D25987273

fbshipit-source-id: dccd42cf1a74b08c876090d3a10a94911cc46dd8
2021-01-28 12:10:21 -08:00
cedfa4ccd8 Make DeviceCachingAllocator's error handling more defensive and a bit easier to read (#51158)
Summary:
^
Currently, `alloc_block`'s error handling has a couple of (IMO) minor flaws.  It might clear the error state even if the error had nothing to do with memory allocation. It might also clear the error state even if it didn't attempt a cudaMalloc, meaning it might clear an error state that came from some completely unrelated earlier CUDA call.

The diffs and comments are the best explanation of my preferred (new) error-checking policy.

The diffs add very little work to the common (successful, allocation satisfied by existing block) hot path.  Most of the additional logic occurs in `alloc_block`, which is a slow path anyway (it tries cudaMalloc).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51158

Reviewed By: malfet, heitorschueroff

Differential Revision: D26101515

Pulled By: ezyang

fbshipit-source-id: 6b447f1770974a04450376afd9726be87af83c48
2021-01-28 10:54:20 -08:00
33d5180684 [fx] improve args mutation error (#51175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51175

gives a suggestion about how to deal with immutable args/kwargs list

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D26093478

Pulled By: zdevito

fbshipit-source-id: 832631c125561c3b343539e887c047f185060252
2021-01-28 10:19:38 -08:00
4288f08d30 Enable TensorPipe's CUDA GDR channel (#50763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50763

ghstack-source-id: 120561489

Test Plan: Exported to GitHub

Reviewed By: mrshenli

Differential Revision: D25959672

fbshipit-source-id: b70f4b130806bf430869170bf4412697a6910275
2021-01-28 10:12:28 -08:00
cc211bb43e .github: Add workflow to stale pull requests (#51237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51237

Marks pull requests stale at 150 days and then closes them at 180 days

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: stonks

Reviewed By: yns88

Differential Revision: D26112086

Pulled By: seemethere

fbshipit-source-id: c6b3865aa5cde3415b6dd6622c308895a16e805f
2021-01-28 09:37:55 -08:00
c9cebaf9b8 Enable TensorPipe's InfiniBand transport (#50761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50761

ghstack-source-id: 120561368

Test Plan: Ran CI on GitHub

Reviewed By: mrshenli

Differential Revision: D25959502

fbshipit-source-id: 3d0a49546a6ac175608b677986d4344fbb1cf845
2021-01-28 08:43:32 -08:00
288b94a8ee [quant][fx] Make scale, zero_point buffers in the model, use FQN (for quantize_per_tensor ops) (#51171)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51171

Following up on the previous PR, this PR registers scale and zero_point for quantize_per_tensor
as buffers in the module.
Currently the dtype is still stored as an attribute (not registered as a buffer), since only tensor types can be registered.

Test Plan:
python test/test_quantization.py test_qparams_buffers

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D26092964

fbshipit-source-id: a54d914db7863402f2b5a3ba2c8ce8b27c18b47b
2021-01-28 08:35:46 -08:00
4c3f59b70e [quant][fx] Make scale, zero_point buffers in the model and use FQN (for quantized ops) (#51166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51166

Currently scale and zero_point values are stored as constants in the graph.
This prevents these values from being updated in the graph and also prevents saving
them to state_dict.

After this PR we store scale/zero_point values for quantized ops as buffers in the root module
and create get_attr nodes for them in the graph.

We also use the FQN of the module where the quantized ops are present to name these attributes so
that they can be uniquely identified and mapped to quantized ops.
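
Conceptually, the rewrite looks roughly like this (a hand-written sketch, not the actual pass; the module, buffer names, and FQN are illustrative):

```python
import torch
import torch.fx

class Sub(torch.nn.Module):
    def forward(self, x):
        return x + 1

m = torch.fx.symbolic_trace(Sub())

# Register qparams as buffers on the root module, named by the owning module's
# FQN, so they show up in state_dict() and can be updated later.
m.register_buffer("sub_scale_0", torch.tensor([0.05]))
m.register_buffer("sub_zero_point_0", torch.tensor([128]))

# Reference them via get_attr nodes instead of baking constants into the graph.
output_node = next(n for n in m.graph.nodes if n.op == "output")
with m.graph.inserting_before(output_node):
    scale = m.graph.create_node("get_attr", "sub_scale_0")
    zero_point = m.graph.create_node("get_attr", "sub_zero_point_0")
m.recompile()
```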

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qparams_buffers

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D26092965

fbshipit-source-id: b549b2d3dccb45c5d38415ce95a09c26f5bd590b
2021-01-28 08:35:42 -08:00
096adf4b8b [quant][fx] Scope support for call_function in QuantizationTracer (#51086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51086

Previously we only supported getting the scope for call_module and a custom qconfig dict for call_module.
This PR extends the Scope class to record the scope for all node types.
For call_function qconfig, if module_name is specified, it takes precedence over the function qconfig.

Test Plan:
python test/test_quantization.py test_qconfig_for_call_func

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D26077602

fbshipit-source-id: 99cdcdedde2280e51812db300e17d4e6d8f477d2
2021-01-28 08:32:24 -08:00
b955da3310 Adding correct error message for for..else (#51258)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51040

========
Add an error message for the for..else statement in TorchScript

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51258

Test Plan:
=====
pytest -k test_for_else test/test_jit.py

Reviewed By: pbelevich

Differential Revision: D26125148

Pulled By: nikithamalgifb

fbshipit-source-id: 82b67ab1c68e29312162ff5d73b82c8c0c9553df
2021-01-28 08:17:31 -08:00
7a8c64da4d [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D26122735

fbshipit-source-id: 0ff54a67192835c2daa331c1f13c252a96f494cb
2021-01-28 04:35:22 -08:00
0e8e739a9f Move AcceleratedGraphModule out of graph_manipulation. (#51220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51220

testing with OSS this time...

Reviewed By: jfix71

Differential Revision: D26105140

fbshipit-source-id: b4b7a8f0f4cc8f96f9f8b270277a71061d5e5e84
2021-01-28 02:39:12 -08:00
df07e1cea8 Automated submodule update: tensorpipe (#51203)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: 228f060ccc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51203

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: lw

Differential Revision: D26099684

Pulled By: yns88

fbshipit-source-id: 0ff985e9d4914d0d00120f96d0f5ba77371f005c
2021-01-28 01:22:01 -08:00
392abde8e6 patch nvrtc API for cuda TK >= 11.1 (#50319)
Summary:
CUDA TK >= 11.1 provides a ptxjitcompiler that emits SASS instead of PTX.
1. This gives better backward compatibility, allowing a future TK to work with an older driver, which might not be able to load the generated PTX through JIT compilation and would error out at runtime;
https://docs.nvidia.com/deploy/cuda-compatibility/#using-ptx
2. Meanwhile, SASS doesn't provide good forward compatibility, so for unsupported architectures we fall back to PTX to support future devices.
https://docs.nvidia.com/deploy/cuda-compatibility/index.html#cubin-compatibility

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50319

Reviewed By: malfet

Differential Revision: D26114475

Pulled By: ngimel

fbshipit-source-id: 046e9e7b3312d910f499572608a0bc1fe53feef5
2021-01-27 23:58:20 -08:00
9fe7c0633f Add centered FFT example to fftshift docs (#51223)
Summary:
Closes https://github.com/pytorch/pytorch/issues/51022

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51223

Reviewed By: malfet

Differential Revision: D26110201

Pulled By: mruberry

fbshipit-source-id: c659c5dca30eda4b67ed6d931a93de9a33e72895
2021-01-27 23:50:48 -08:00
d035d56bfb [StaticRuntime] Add out variant for reshape and flatten (#51249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51249

- Add out variants for reshape and flatten. reshape and flatten only create tensor views when they can; in cases where they can't, they do a copy. The out variant reuses the TensorImpl in both cases. The difference is that the TensorImpl is a view in the first case, but a normal TensorImpl in the second case.
- Create a separate registry for the view ops with out variants. Because Tensor views can't participate in memory reuse (memonger), we need to track these ops separately.
- The MemoryPlanner does not track the StorageImpl of tensor views because they don't own the storage, however, in cases where reshape does not create a view, the MemoryPlanner does manage the output tensor.

Reviewed By: ajyu

Differential Revision: D25992202

fbshipit-source-id: dadd63b78088c129e491d78abaf8b33d8303ca0d
2021-01-27 22:44:11 -08:00
16132a4b1d Make sure ConstantPadNd op preserves memory format (#50898)
Summary:
* ConstantPadNd op didn't preserve memory format for non-quantized cases (minimal repro below)
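
A minimal repro of the now-preserved behavior (illustrative; `torch.nn.functional.pad` lowers to constant_pad_nd):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 4, 4).to(memory_format=torch.channels_last)
y = F.pad(x, (1, 1, 1, 1), mode="constant", value=0.0)
print(y.is_contiguous(memory_format=torch.channels_last))  # True after this fix
```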

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50898

Test Plan: pytest test/test_nn.py::TestConstPadNd

Reviewed By: kimishpatel

Differential Revision: D26003407

Pulled By: axitkhurana

fbshipit-source-id: a8b56d32734772acae6f5c2af4dfe0bd3434cab1
2021-01-27 22:36:44 -08:00
52ab858f07 STFT: Improve error message when window is on wrong device (#51128)
Summary:
Closes https://github.com/pytorch/pytorch/issues/51042

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51128

Reviewed By: mruberry

Differential Revision: D26108998

Pulled By: ngimel

fbshipit-source-id: 1166c19c2ef6846e29b16c1aa06cb5c1ce3ccb0d
2021-01-27 22:31:57 -08:00
83287a6f2b [pytorch] change codegen dispatch key from string to enum (#51115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51115

Add enum type for dispatch key. Prepare to implement the DispatchTable
computation logic in python for static dispatch.

Verified byte-for-byte compatibility of the codegen output.

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D26077430

Pulled By: ljk53

fbshipit-source-id: 86e74f3eb32266f31622a2ff6350b91668c8ff42
2021-01-27 22:28:52 -08:00
773c71cb3a [aten] Fix type check bug in bmm_out_or_baddbmm_ (#51248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51248

Fix bug raised in https://github.com/pytorch/pytorch/issues/50980 by adding a dtype check back to bmm_out_or_baddbmm_.

Test Plan:
```
buck test //caffe2/test:linalg
buck test //caffe2/aten:math_kernel_test
```

Reviewed By: ngimel

Differential Revision: D26113575

fbshipit-source-id: 0d6e03eae70822f8ceeffefd915aee01030304ce
2021-01-27 21:51:04 -08:00
88baf470d1 [JIT] Provide more info when attribute fails to convert (#50870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50870

**Summary**
Module attributes whose types cannot be determined based on annotations
or inference based on their values at script time are added to the
concrete type of the corresponding module as "failed attributes". Any
attempt to access them in scripted code produces an error with a message
explaining that the attribute could not be converted to a
corresponding attribute on the TorchScript module. However, this error
is not more specific than that.

This commit modifies `infer_type` in `_recursive.py` so that it returns
`c10::InferredType` instead, which allows more information about typing
failures to be communicated to the caller through the `reason()` method
on this class. This information is appended to the hint added to the
module concrete type for failed attributes.
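
For context, a toy module that trips this path (a sketch; the exact error text varies):

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # A plain Python list of Modules cannot be typed by script,
        # so it becomes a "failed attribute" on the concrete type.
        self.layers = [torch.nn.Linear(2, 2)]

    def forward(self, x):
        # Accessing the failed attribute now also reports *why* type
        # inference failed, via the reason() on the inferred type.
        return self.layers[0](x)

# torch.jit.script(M())  # raises, with the inference failure reason in the hint
```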

**Testing**
This commit adds a unit test to `test_module_containers.py` that checks
that extra information is provided about the reason for the failure
when a module attribute consisting of a list of `torch.nn.Module` fails to convert.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26091472

Pulled By: SplitInfinity

fbshipit-source-id: fcad6588b937520f250587f3d9e005662eb9af0d
2021-01-27 20:37:10 -08:00
12a434abbc Revert D26077905: Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter"
Test Plan: revert-hammer

Differential Revision:
D26077905 (dc2a44c4fc)

Original commit changeset: fae83bf9822d

fbshipit-source-id: b70185916502ba9ebe16d781cf0659b9f7865c9a
2021-01-27 19:53:29 -08:00
dfdb1547b9 Revert D26094906: Add serialization logic for complex numbers
Test Plan: revert-hammer

Differential Revision:
D26094906 (2de4ecd4eb)

Original commit changeset: 7b2614f3ee4a

fbshipit-source-id: 6f32a9fc6bb2a904ca1a282bbc6b2df0aee50068
2021-01-27 19:44:26 -08:00
0335222a4a memory efficient fq: use it everywhere, delete the old version (#51159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51159

This PR is the cleanup after #50561. At a high level, we make the new
definition of fake_quant the definition used by autograd, but keep the old
function around as a thin wrapper to keep the user-facing API the same.

In detail:
1. point `fake_quantize_per_tensor_affine`'s implementation to be `fake_quantize_per_tensor_affine_cachemask`
2. delete the `fake_quantize_per_tensor_affine` backward, autograd will automatically use the cachemask backward
3. delete all the `fake_quantize_per_tensor_affine` kernels, since they are no longer used by anything

Test Plan:
```
python test/test_quantization.py TestFakeQuantize
```

performance testing was done in the previous PR.

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D26090869

fbshipit-source-id: fda042881f77a993a9d15dafabea7cfaf9dc7c9c
2021-01-27 19:39:05 -08:00
983b8e6b62 fake_quant: add a more memory efficient version (#50561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50561

Not for review yet, a bunch of TODOs need finalizing.

tl;dr: add an alternative implementation of `fake_quantize` which saves
a mask during the forward pass and uses it to calculate the backward.

There are two benefits:

1. the backward function no longer needs the input Tensor, and it can be
gc'ed earlier by autograd.  On MobileNetV2, this reduces QAT overhead
by ~15% (TODO: link, and absolute numbers).  We add an additional mask Tensor
to pass around, but its size is 4x smaller than the input tensor. A
future optimization would be to pack the mask bitwise and unpack in the
backward.

2. the computation of `qval` can be done only once in the forward and
reused in the backward. No perf change observed; TODO: verify with better
metrics.
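
The idea in a minimal standalone sketch (a Python autograd.Function with a cached boolean mask, not the actual C++/CUDA kernels):

```python
import torch

class FakeQuantCachemask(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale, zero_point, qmin, qmax):
        q = torch.round(x / scale) + zero_point
        mask = (q >= qmin) & (q <= qmax)   # bool mask: 4x smaller than float x
        ctx.save_for_backward(mask)        # x itself is not needed for backward
        return (torch.clamp(q, qmin, qmax) - zero_point) * scale

    @staticmethod
    def backward(ctx, grad_out):
        (mask,) = ctx.saved_tensors
        # straight-through estimator: gradient flows only where x was not clamped
        return grad_out * mask, None, None, None, None

x = torch.randn(8, requires_grad=True)
y = FakeQuantCachemask.apply(x, 0.1, 0, -128, 127)
y.sum().backward()
```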

TODO: describe in more detail

Test Plan:
OSS / torchvision / MobileNetV2
```
python references/classification/train_quantization.py
  --print-freq 1
  --data-path /data/local/packages/ai-group.imagenet-256-smallest-side/prod/
  --output-dir ~/nfs/pytorch_vision_tests/
  --backend qnnpack
  --epochs 5
TODO paste results here
```

TODO more

Imported from OSS

Reviewed By: ngimel

Differential Revision: D25918519

fbshipit-source-id: ec544ca063f984de0f765bf833f205c99d6c18b6
2021-01-27 19:36:04 -08:00
d14d8c7f7f Add convenience import (#51195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51195

Add kineto_available to torch.profiler

Test Plan:
>>> import torch.profiler
>>> torch.profiler.kineto_available()
True

Reviewed By: ngimel

Differential Revision: D26113906

Pulled By: ilia-cher

fbshipit-source-id: fe4502d29d10d8bd9459b0504aa0ee856af43acc
2021-01-27 19:23:50 -08:00
ea0d304e2e Rewrite "ProfilerStep#<num>" in profiler output (#51194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51194

Aggregate all "ProfilerStep#<num>" together

Test Plan:
python test/test_profiler.py -k
test_kineto_profiler_api

Reviewed By: ngimel

Differential Revision: D26113907

Pulled By: ilia-cher

fbshipit-source-id: 2bc803befc85153f07e770ea3c37b57e2870a1ba
2021-01-27 19:23:46 -08:00
4fb33f1d3a Trim profiler file paths (#51192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51192

Trim profiler file paths when using stack traces

Test Plan:
python test/test_profiler.py -k test_source
```
			       SumBackward0         0.02%       6.000us         0.51%     154.000us     154.000us             1  test/test_profiler.py(91): test_source
																 ...conda3/envs/pytorch/lib/python3.8/unittest/case.py(633): _callTestMethod
																 ...r/local/miniconda3/envs/pytorch/lib/python3.8/unittest/case.py(676): run
																 ...al/miniconda3/envs/pytorch/lib/python3.8/unittest/case.py(736): __call__
																 .../local/miniconda3/envs/pytorch/lib/python3.8/unittest/suite.py(122): run
```

Reviewed By: ngimel

Differential Revision: D26113905

Pulled By: ilia-cher

fbshipit-source-id: 2b71c31b6c4437855d33013d42d977745e6f489f
2021-01-27 19:12:27 -08:00
e2eb97dd76 [ONNX] Fix param names (#50764) (#50955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50955

Preserve name of parameters for ONNX.

Looks like the output->copyMetadata(input) API gives the same debugName to the output, so the name of the original input is changed. This update avoids the name change by copying only the type.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26050880

Pulled By: SplitInfinity

fbshipit-source-id: 8b04e41e6df7f33c5c9c873fb323c21462fc125b
2021-01-27 17:49:11 -08:00
84e9bff85d [ONNX] Replace optional parameters of Resize with placeholder for ops13. (#50574) (#50954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50954

* Replace optional parameters of Resize with placeholder for ops13.

* Use common methods to handle different versions.

* Correct flake8 issue.

* Update per comments.

* Add something to trigger CI again.

* Trigger another round of CI.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26050882

Pulled By: SplitInfinity

fbshipit-source-id: aea6205a1ba4a0621fe1ac9e0c7d94b92b6d8f21
2021-01-27 17:49:07 -08:00
68034197e8 [ONNX] Support gelu for fp16 export (#50487) (#50911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50911

Need to replace dtype of export created scalars from float to double. (In torch implicit conversion logic, python numbers are double)

Test case skipped in CI due to that current CI job env does not have CUDA support.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26050889

Pulled By: SplitInfinity

fbshipit-source-id: 1fdde23a68d4793e6b9a82840acc213e5c3aa760
2021-01-27 17:49:02 -08:00
70dcfe2991 [ONNX] Enable _jit_pass_onnx_fold_if only when dynamic_axes is None (#50582) (#50910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50910

Fixing pytorch/vision#3251 (PR #49410 triggers a torchvision test build failure on three tests: test_faster_rcnn, test_mask_rcnn, test_keypoint_rcnn.)

The offending PR is fine on PyTorch UT because there is a gap between the torchvision and PyTorch tests when we merge them - we are using different test APIs on the two sides, which causes some discrepancy.

This PR bridges the gap for the above three tests, and disables the _jit_pass_onnx_fold_if pass until it gets fixed.
Allow _jit_pass_onnx_fold_if only when dynamic_axes is None.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26050886

Pulled By: SplitInfinity

fbshipit-source-id: b765ffe30914261866dcc761f0d0999fd16169e3
2021-01-27 17:48:58 -08:00
e90a480d40 [ONNX] Add logical_and, logical_or, logical_xor torch op support in pytorch exporter (#50570) (#50909)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50909

Fixes #{}
Add logical_and, logical_or, logical_xor torch op support in pytorch exporter.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26050884

Pulled By: SplitInfinity

fbshipit-source-id: 2db564e9726c18a3477f9268a0ff862cd2c40e4d
2021-01-27 17:48:53 -08:00
b308fb78d1 [ONNX] Add binary_cross_entropy_with_logits op to ONNX opset version 12 (#49675) (#50908)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50908

Fixes #47997
Exporting the operator binary_cross_entropy_with_logits to ONNX opset version 12.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26050885

Pulled By: SplitInfinity

fbshipit-source-id: e4167895eed804739aa50481679500a4d564b360
2021-01-27 17:48:49 -08:00
1723ab53c4 [ONNX] Update Reducesum operator for opset 13 (#50532) (#50907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50907

* update symbolic for squeeze/unsqueeze

* update c++ unsqueeze/squeeze creation

* clang format

* enable tests

* clang format

* remove prints

* remove magic number

* add helper function

* fix build issue

* update opset9 symbolic with helper function

* fix utility test

* fix prim_fallthrough opset skip

* enable reducesum opset 13

* enable embedding_bag which contain reducesum op

* add ReduceSum helper

* remove block_listed_operators

* remove local test code

* remove embedding_bag() in opset13 file

* remove unused import

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26050888

Pulled By: SplitInfinity

fbshipit-source-id: 88307af6a7880abf94eac126ec1638e962de8c1f

Co-authored-by: BowenBao <bowbao@microsoft.com>
Co-authored-by: hwangdeyu <deyhuang@qq.com>
2021-01-27 17:48:45 -08:00
7e4c956955 [ONNX] Support opset13 Squeeze and Unsqueeze (#50150) (#50906)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50906

In opset 13, squeeze/unsqueeze are updated to take axes as an input instead of as an attribute.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26050883

Pulled By: SplitInfinity

fbshipit-source-id: 7b5faf0e016d476bc75cbf2bfee6918d77e8aecd
2021-01-27 17:48:40 -08:00
1c9347c666 [ONNX] Use parameter values in onnx shape inference (#49706) (#50905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50905

Adds an additional run of ONNX shape inference after constant folding, since initializers may have changed, affecting shape inference.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26050881

Pulled By: SplitInfinity

fbshipit-source-id: 9e5d69c52b647133cd3a0781988e2ad1d1a9c09d
2021-01-27 17:45:32 -08:00
dc2a44c4fc Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter" (#51124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51124

Original commit changeset: 1c7133627da2

Test Plan: Test locally with interpreter_test and on CI

Reviewed By: suo

Differential Revision: D26077905

fbshipit-source-id: fae83bf9822d79e9a9b5641bc5191a7f3fdea78d
2021-01-27 16:49:42 -08:00
e975169426 [TensorExpr] Redesign Tensor class. (#50995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50995

This change makes 'Tensor' a thin wrapper over 'Buf' and 'Stmt', and
merges it with the recently introduced 'CompoundTensor'. A statement for the
tensor is either passed directly to the Tensor constructor (akin to
'CompoundTensor'), or is built immediately in the constructor.

LoopNest is no longer responsible for constructing statements from
tensors - it simply stitches already constructed statements contained in
Tensors. This has a side effect that now we cannot construct several
loopnests from the same tensors - we need to explicitly clone statements
if we want to do that. A special copy constructor was added to LoopNest
to make it more convenient (note: this only affects tests, we don't
usually create multiple loopnests in other places).

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D26038223

Pulled By: ZolotukhinM

fbshipit-source-id: 27a2e5900437cfb0c151e8f89815edec53608e17
2021-01-27 16:14:22 -08:00
b804084428 [TensorExpr] Move 'lowerToStmt' method from 'LoopNest' to 'Tensor'. (#50994)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50994

Eventually, 'Tensor' will be fully responsible for its 'Stmt' and moving
this method to it is one step in that direction.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D26038222

Pulled By: ZolotukhinM

fbshipit-source-id: 0549f0ae6b46a93ff7608a22e79faa5115eef661
2021-01-27 16:14:18 -08:00
42aeb68128 [TensorExpr] Move 'initializer' field from 'Tensor' to 'Buf'. (#50993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50993

This is the first step toward making 'Tensor' a thin wrapper over 'Buf' and
'Stmt', which will be finished in subsequent PRs. This change also
allows removing 'buf_initializers_' from 'LoopNest', making it "less
stateful".

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D26038224

Pulled By: ZolotukhinM

fbshipit-source-id: f418816e54c62f291fa45812901487394e9b95b5
2021-01-27 16:10:53 -08:00
3f23ad5bce [Bug] fix for module_has_exports (#50680)
Summary:
The attributes in `dir(mod)` may not be valid; calling `getattr` on them can throw an error.
Use `hasattr` to test whether an attribute is valid.

Here is an example:
```python
class A:
    def __init__(self, x):
        if x:
            self._attr = 1

    @property
    def val(self):
        return getattr(self, '_attr')

a = A(False)
print('val' in dir(a))
print(hasattr(a, 'val'))

b = A(True)
print('val' in dir(b))
print(hasattr(b, 'val'))
```

And the outputs:
```
True
False
True
True
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50680

Reviewed By: malfet

Differential Revision: D26103975

Pulled By: eellison

fbshipit-source-id: 67a799afe7d726153c91654d483937c5e198ba94
2021-01-27 16:03:24 -08:00
1321f2bfe6 [PyTorch] Port Caffe2 opti for BatchMatMul batch size 1 to baddbmm (#51057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51057

Caffe2 has an
[optimization](f8eefbdf7a/caffe2/operators/batch_matmul_op.h (L192))
for the case where the batch size is 1 that uses the underlying `gemm`
instead of `gemm_batched` BLAS function. This diff tries to port that
optimization to `baddbmm_mkl`.

Note that I have very little linear algebra background and am just
going off existing code and cblas API documentation, so please
review without assuming I know what I'm doing with the math itself.
ghstack-source-id: 120342923

Reviewed By: hlu1

Differential Revision: D26056613

fbshipit-source-id: feef80344b96601fc2bd0a2e8c8f6b57510d7856
2021-01-27 15:59:57 -08:00
98d9a6317d Rename profile.next_step() to profile.step() to consistent with optimizer.step() (#51032)
Summary:
Similar to Optimizer.step(), profile.next_step() occurs every iteration and is called at the end of each iteration, so it's better to give them the same naming style.
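
After the rename, a training loop reads naturally alongside the optimizer (a sketch; the model and loop are placeholders):

```python
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

with torch.profiler.profile() as prof:
    for _ in range(3):
        loss = model(torch.randn(4, 10)).sum()
        loss.backward()
        opt.step()
        opt.zero_grad()
        prof.step()  # renamed from prof.next_step(), mirroring opt.step()
```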

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51032

Reviewed By: heitorschueroff

Differential Revision: D26097847

Pulled By: ilia-cher

fbshipit-source-id: ea2e5c8e865d99f90b004ec7797271217efeeb68
2021-01-27 15:52:58 -08:00
621198978a Move USE_NUMPY to more appropriate targets (#51143)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51143

Test Plan: CI

Reviewed By: wconstab

Differential Revision: D26084123

fbshipit-source-id: af4abe4ef87c1ebe5434938320526a925f5c34c8
2021-01-27 15:44:12 -08:00
2de4ecd4eb Add serialization logic for complex numbers (#50885)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50885

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D26094906

Pulled By: anjali411

fbshipit-source-id: 7b2614f3ee4a30c4b4cf04aaa3432988b38a0721
2021-01-27 15:19:36 -08:00
3b6f30824c OpInfo JIT op.output_func handling support (#50775)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50775

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25964541

Pulled By: Lilyjjo

fbshipit-source-id: 8cf1ee9191d526cc46ae283f38c2d64bd60afdb2
2021-01-27 15:04:23 -08:00
eaf5ca09dc Migrate masked_scatter_ CUDA to ATen (#50039)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49542

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50039

Reviewed By: heitorschueroff

Differential Revision: D26096247

Pulled By: ngimel

fbshipit-source-id: ec1810d3412e0d7ab6b950265a3123519ad886c1
2021-01-27 14:17:02 -08:00
1c8d11c9e2 [PyTorch] Save a refcount bump in make_variable (#51180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51180

This fast path still did a refcount bump because it copied the inner intrusive_ptr to the stack. Now it's moved.
ghstack-source-id: 120460258

Test Plan:
1) profile empty benchmark & inspect assembly to verify move
2) run framework overhead benchmarks

Reviewed By: bhosmer

Differential Revision: D26094951

fbshipit-source-id: b2e09f9ad885cb633402885ca1e61a370723f6b8
2021-01-27 14:09:30 -08:00
f7e90cf311 Revert D26089965: [quant][graphmode][fx] Add support for functional conv1d and conv3d
Test Plan: revert-hammer

Differential Revision:
D26089965 (dd1a97b3ae)

Original commit changeset: 4aea507d05b7

fbshipit-source-id: f54184cafb9dd07858683489d8bd147474e7e4b3
2021-01-27 13:27:10 -08:00
40eea6d9d1 Support device map for distributed autograd while using TensorPipe. (#44859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44859

TensorPipe's `set_device_map` option was applied during the forward
pass. However, if we ran the backward pass for the graph, we would not
automatically pick up the reverse device mapping.

As a result, users had to specify both forward and backward device mappings,
which is very tedious to do.

In this PR, I've added this functionality such that TensorPipe automatically
picks up the reverse device mapping during the backward pass. This is done by
storing the appropriate device mapping in the "recv" autograd function for
distributed autograd.
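
For reference, only the forward mapping needs to be configured now; the reverse mapping for the backward pass is derived automatically (a sketch, not the unit test):

```python
from torch.distributed import rpc

opts = rpc.TensorPipeRpcBackendOptions()
opts.set_device_map("worker1", {0: 1})  # forward: our cuda:0 -> worker1's cuda:1
# The reverse map (worker1's cuda:1 -> our cuda:0) is now picked up
# automatically during the distributed backward pass.
rpc.init_rpc("worker0", rank=0, world_size=2, rpc_backend_options=opts)
```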

Closes: https://github.com/pytorch/pytorch/issues/44170
ghstack-source-id: 119950842

Test Plan:
1) waitforbuildbot
2) Unit test added.

Reviewed By: mrshenli

Differential Revision: D23751975

fbshipit-source-id: 2717d0ef5bde3db029a6172d98aad95734d52140
2021-01-27 13:01:44 -08:00
6d098095eb [numpy] torch.lgamma: promote integer inputs to float (#50140)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515
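
In other words (illustrative):

```python
import torch

t = torch.tensor([1, 2, 3])  # int64 input
print(torch.lgamma(t))       # now promotes to float instead of erroring
# tensor([0.0000, 0.0000, 0.6931])
```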

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50140

Reviewed By: mrshenli

Differential Revision: D25951094

Pulled By: mruberry

fbshipit-source-id: e53f1dbddff889710f05d43dbc9587382d3decb0
2021-01-27 12:08:46 -08:00
dd1a97b3ae [quant][graphmode][fx] Add support for functional conv1d and conv3d (#51155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51155

This PR added support for quantizing functional conv1d, conv3d, conv1d_relu, and conv3d_relu

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_functional_conv

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D26089965

fbshipit-source-id: 4aea507d05b744807e993f6d3711ab308fb7591b
2021-01-27 12:00:35 -08:00
1b7a4f9cde .github: Add GitHub Actions workflow to build wheels (#50633)
Summary:
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50633

Reviewed By: samestep

Differential Revision: D26083492

Pulled By: seemethere

fbshipit-source-id: c133671b9cf5074539133ee79fca5c680793a85d
2021-01-27 11:52:28 -08:00
b77f72b5a0 Enable TensorPipe's SHM transport (#50760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50760

The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine.

It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails.
ghstack-source-id: 120470890

Test Plan: Exported three times to CircleCI with tests consistently passing

Reviewed By: mrshenli

Differential Revision: D23814828

fbshipit-source-id: f355cb6515776debad536924de4f4d3fbb05a874
2021-01-27 11:45:09 -08:00
d3ec204ef2 [quant][graphmode][fx] Add functional conv2d + relu (#51079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51079

Added support for functional conv2d + relu; conv1d and conv3d will be added in a future PR

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_functional_conv

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D26089964

fbshipit-source-id: 8703de17de1469f7076651c386c83fb5922a56eb
2021-01-27 11:20:55 -08:00
00adc7b07f Fix more JIT tests under Python-3.9 (#51182)
Summary:
Mostly replaces `global Foo` with `make_global(Foo)`.
The only real fix is generating the Subscript annotation, which is a follow-up from https://github.com/pytorch/pytorch/pull/48676

Fixes https://github.com/pytorch/pytorch/issues/49617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51182

Reviewed By: gmagogsfm

Differential Revision: D26095244

Pulled By: malfet

fbshipit-source-id: 0e043d9a2cf43fff71dfbb341f708cd7af87c39a
2021-01-27 10:57:03 -08:00
9b6d463704 Move std and var tests to OpInfos (#50901)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50901

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D26083289

Pulled By: mruberry

fbshipit-source-id: 7e14ff37bba46dd456e0bc0aa9c4e0a632d0734c
2021-01-27 10:50:51 -08:00
e9ffad088f numeric suite: add types to eager (#51168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51168

Adds types to function I/O for numeric suite.  This is for readability
and static type checking with mypy.

Test Plan:
```
mypy torch/quantization/
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D26092454

fbshipit-source-id: d37cf61e4d9604f4bc550b392f55fb59165f7624
2021-01-27 10:40:49 -08:00
16dd5ca8ab Followup of kron PR (#51045)
Summary:
Followup of https://github.com/pytorch/pytorch/pull/50927

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51045

Reviewed By: mruberry

Differential Revision: D26089204

Pulled By: ngimel

fbshipit-source-id: 77291dd83fba32d6f80a8540910b112a1d85a892
2021-01-27 10:33:05 -08:00
4a2aa0f5f1 index_put_ for complex tensors on CUDA (#51148)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51148

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26102025

Pulled By: anjali411

fbshipit-source-id: b1b6fd12fda03c4520a3c3200226edf352496188
2021-01-27 09:11:37 -08:00
0b5303e833 Propagate CreationMeta when chaining views (#51061)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49824

## Background

When creating a view of a view, there was a possibility that the new view would be less restrictive than the previous view, incorrectly sidestepping the error that should be thrown when using in-place operations on the new view.

The fix addresses this by propagating `CreationMeta` from the previous view to the new view. Currently, the old view's `creation_meta` is only propagated when the new view's `creation_meta == CreationMeta::DEFAULT`. This ensures that the new view is not less restrictive than the previous view wrt. allowing in-place operations.
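
A concrete instance of the scenario (illustrative):

```python
import torch

x = torch.randn(2, 3, requires_grad=True)
v = x.unbind(0)[0]  # view produced by a multi-output op: in-place is forbidden
w = v[0:1]          # a view of that view must inherit the same restriction
w.add_(1)           # now raises a RuntimeError instead of silently succeeding
```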

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51061

Test Plan:
```
python test/test_autograd.py TestAutogradDeviceTypeCPU.test_inplace_view_of_multiple_output_view_cpu
python test/test_autograd.py TestAutogradDeviceTypeCUDA.test_inplace_view_of_multiple_output_view_cuda
python test/test_autograd.py TestAutogradDeviceTypeCPU.test_inplace_multiple_output_view_of_view_cpu
python test/test_autograd.py TestAutogradDeviceTypeCUDA.test_inplace_multiple_output_view_of_view_cuda
```

Reviewed By: heitorschueroff

Differential Revision: D26076434

Pulled By: jbschlosser

fbshipit-source-id: c47f0ddcef9b8449427b671aff9ad08edca70fcd
2021-01-27 09:00:51 -08:00
5ec2e26310 DOC, BLD: make the python docs build failures print a nicer message (#50356)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50330

- Encapsulate the `make html` call and capture the stdout/stderr with a `tee` command
- If the build fails, print out the `WARNING:` lines of the build log and finish up with a message

I tried it out on my branch, but did not write a test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50356

Reviewed By: ezyang

Differential Revision: D26101762

Pulled By: brianjo

fbshipit-source-id: ba2b704d3244ef5139ca9026c5250537bf45734f
2021-01-27 07:41:00 -08:00
22ac4f3c59 Add vectorize flag to torch.autograd.functional.{jacobian, hessian} (#50915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50915

Fixes #50584
Add a vectorize flag to torch.autograd.functional.jacobian and
torch.autograd.functional.hessian (default: False). Under the hood, the
vectorize flag uses vmap as the backend to compute the jacobian and
hessian, respectively, providing speedups to users.
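
Usage (a sketch):

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    return (x ** 2).sum(dim=1)

x = torch.randn(5, 3)
J_loop = jacobian(f, x)                  # one backward pass per output element
J_vmap = jacobian(f, x, vectorize=True)  # a single vmap-batched backward pass
print(torch.allclose(J_loop, J_vmap))    # True
```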

Test Plan:
- I updated all of the jacobian and hessian tests to also use
vectorized=True
- I added some simple sanity check tests that check e.g. jacobian with
vectorized=False vs
jacobian with vectorized=True.
- The mechanism for vectorized=True goes through batched gradient
computation. We have separate tests for those (see other PRs in this
stack).

Reviewed By: heitorschueroff

Differential Revision: D26057674

Pulled By: zou3519

fbshipit-source-id: a8ae7ca0d2028ffb478abd1b377f5b49ee39e4a1
2021-01-27 07:32:30 -08:00
fd9a85d21b Doc update for complex numbers (#51129)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51129

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26094947

Pulled By: anjali411

fbshipit-source-id: 4e1cdf8915a8c6a86ac3462685cdce881e1bcffa
2021-01-27 07:32:26 -08:00
ada916675f update HistogramObserver to be scriptable (#51081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51081

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51001

fix tests in TestQuantizeJitOps

Test Plan:
Imported from OSS
python test/test_quantization.py

Reviewed By: raghuramank100

Differential Revision: D26038759

Pulled By: lyoka

fbshipit-source-id: 0977ba7b8b26a9f654f20f5c698a7a20ec078c35
2021-01-27 07:27:03 -08:00
0a4bc72890 [ROCm] work around compiler issue for IGammaKernel.cu (#50970)
Summary:
Add const to static variable inside `__host__ __device__` function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50970

Reviewed By: izdeby

Differential Revision: D26081478

Pulled By: heitorschueroff

fbshipit-source-id: 77cf145f7e0570359aa00aec4c8b82c950815f81
2021-01-27 07:22:53 -08:00
b60494000b DOC: update left navbar links for vision and text (#51103)
Summary:
A tiny PR to update the links in the left-hand navbar under Libraries. The canonical links for vision and text are `https://pytorch.org/vision/stable` and `https://pytorch.org/text/stable`, respectively. The links without `/stable` work via a redirect; this is cleaner.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51103

Reviewed By: izdeby

Differential Revision: D26079760

Pulled By: heitorschueroff

fbshipit-source-id: df1fa64d7895831f4e6242445bae02c1faa5e4dc
2021-01-27 07:19:00 -08:00
7b85adf20f Add back pycuda.autoinit to test_pt_onnx_trt (#51106)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51105 by adding back the `import pycuda.autoinit`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51106

Reviewed By: mingzhe09088

Differential Revision: D26086808

Pulled By: heitorschueroff

fbshipit-source-id: 88d98796c87a44cedaa1f6666e9f71a424293641
2021-01-27 07:10:11 -08:00
1935880860 [PyTorch] Remove unnecessary dispatcher.h include in torch/library.h (#51162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51162

It's unused.
ghstack-source-id: 120427120

Test Plan: CI

Reviewed By: bhosmer

Differential Revision: D25859010

fbshipit-source-id: 7bb21312843debaedaa6a969727c171b2bb0e6b2
2021-01-26 22:19:32 -08:00
42929e573a add missing return statement to inlined vec_signed (#51116)
Summary:
Fixes #{issue number}
This is not really a new issue, just a proposed minor fix to a recent previous issue (now closed) https://github.com/pytorch/pytorch/issues/50640 which was a fix for https://github.com/pytorch/pytorch/issues/50439.

That fix added inlining for vec_signed (and others) but in one case the return was accidentally omitted.  This results in a build error:
```
                 from ../aten/src/ATen/cpu/vec256/vec256.h:19,
                 from aten/src/ATen/native/cpu/FillKernel.cpp.VSX.cpp:3:
../aten/src/ATen/cpu/vec256/vsx/vsx_helpers.h: In function ‘vint32 vec_signed(const vfloat32&)’:
../aten/src/ATen/cpu/vec256/vsx/vsx_helpers.h:33:1: error: no return statement in function returning non-void [-Werror=return-type]

I've confirmed that the error disappears after this one-line fix.  (Note: There is another issue encountered later in the build unrelated to this particular fix, as I noted in a separate comment in the original issue.  I'm trying to make some sense of that one, but in any event it would be a subject for another issue/PR).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51116

Reviewed By: heitorschueroff

Differential Revision: D26078213

Pulled By: malfet

fbshipit-source-id: 59b2ee19138fa1b8d8ec1d35ca4a5ef0a67bc123
2021-01-26 20:16:18 -08:00
ba316a7612 Fix TF32 failures in test_linalg.py (#50453)
Summary:
On Ampere GPUs, matmuls are computed with TF32 by default when the dtype is `torch.float`: https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices, which reduces the precision of the results. However, linear algebra usually needs higher precision, so many tests in `test_linalg.py` fail on Ampere GPUs because of precision issues.

To fix this issue:
- Most linear algebra methods, except for matmuls, should add `NoTF32Guard`
- Expected results in unit tests should compute matmuls using NumPy instead of PyTorch CUDA (see the sketch below).
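
The user-facing knob behind the guard, for reference (a sketch):

```python
import torch

# Tests needing full float32 precision (most of linalg) should run with
# TF32 matmuls disabled on Ampere:
torch.backends.cuda.matmul.allow_tf32 = False
```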

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50453

Reviewed By: glaringlee

Differential Revision: D26023005

Pulled By: ngimel

fbshipit-source-id: f0ea533494fee322b07925565b57e3b0db2570c5
2021-01-26 19:51:20 -08:00
b6eaca9f1f Add type annotation logic for complex numbers (#50884)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50884

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D26086963

fbshipit-source-id: f103f7f529d63d701c4f17862e30eafbab7d0c68
2021-01-26 19:39:35 -08:00
e2041ce354 Fix docstring to clarify logits usage for multiclass case (#51053)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50378.

Additionally, this has some minor fixes:
 - [x] Fix mean for half-cauchy to return `inf` instead of `nan`.
 - [x] Fix constraints/support for the relaxed categorical distribution.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51053

Reviewed By: heitorschueroff

Differential Revision: D26077966

Pulled By: neerajprad

fbshipit-source-id: ca0213baa9bbdbc661aebbb901ab5e7fded38a5f
2021-01-26 17:01:39 -08:00
221d7d99e1 [torch vitals] move into namespace and fix windows tests
Summary:
as in title

resolves D25791248 (069602e028)

Test Plan: buck test //caffe2/aten:vitals

Reviewed By: EscapeZero, malfet

Differential Revision: D26090442

fbshipit-source-id: 07937f246ec0a6eb338d21208ada61758237ae42
2021-01-26 16:50:45 -08:00
3cc14a0dff [p2c2] Add support for Int8FCPackWeight in model transformation
Summary:
In order to enable FC int8 quantization in P2C2, we are trying to run the caffe2 op Int8FCPackWeight in the model transformation pipeline.

The net is being generated from the python side, and passed back into C++ and run here: https://fburl.com/diffusion/3zt1mp03,  with these dependencies included: https://fburl.com/diffusion/rdjtdtcf

However, when the net is executed, it errors out with:
```
Cannot create operator of type 'Int8FCPackWeight' on the device 'CPU'
```

This diff attempts to fix this issue.

Test Plan:
To reproduce, just run this test without this diff:
```
buck test //aiplatform/modelstore/transformation/tests:pyper_to_caffe2_dispatcher_test
```

Reviewed By: jspark1105

Differential Revision: D25965167

fbshipit-source-id: a7414669abb8731177c14e8792de58f400970732
2021-01-26 16:35:23 -08:00
345844d9d8 test, fix deepcopy of tensor with grad (#50663)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/3307

Previously, `self.grad` was not ~cloned~ deepcopied to the returned tensor in `deepcopy`. Added a test and an implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50663

Reviewed By: heitorschueroff

Differential Revision: D26074811

Pulled By: albanD

fbshipit-source-id: 536dad36415f1d03714b4ce57453f406ad802b8c
2021-01-26 16:19:53 -08:00
97ea95ddd7 Delete tabs from bench_approx.cpp (#51157)
Summary:
Introduced by D25981260 (f08464f31d)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51157

Reviewed By: bwasti

Differential Revision: D26090008

Pulled By: malfet

fbshipit-source-id: b63f1bb1683c7261902de7eaab24a05a5159ce7e
2021-01-26 15:53:47 -08:00
57484103be Revert D25675618: Move AcceleratedGraphModule out of graph_manipulation.
Test Plan: revert-hammer

Differential Revision:
D25675618 (c8a24ebe54)

Original commit changeset: 55636bb2d3d6

fbshipit-source-id: 7b196f7c32830061eca9c89bbcb346cdd66a211e
2021-01-26 15:31:18 -08:00
24eab1d80d BLD: create a LICENSE_BUNDLED.txt file from third_party licenses (#50745)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50695.

Rather than maintain a LICENSE_BUNDLED.txt by hand, this builds it out of the subrepos.

I ~copied and adapted the sdist handling from Numpy~ added a separate file, so the LICENSE.txt file of the repo remains in pristine condition and the GitHub website still recognizes it. If we modify the file, the website will no longer recognize the license.

This is not enough, since the license in the ~wheel~ wheel and sdist is not modified. Numpy has a [separate step](https://github.com/MacPython/numpy-wheels/blob/master/patch_code.sh) when preparing the wheel to concatenate the licenses. I am not sure where/if the [conda-forge numpy-feedstock](https://github.com/conda-forge/numpy-feedstock/) also fixes up the license.

~Should~ I ~commit~ committed the artifact to the repo and ~add~ added a test that checks the file can be reproduced consistently.

Edit: now the file is part of the repo.

Edit: rework the mention of sdist. After this is merged another PR is needed to make the sdist and wheel ship the proper merged license.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50745

Reviewed By: seemethere, heitorschueroff

Differential Revision: D26074974

Pulled By: walterddr

fbshipit-source-id: bacd5d6870e9dbb419a31a3e3d2fdde286ff2c94
2021-01-26 14:55:07 -08:00
c4029444d1 [nnc] Per-operator benchmarks (#51093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51093

Operator level benchmarks comparing eager-mode PyTorch to
NNC-generated fused kernels.  We wouldn't normally see these in isolation, but
it points out where NNC is falling short (or doing well).

I threw in a composed hardswish for fun, because it's my favorite activation
function.

Notably, it exposes a bug in our build process that's preventing vectorization
from using `sleef`, so we're using scalar calls to libm with predictably lousy
performance.  Fix incoming.

This benchmark is similar to the pure NNC approach in `microbenchmarks.py`, but
will include the overhead of dispatching the fused kernel through TorchScript.
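
As a rough sketch of this benchmarking style (not the benchmark code itself; whether the fuser actually kicks in depends on build and executor settings):
```
import torch

# Hardswish composed from primitives, scripted so the TorchScript
# fuser can (depending on executor settings) emit one fused kernel.
def hardswish(x):
    return x * (x + 3.0).clamp(0.0, 6.0) / 6.0

scripted = torch.jit.script(hardswish)
x = torch.randn(1 << 20)
for _ in range(10):   # warm-up runs let the profiling executor fuse
    scripted(x)
```
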
ghstack-source-id: 120403675

Test Plan:
```
op                        eager        nnc    speedup
hardswish                 0.187      0.051       3.70
hardswish                 0.052      0.052       1.00
sigmoid                   0.148      1.177       0.13
reciprocal                0.049      0.050       0.98
neg                       0.038      0.037       1.02
relu                      0.037      0.036       1.03
isnan                     0.119      0.020       5.86
log                       0.082      1.330       0.06
log10                     0.148      1.848       0.08
log1p                     0.204      1.413       0.14
log2                      0.285      1.167       0.24
exp                       0.063      1.123       0.06
expm1                     0.402      1.417       0.28
erf                       0.167      0.852       0.20
erfc                      0.181      1.098       0.16
cos                       0.124      0.793       0.16
sin                       0.126      0.838       0.15
tan                       0.285      1.777       0.16
acos                      0.144      1.358       0.11
asin                      0.126      1.193       0.11
cosh                      0.384      1.761       0.22
sinh                      0.390      2.279       0.17
atan                      0.240      1.564       0.15
tanh                      0.320      2.259       0.14
sqrt                      0.043      0.069       0.63
rsqrt                     0.118      0.117       1.01
abs                       0.038      0.037       1.03
ceil                      0.038      0.038       1.01
floor                     0.039      0.039       1.00
round                     0.039      0.292       0.13
trunc                     0.040      0.036       1.12
lgamma                    2.045      2.721       0.75
```

Reviewed By: zheng-xq

Differential Revision: D26069791

fbshipit-source-id: 236e7287ba1b3f67fdcb938949a92bbbdfa13dba
2021-01-26 14:10:08 -08:00
f08464f31d [nnc] Add benchmarks
Summary: Adding a set of benchmarks for key operators

Test Plan:
buck build mode/opt -c 'fbcode.caffe2_gpu_type=none' caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench
OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 numactl -C 3 ./buck-out/gen/caffe2/benchmarks/cpp/tensorexpr/tensorexpr_bench

Reviewed By: ZolotukhinM

Differential Revision: D25981260

fbshipit-source-id: 17681fc1527f43ccf9bcc80704415653a627b396
2021-01-26 13:51:33 -08:00
6f3aa58d80 Fix autograd thread crash with python-3.9 (#50998)
Summary:
Update the pybind repo to include the `gil_scoped_acquire::disarm()` method.
In python_engine, allocate the scoped_acquire as a unique_ptr and leak it if the engine is finalizing, for Python 3.9+.
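
A minimal C++ sketch of the pattern (not the actual engine code; `run_with_gil` and its flag are illustrative):
```
#include <memory>
#include <pybind11/pybind11.h>

namespace py = pybind11;

// Hold the GIL via a heap-allocated guard; if the interpreter is
// finalizing (an issue on Python 3.9+), disarm and deliberately leak
// the guard so its destructor never touches the torn-down GIL.
void run_with_gil(bool engine_finalizing) {
  auto gil = std::make_unique<py::gil_scoped_acquire>();
  // ... work that needs the GIL ...
  if (engine_finalizing) {
    gil->disarm();        // destructor becomes a no-op
    (void)gil.release();  // intentionally leak the guard
  }
}
```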

Fixes https://github.com/pytorch/pytorch/issues/50014 and https://github.com/pytorch/pytorch/issues/50893

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50998

Reviewed By: ezyang

Differential Revision: D26038314

Pulled By: malfet

fbshipit-source-id: 035411e22825e8fdcf1348fed36da0bc33e16f60
2021-01-26 13:29:47 -08:00
069602e028 [torch vitals] Initial implementation (#51047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51047

If the environment variable `TORCH_VITAL` is set to a non-zero-length string, the vitals are dumped at program end.
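
For example (the script name is illustrative):
```
TORCH_VITAL=1 python train.py   # any non-empty value enables the dump
```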

The API is very similar to Google's logging.

Test Plan: buck test //caffe2/aten:vitals

Reviewed By: bitfort

Differential Revision: D25791248

fbshipit-source-id: 0b40da7d22c31d2c4b2094f0dcb1229a35338ac2
2021-01-26 13:09:58 -08:00
83bfab2fb6 toTensor cleanup on sparsenn & static runtime ops (#51113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51113

toTensor() on an lvalue IValue returns a reference; no need to copy.
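
The underlying C++ pattern, sketched generically (with `std::string` standing in for `at::Tensor`; this is not the actual IValue code):
```
#include <string>
#include <utility>

// Ref-qualified overloads: calling get() on an lvalue hands back a
// reference (no copy); calling it on an rvalue moves the value out.
struct Holder {
  std::string value_;
  const std::string& get() const& { return value_; }
  std::string&& get() && { return std::move(value_); }
};

// Usage: Holder h; const std::string& ref = h.get();  // no copy made
```
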
ghstack-source-id: 120317233

Test Plan:
fitsships

Compared `perf stat` results before/after (was on top of a diff stack
so don't take baseline as where master is)

Before:
```
         74,178.77 msec task-clock                #    0.999 CPUs utilized            ( +-  0.31% )
            17,125      context-switches          #    0.231 K/sec                    ( +-  3.41% )
                 3      cpu-migrations            #    0.000 K/sec
           109,535      page-faults               #    0.001 M/sec                    ( +-  1.04% )
   146,803,364,372      cycles                    #    1.979 GHz                      ( +-  0.30% )  (50.03%)
   277,726,600,254      instructions              #    1.89  insn per cycle           ( +-  0.02% )  (50.03%)
    43,299,659,815      branches                  #  583.720 M/sec                    ( +-  0.03% )  (50.03%)
       130,504,094      branch-misses             #    0.30% of all branches          ( +-  1.14% )  (50.03%)
```

After:
```
         72,695.01 msec task-clock                #    0.999 CPUs utilized            ( +-  1.18% )
            15,994      context-switches          #    0.220 K/sec                    ( +-  5.21% )
                 3      cpu-migrations            #    0.000 K/sec
           107,743      page-faults               #    0.001 M/sec                    ( +-  1.55% )
   145,647,684,269      cycles                    #    2.004 GHz                      ( +-  0.30% )  (50.05%)
   277,341,084,993      instructions              #    1.90  insn per cycle           ( +-  0.02% )  (50.04%)
    43,200,717,263      branches                  #  594.273 M/sec                    ( +-  0.02% )  (50.05%)
       143,873,086      branch-misses             #    0.33% of all branches          ( +-  0.59% )  (50.05%)
```

Looks like a 0.7% cycles win (barely outside the noise) and a 0.1%
instructions win.

Reviewed By: hlu1

Differential Revision: D26051766

fbshipit-source-id: 05f8d71d8120d79f7cd80aca747dfc537bf7d382
2021-01-26 13:06:46 -08:00
a949d7b1c8 Workaround Python3.9 limitations in test_jit_py3 (#51088)
Summary:
In Python-3.9 and above `inspect.getsource` of a local class does not work if it was marked as default, see https://bugs.python.org/issue42666 https://github.com/pytorch/pytorch/issues/49617
Work around this by defining a `make_global` function that programmatically accomplishes the same thing.
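
A minimal sketch of what such a helper can look like (the actual signature may differ):
```
import sys

def make_global(*objs):
    # Promote locally defined classes/functions into their defining
    # module's global namespace so inspect.getsource can find them
    # under Python 3.9+.
    for obj in objs:
        setattr(sys.modules[obj.__module__], obj.__name__, obj)
```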

Partially addresses issue raised in https://github.com/pytorch/pytorch/issues/49617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51088

Reviewed By: gmagogsfm

Differential Revision: D26069189

Pulled By: malfet

fbshipit-source-id: 7cf14b88ae5d2b95d2b0fd852717a9202b86356e
2021-01-26 12:49:35 -08:00
c8a24ebe54 Move AcceleratedGraphModule out of graph_manipulation.
Test Plan:
buck test //caffe2/test:test_fx_experimental
buck test //glow/fb/fx_nnpi_importer:test_importer

Reviewed By: jfix71

Differential Revision: D25675618

fbshipit-source-id: 55636bb2d3d6102b400f2044118a450906954083
2021-01-26 12:39:49 -08:00
81ae8edf16 Revert D26018916: [pytorch][PR] Automated submodule update: tensorpipe
Test Plan: revert-hammer

Differential Revision:
D26018916 (5f297cc665)

Original commit changeset: dc8aaa98d4e0

fbshipit-source-id: cd81a7950c7141e0711faabf03292098a8cf14d3
2021-01-26 11:45:48 -08:00
afa79a4df5 [quant][graphmode][fx] cleanup linear module test case (#50976)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50976

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D26032531

fbshipit-source-id: 9725bab8f70ac79652e7bf9f94376917438d60e0
2021-01-26 11:14:22 -08:00
b822aba8ec Enable BFloat support for gemms on arch other than ampere (#50442)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50442

Reviewed By: bdhirsh

Differential Revision: D26044981

Pulled By: mruberry

fbshipit-source-id: 65c42f2c1de8d24e4852a1b5bd8f4b1735b2230e
2021-01-26 11:07:07 -08:00
3562ca2da2 [dist_optim] add warning to distributed optimizer (#50630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50630

Add a warning log to the distributed optimizer, to warn the user when the optimizer
is created without TorchScript support.

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D25932777

Pulled By: wanchaol

fbshipit-source-id: 8db3b98bdd27fc04c5a3b8d910b028c0c37f138d
2021-01-26 10:30:55 -08:00
6dda0363bb [reland] Refactor mypy configs list into editor-friendly wrapper (#50826)
Summary:
Closes https://github.com/pytorch/pytorch/issues/50513 by resolving all four checkboxes. If this PR is merged, I will also modify one or both of the following wiki pages to add instructions on how to use this `mypy` wrapper for VS Code editor integration:

- [Guide for adding type annotations to PyTorch](https://github.com/pytorch/pytorch/wiki/Guide-for-adding-type-annotations-to-PyTorch)
- [Lint as you type](https://github.com/pytorch/pytorch/wiki/Lint-as-you-type)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50826

Test Plan:
Unit tests for globbing function:
```
python test/test_testing.py TestMypyWrapper -v
```

Manual checks:

- Uninstall `mypy` and run `python test/test_type_hints.py` to verify that it still works when `mypy` is absent.
- Reinstall `mypy` and run `python test/test_type_hints.py` to verify that this didn't break the `TestTypeHints` suite.
- Run `python test/test_type_hints.py` again (should finish quickly) to verify that this didn't break `mypy` caching.
- Run `torch/testing/_internal/mypy_wrapper.py` on a few Python files in this repo to verify that it doesn't give any additional warnings when the `TestTypeHints` suite passes. Some examples (compare with the behavior of just running `mypy` on these files):
  ```sh
  torch/testing/_internal/mypy_wrapper.py $PWD/README.md
  torch/testing/_internal/mypy_wrapper.py $PWD/tools/fast_nvcc/fast_nvcc.py
  torch/testing/_internal/mypy_wrapper.py $PWD/test/test_type_hints.py
  torch/testing/_internal/mypy_wrapper.py $PWD/torch/random.py
  torch/testing/_internal/mypy_wrapper.py $PWD/torch/testing/_internal/mypy_wrapper.py
  ```
- Remove type hints from `torch.testing._internal.mypy_wrapper` and verify that running `mypy_wrapper.py` on that file gives type errors.
- Remove the path to `mypy_wrapper.py` from the `files` setting in `mypy-strict.ini` and verify that running it again on itself no longer gives type errors.
- Add `test/test_type_hints.py` to the `files` setting in `mypy-strict.ini` and verify that running the `mypy` wrapper on it again now gives type errors.
- Change a return type in `torch/random.py` and verify that running the `mypy` wrapper on it again now gives type errors.
- Add the suggested JSON from the docstring of `torch.testing._internal.mypy_wrapper.main` to your `.vscode/settings.json` and verify that VS Code gives the same results (inline, while editing any Python file in the repo) as running the `mypy` wrapper on the command line, in all the above cases.

Reviewed By: walterddr

Differential Revision: D26049052

Pulled By: samestep

fbshipit-source-id: 0b35162fc78976452b5ea20d4ab63937b3c7695d
2021-01-26 09:04:14 -08:00
31194750f2 [jit] Fix ResolutionCallback definition (#51089)
Summary:
`ResolutionCallback` returns `py::object` (i.e. `Any`) rather than `py::function` (i.e. `Callable`)

Discovered while debugging test failures after updating pybind11

This also makes resolution code slightly faster, as it eliminates casts from object to function and back for every `py::object obj = rcb_(name);` statement.
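
The corrected alias, sketched (the exact parameter type here is an assumption):
```
#include <functional>
#include <string>
#include <pybind11/pybind11.h>

// The callback resolves a name to an arbitrary Python object, not
// necessarily a callable, so py::object is the right return type.
using ResolutionCallback =
    std::function<pybind11::object(const std::string&)>;
```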

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51089

Reviewed By: jamesr66a

Differential Revision: D26069295

Pulled By: malfet

fbshipit-source-id: 6876caf9b4653c8dc8e568aefb6778895decea05
2021-01-26 08:47:38 -08:00
5834b3b204 Fix test_jit_cuda_archflags on machine with more than one arch (#50405)
Summary:
This fixes the following flaky test on machines with GPUs of different architectures:
```
_________________________________________________________________________________________________________________ TestCppExtensionJIT.test_jit_cuda_archflags __________________________________________________________________________________________________________________

self = <test_cpp_extensions_jit.TestCppExtensionJIT testMethod=test_jit_cuda_archflags>

    @unittest.skipIf(not TEST_CUDA, "CUDA not found")
    @unittest.skipIf(TEST_ROCM, "disabled on rocm")
    def test_jit_cuda_archflags(self):
        # Test a number of combinations:
        #   - the default for the machine we're testing on
        #   - Separators, can be ';' (most common) or ' '
        #   - Architecture names
        #   - With/without '+PTX'

        capability = torch.cuda.get_device_capability()
        # expected values is length-2 tuple: (list of ELF, list of PTX)
        # note: there should not be more than one PTX value
        archflags = {
            '': (['{}{}'.format(capability[0], capability[1])], None),
            "Maxwell+Tegra;6.1": (['53', '61'], None),
            "Pascal 3.5": (['35', '60', '61'], None),
            "Volta": (['70'], ['70']),
        }
        if int(torch.version.cuda.split('.')[0]) >= 10:
            # CUDA 9 only supports compute capability <= 7.2
            archflags["7.5+PTX"] = (['75'], ['75'])
            archflags["5.0;6.0+PTX;7.0;7.5"] = (['50', '60', '70', '75'], ['60'])

        for flags, expected in archflags.items():
>           self._run_jit_cuda_archflags(flags, expected)

test_cpp_extensions_jit.py:198:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_cpp_extensions_jit.py:158: in _run_jit_cuda_archflags
    _check_cuobjdump_output(expected[0])
test_cpp_extensions_jit.py:134: in _check_cuobjdump_output
    self.assertEqual(actual_arches, expected_arches,
../../.local/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py:1211: in assertEqual
    super().assertEqual(len(x), len(y), msg=self._get_assert_msg(msg, debug_msg=debug_msg))
E   AssertionError: 2 != 1 : Attempted to compare the lengths of [iterable] types: Expected: 2; Actual: 1.
E   Flags: ,  Actual: ['sm_75', 'sm_86'],  Expected: ['sm_86']
E   Stderr:
E   Output: ELF file    1: cudaext_archflags.1.sm_75.cubin
E   ELF file    2: cudaext_archflags.2.sm_86.cubin

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50405

Reviewed By: albanD

Differential Revision: D25920200

Pulled By: mrshenli

fbshipit-source-id: 1042a984142108f954a283407334d39e3ec328ce
2021-01-26 08:38:54 -08:00
5f297cc665 Automated submodule update: tensorpipe (#50946)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: f463e0ebfc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50946

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: lw

Differential Revision: D26018916

fbshipit-source-id: dc8aaa98d4e002e972d5c6783f2351c29f7db239
2021-01-26 08:21:30 -08:00
95ae9a20e4 Enable ROCM Skipped tests in test_ops.py (#50500)
Summary:
Removed skipCUDAIfRocm to re-enable tests for the ROCm platform.

Initially, only 4799 cases were being run, and 882 of those were skipped. After removing skipCUDAIfRocm from two places in test_ops.py, more than 8000 cases are now executed, of which only 282 are skipped; those are FFT-related tests.

Signed-off-by: Arindam Roy <rarindam@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50500

Reviewed By: albanD

Differential Revision: D25920303

Pulled By: mrshenli

fbshipit-source-id: b2d17b7e2d1de4f9fdd6f1660fb4cad5841edaa0
2021-01-26 08:09:18 -08:00
233e4ebdb6 Implement autograd functions for c10d communication operations (#40762)
Summary:
Closes https://github.com/pytorch/pytorch/issues/40702, Fixes https://github.com/pytorch/pytorch/issues/40690

Currently WIP, but I would appreciate some feedback. Functions should be double-differentiable.

Contrary to b35cdc5200/torch/nn/parallel/_functions.py, this PR generates a list of tensors instead of aggregating the received data in a single tensor. Is this behavior correct?

Thanks!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40762

Reviewed By: glaringlee

Differential Revision: D24758889

Pulled By: mrshenli

fbshipit-source-id: 79285fb4b791cae3d248f34e2aadb11c9ab10cce
2021-01-26 07:52:51 -08:00
83315965ab Turn on batched grad testing for CriterionTest (#50744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50744

This PR adds a `check_batched_grad=True` option to CriterionTest and
turns it on by default for all CriterionTest-generated tests

Test Plan: - run tests

Reviewed By: ejguan

Differential Revision: D25997676

Pulled By: zou3519

fbshipit-source-id: cc730731e6fae2bddc01bc93800fd0e3de28b32d
2021-01-26 07:37:15 -08:00
e843974a6e Revert D25850783: Add torch::deploy, an embedded torch-python interpreter
Test Plan: revert-hammer

Differential Revision:
D25850783 (3192f9e4fe)

Original commit changeset: a4656377caff

fbshipit-source-id: 1c7133627da28fb12848da7a9a46de6d3b2b67c6
2021-01-26 02:07:44 -08:00
a51b9a823c Improve docs around Math/DefaultBackend & add PythonDispatcher class. (#50854)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50854

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D26008542

Pulled By: ailzhang

fbshipit-source-id: e9c0aa97ac2537ff612f5faf348fcb613da09479
2021-01-25 23:10:36 -08:00
9f19843d19 [Gradient Compression] Typo fixes in PowerSGD (#50974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50974

Typo fixes.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 120257221

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D26031679

fbshipit-source-id: 9d049b50419a3e40e53f7f1275a441e31b87717b
2021-01-25 22:55:54 -08:00
ffaae32d60 [Gradient Compression] Allow PowerSGD to run vanilla allreduce for the first K iterations (#50973)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50973

This extends the original PowerSGD method to a hybrid approach: vanilla allreduce + PowerSGD. It can help further improve accuracy, at the cost of a lower speedup.

Also add more comments on the fields in `PowerSGDState`.
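
A usage sketch of the hybrid behavior (the `start_powerSGD_iter` field name and argument layout are assumptions based on this change):
```
import torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook as powerSGD

# Run vanilla allreduce for the first 1000 iterations, then switch to
# PowerSGD compression.  ddp_model is an existing
# DistributedDataParallel instance.
state = powerSGD.PowerSGDState(
    process_group=None,            # default process group
    matrix_approximation_rank=1,
    start_powerSGD_iter=1000,
)
ddp_model.register_comm_hook(state, powerSGD.powerSGD_hook)
```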

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 120257202

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook

Reviewed By: rohan-varma

Differential Revision: D26031478

fbshipit-source-id: d72e70bb28ba018f53223c2a4345306980b3084e
2021-01-25 22:38:39 -08:00
880f007480 Add torch.eig complex forward (CPU, CUDA) (#49168)
Summary:
Related to issue https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49168

Reviewed By: mrshenli

Differential Revision: D25954027

Pulled By: mruberry

fbshipit-source-id: e429f9587efff5e638bfd0e4de864c06f41c63b1
2021-01-25 21:27:08 -08:00
502ca0105d Added cuda bindings for NNC (#51046)
Summary:
See above.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51046

Reviewed By: ZolotukhinM

Differential Revision: D26053419

Pulled By: Chillee

fbshipit-source-id: 9cc2dc434239a1ad77d30a1e5c0a9592be4944dc
2021-01-25 20:41:40 -08:00
6ef66213ee [PT QNNPACK] Temporarily disable input pointer caching (#51051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51051

Disable input pointer caching on iOS. We are seeing some issues with this on some iOS devices.

Test Plan:
FB:
Tested this in IG with the BT effect.

Reviewed By: IvanKobzarev, AshkanAliabadi

Differential Revision: D25984429

fbshipit-source-id: f6ceef606994b22de9cdd9752115b3481cd7bd96
2021-01-25 20:34:06 -08:00
5adbace8e6 Abort node in fast_nvcc if ancestor fails (#51043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51043

This PR makes `fast_nvcc` stop at failing commands, rather than continuing on to run commands that would otherwise run after those commands. It is still possible for `fast_nvcc` to run more commands than `nvcc` would run if there's no dependency between them, but this should still help to reduce noise from failing `fast_nvcc` runs.
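
The idea, as a minimal sketch (not the actual fast_nvcc code):
```
# Commands are assumed to be in topological order; deps[i] holds the
# indices of commands that command i depends on.  A failure poisons
# every transitive dependent, which is then skipped instead of run.
def run_graph(commands, deps, run):
    failed = set()
    for i, cmd in enumerate(commands):
        if any(d in failed for d in deps[i]):
            failed.add(i)          # an ancestor failed; abort this node
            continue
        if run(cmd) != 0:
            failed.add(i)
    return failed
```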

Test Plan: Unfortunately the test suite for this script is FB-internal. It would probably be a good idea to move it into the PyTorch GitHub repo, but I'm not entirely sure how to do so, since I don't believe we currently have a good place to put tests for things in `tools`.

Reviewed By: malfet

Differential Revision: D26007788

fbshipit-source-id: 8fe1e7d020a29d32d08fe55fb59229af5cdfbcaa
2021-01-25 18:12:51 -08:00
a347c747df Fix TransformedDistribution shaping logic (#50581)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50496
Fixes https://github.com/pytorch/pytorch/issues/34859
Fixes https://github.com/pytorch/pytorch/issues/21596

This fixes many bugs involving `TransformedDistribution` and `ComposeTransform` when the component transforms changed their event shapes. Part of the fix is to introduce an `IndependentTransform` analogous to `distributions.Independent` and `constraints.independent`, and to introduce methods `Transform.forward_shape()` and `.inverse_shape()`. I have followed fehiepsi's suggestion and replaced `.input_event_dim` -> `.domain.event_dim` and `.output_event_dim` -> `.codomain.event_dim`. This allows us to deprecate `.event_dim` as an attribute.
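
A small sketch using the names introduced here:
```
import torch
from torch.distributions.transforms import ReshapeTransform

# A transform that changes the event shape, plus the new shape methods
# that compute result shapes without sampling.
t = ReshapeTransform(in_shape=(6,), out_shape=(2, 3))
print(t.forward_shape(torch.Size([5, 6])))     # torch.Size([5, 2, 3])
print(t.inverse_shape(torch.Size([5, 2, 3])))  # torch.Size([5, 6])
```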

## Summary of changes

- Fixes `TransformDistribution` and `ComposeTransform` shape errors.
- Fixes a behavior bug in `LogisticNormal`.
- Fixes `kl_divergence(TransformedDistribution, TransformedDistribution)`
- Adds methods `Transform.forward_shape()`, `.inverse_shape()` which are required for correct shape computations in `TransformedDistribution` and `ComposeTransform`.
- Adds an `IndependentTransform`.
- Adds a `ReshapeTransform` which is invaluable in testing shape logic in `ComposeTransform` and `TransformedDistribution` and which will be used by stefanwebb flowtorch.
- Fixes incorrect default values in `constraints.dependent.event_dim`.
- Documents the `.event_dim` and `.is_discrete` attributes.

## Changes planned for follow-up PRs

- Memoize `constraints.dependent_property` as we do with `lazy_property`, since we now consult those properties much more often.

## Tested
- [x] added a test for `Dist.support` vs `Dist(**params).support` to ensure static and dynamic attributes agree.
- [x] refactoring is covered by existing tests
- [x] add test cases for `ReshapedTransform`
- [x] add a test for `TransformedDistribution` on a wide grid of input shapes
- [x] added a regression test for https://github.com/pytorch/pytorch/issues/34859

cc fehiepsi feynmanliang stefanwebb

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50581

Reviewed By: ezyang, glaringlee, jpchen

Differential Revision: D26024247

Pulled By: neerajprad

fbshipit-source-id: f0b9a296f780ff49659b132409e11a29985dde9b
2021-01-25 16:34:12 -08:00
250c71121b Create a DDPLoggingData and expose it to python interface (#50622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50622

1. Define a DDPLoggingData struct that is the placeholder for all the ddp related logging fields
2. Put the DDPLoggingData struct in the C10 directory so that it can be easily imported by c10 and torch files
3. Expose get_ddp_logging_data() method in python so that users can get the logging data and dump in their applications
4. Unit test tested the logging data can be set and got as expected
5. Follow-up will add more logging fields such as perf stats, internal states, env variables, etc.
ghstack-source-id: 120275870

Test Plan: unit tests

Reviewed By: SciPioneer

Differential Revision: D25930527

fbshipit-source-id: 290c200161019c58e28eed9a5a2a7a8153113f99
2021-01-25 15:23:07 -08:00
3192f9e4fe Add torch::deploy, an embedded torch-python interpreter (#50458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50458

libinterpreter.so contains a frozen python distribution including
torch-python bindings.

Freezing refers to serializing bytecode of python standard library modules as
well as the torch python library and embedding them in the library code.  This
library can then be dlopened multiple times in one process context, each
interpreter having its own python state and GIL.  In addition, each python
environment is sealed off from the filesystem and can only import the frozen
modules included in the distribution.

This change relies on newly added frozenpython, a cpython 3.8.6 fork built for this purpose.  Frozenpython provides libpython3.8-frozen.a which
contains frozen bytecode and object code for the python standard library.

Building on top of frozen python, the frozen torch-python bindings are added in
this diff, providing each embedded interpreter with a copy of the torch
bindings.  Each interpreter is intended to share one instance of libtorch and
the underlying tensor libraries.

Known issues

- Autograd is not expected to work with the embedded interpreter currently, as it manages
its own python interactions and needs to coordinate with the duplicated python
states in each of the interpreters.
- Distributed and cuda stuff is disabled in libinterpreter.so build, needs to be revisited
- __file__ is not supported in the context of embedded python since there are no
files for the underlying library modules.
- __version__ is not properly supported in the embedded torch-python; there is just a
workaround for now

Test Plan: tested locally and on CI with cmake and buck builds running torch::deploy interpreter_test

Reviewed By: ailzhang

Differential Revision: D25850783

fbshipit-source-id: a4656377caff25b73913daae7ae2f88bcab8fd88
2021-01-25 15:14:28 -08:00
ddf26816d3 Make torch.svd return V, not V.conj() for complex inputs (#51012)
Summary:
**BC-breaking note:**

torch.svd() added support for complex inputs in PyTorch 1.7, but was not documented as doing so. The complex "V" tensor returned was actually the complex conjugate of what's expected. This PR fixes the discrepancy.

This will silently break all users of torch.svd() with complex inputs.
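
A sketch of the new convention for complex inputs (the reconstruction must now conjugate V explicitly):
```
import torch

a = torch.randn(3, 3, dtype=torch.complex64)
u, s, v = torch.svd(a)
# v is now V itself (not V.conj()), so A = U diag(S) V^H becomes:
recon = (u * s) @ v.t().conj()
assert torch.allclose(recon, a, atol=1e-5)
```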

**Original PR Summary:**

This PR resolves https://github.com/pytorch/pytorch/issues/45821.

The problem was that when introducing the support of complex inputs for `torch.svd` it was overlooked that LAPACK/MAGMA returns the conjugate transpose of V matrix, not just the transpose of V. So `torch.svd` was silently returning U, S, V.conj() instead of U, S, V.

Behavior of `torch.linalg.pinv`, `torch.pinverse` and `torch.linalg.svd` (they depend on `torch.svd`) is not changed in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51012

Reviewed By: bdhirsh

Differential Revision: D26047593

Pulled By: albanD

fbshipit-source-id: d1e08dbc3aab9ce1150a95806ef3b5da98b5d3ca
2021-01-25 14:06:41 -08:00
f8eefbdf7a fake_quant: fix device affinity and buffer resizing for state_dict (#50868)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50868

Ensures that `FakeQuantize` respects device affinity when loading from
state_dict, and knows how to resize scale and zero_point values
(which is necessary for FQ classes wrapping per channel observers).

This is same as https://github.com/pytorch/pytorch/pull/44537, but for
`FakeQuantize`.

Test Plan:
```
python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25991570

fbshipit-source-id: 1193a6cd350bddabd625aafa0682e2e101223bb1
2021-01-25 13:50:28 -08:00
68c218547c Add documentation page for pipeline parallelism. (#50791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50791

Add a dedicated pipeline parallelism doc page explaining the APIs and
the overall value of the module.
ghstack-source-id: 120257168

Test Plan:
1) View locally
2) waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D25967981

fbshipit-source-id: b607b788703173a5fa4e3526471140506171632b
2021-01-25 13:47:13 -08:00
a7cf04ec40 Workaround for MAGMA accessing illegal memory in batched cholesky (#50957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50957

MAGMA has an off-by-one error in its batched cholesky implementation which causes illegal memory access for certain inputs. The workaround implemented in this PR is to pad the input to MAGMA with 1 extra element.

**Benchmark**
Ran the script below for both before and after my PR and got similar results.

*Script*
```
import torch
from torch.utils import benchmark

DTYPE = torch.float32
BATCHSIZE = 512 * 512
MATRIXSIZE = 16

a = torch.eye(MATRIXSIZE, device='cuda', dtype=DTYPE)

t0 = benchmark.Timer(
    stmt='torch.cholesky(a)',
    globals={'a': a},
    label='Single'
)

t1 = benchmark.Timer(
    stmt='torch.cholesky(a)',
    globals={'a': a.expand(BATCHSIZE, -1, -1)},
    label='Batched'
)

print(t0.timeit(100))
print(t1.timeit(100))
```

*Results before*
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7faf9bc63400>
Single
  2.08 ms
  1 measurement, 100 runs , 1 thread
<torch.utils.benchmark.utils.common.Measurement object at 0x7faf9bc63400>
Batched
  7.68 ms
  1 measurement, 100 runs , 1 thread
```

*Results after*
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7faf9bc63400>
Single
  2.10 ms
  1 measurement, 100 runs , 1 thread
<torch.utils.benchmark.utils.common.Measurement object at 0x7faf9bc63400>
Batched
  7.56 ms
  1 measurement, 100 runs , 1 thread
```

Fixes https://github.com/pytorch/pytorch/issues/41394, https://github.com/pytorch/pytorch/issues/26996, https://github.com/pytorch/pytorch/issues/48996

See also https://github.com/pytorch/pytorch/issues/42666, https://github.com/pytorch/pytorch/pull/26789

TODO
 ---
- [x] Benchmark to check for perf regressions

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D26050978

Pulled By: heitorschueroff

fbshipit-source-id: 7a5ba7e34c9d74b58568b2a0c631cc6d7ba63f86
2021-01-25 13:39:24 -08:00
9dfbfe9fca Add type annotations to torch.overrides (#50824)
Summary:
This is a follow up PR of https://github.com/pytorch/pytorch/issues/48493.

Fixes https://github.com/pytorch/pytorch/issues/48492

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50824

Reviewed By: bdhirsh

Differential Revision: D26050736

Pulled By: ezyang

fbshipit-source-id: 049605fd271cff28c8b6e300c163e9df3b3ea23b
2021-01-25 13:20:09 -08:00
75cba9d0d1 More about cudnn refactor (#50827)
Summary:
- Resolves ngimel's review comments in https://github.com/pytorch/pytorch/pull/49109
- Move `ConvolutionArgs` from `ConvShared.h` to `Conv_v7.cpp`, because cuDNN v8 uses different descriptors therefore will not share the same `ConvolutionArgs`.
- Refactor the `ConvolutionParams` (the hash key for benchmark):
  - Remove `input_stride`
  - Add `input_dim`
  - Add `memory_format`
- Make `repro_from_args` to take `ConvolutionParams` instead of `ConvolutionArgs` as arguments so that it can be shared for v7 and v8
- Rename some `layout` to `memory_format`. `layout` should be sparse/strided and `memory_format` should be contiguous/channels_last. They are different things.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50827

Reviewed By: bdhirsh

Differential Revision: D26048274

Pulled By: ezyang

fbshipit-source-id: f71aa02d90ffa581c17ab05b171759904b311517
2021-01-25 12:58:25 -08:00
28869d5a80 [quant][graphmode][fx] Add support for quantizing functional linear + {functional relu/module relu} (#50975)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50975

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D26032532

fbshipit-source-id: a084fb4fd711ad52b2da1c6378cbcc2b352976c6
2021-01-25 12:49:58 -08:00
95a0a1a18f Update docstring on return type of jvp and vjp (#51035)
Summary:
Updates the docstrings to state that `jvp` and `vjp` both return the primal `func_output` first as part of the return tuple,
in line with the docstrings of [hvp](c620572a34/torch/autograd/functional.py (L671)) and [vhp](c620572a34/torch/autograd/functional.py (L583)).
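
For reference, the documented return order:
```
import torch
from torch.autograd.functional import jvp

def f(x):
    return x.sin().sum()

x = torch.randn(3)
v = torch.ones(3)
out, jvp_val = jvp(f, (x,), (v,))  # primal output first, then the JVP
```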

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51035

Reviewed By: bdhirsh

Differential Revision: D26047693

Pulled By: albanD

fbshipit-source-id: 5f2957a858826b4c1884590b6be7a8bed0791efd
2021-01-25 12:40:30 -08:00
09b896261c Skip test_lc_1d for ROCM (#50964)
Summary:
The test is flaky on ROCM when deadline is set to 1 second. This is affecting builds as it is failing randomly.
Disabling for now.

Signed-off-by: Arindam Roy <rarindam@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50964

Reviewed By: houseroad

Differential Revision: D26049370

Pulled By: BIT-silence

fbshipit-source-id: 22337590a8896ad75f1281e56fbbeae897f5c3b2
2021-01-25 11:43:37 -08:00
ac0a3cc5fd Merge CompilationUnit from torch._C and torch.jit (#50614)
Summary:
This simplifies our handling and allows passing CompilationUnits from Python to C++ defined functions via PyBind easily.

Discussed on Slack with SplitInfinity

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50614

Reviewed By: anjali411

Differential Revision: D25938005

Pulled By: SplitInfinity

fbshipit-source-id: 94aadf0c063ddfef7ca9ea17bfa998d8e7b367ad
2021-01-25 11:06:40 -08:00
5e79b8e06d Back out "Revert D25903846: [pytorch][PR] Structured kernel definition for upsample_nearest2d" (#50794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50794

Original commit changeset: b4a7948088c0

There are some subtle extra tweaks on top of the original. I can unbundle them, but I've opted to keep it with the port because it's the easiest way to make sure the changes are exercised.

* There's a bugfix in the codegen to test whether a dispatch key is structured *before* short-circuiting on the dispatch key being missing from the table. This accounts for mixed structured/non-structured situations where the dispatch table is present, but the relevant structured key isn't (because the dispatch table only exists to register, e.g., QuantizedCPU)
* Dispatch tables for functions which delegate to structured kernels don't have Math entries generated for them.
* It's now illegal to specify a structured dispatch key in a delegated structured kernel (it would be ignored!); add is now fixed to follow this
* There are some extra sanity checks for NativeFunctions validation
* Finally, unlike the original PR, I switched the .vec variant of upsample_nearest2d to also be DefaultBackend, bringing it inline with upsample_nearest1d.
ghstack-source-id: 120038038

Test Plan:
```
buck test mode/dev //coreai/tiefenrausch:python_tests -- --exact 'coreai/tiefenrausch:python_tests - test_can_run_local_async_inference_cpu (coreai.tiefenrausch.tests.python_test.TiefenrauschPY)' --run-disabled
```

Reviewed By: ngimel

Differential Revision: D25962873

fbshipit-source-id: d29a9c97f15151db3066ae5efe7a0701e6dc05a3
2021-01-25 10:43:53 -08:00
f7b339d11c Clarify wording around overrides subclasses. (#51031)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47117

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51031

Reviewed By: bdhirsh

Differential Revision: D26047498

Pulled By: albanD

fbshipit-source-id: dd0a7d9f97c0f6469b3050d2e3b4473f1bee3820
2021-01-25 08:19:13 -08:00
a6257b2fe2 Fix #48903 (#50817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50817

Replace some longs with int64_t.  Thanks Tom Heaven for contributing
this patch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D25975915

Pulled By: ezyang

fbshipit-source-id: c1061a85f80ad17fa4fb313da797bc6d5ba203c2
2021-01-25 07:44:41 -08:00
806010b75e [BE] move more unittest.main() to run_tests() (#50923)
Summary:
Relate to https://github.com/pytorch/pytorch/issues/50483.

Everything except ONNX, detectron and release notes tests are moved to use common_utils.run_tests() to ensure CI reports XML correctly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50923

Reviewed By: samestep

Differential Revision: D26027621

Pulled By: walterddr

fbshipit-source-id: b04c03f10d1fe96181b720c4c3868e86e4c6281a
2021-01-25 07:23:09 -08:00
8690819618 OpInfo: Add DecorateInfo class similar to SkipInfo for decorators (#50501)
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/50435

I have confirmed this works by running
```
pytest test_ops.py -k test_fn_gradgrad_fft
```
normally and with `PYTORCH_TEST_WITH_SLOW=1 PYTORCH_TEST_SKIP_FAST=1`. In the first case all tests are skipped; in the second they all run as they should.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50501

Reviewed By: ezyang

Differential Revision: D25956416

Pulled By: mruberry

fbshipit-source-id: c896a8cec5f19b8ffb9b168835f3743b6986dad7
2021-01-25 04:51:04 -08:00
5a5bca8ef0 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D26043955

fbshipit-source-id: 0a5740a82bdd3ac7bd1665a325ff7fe79488ccea
2021-01-25 04:20:03 -08:00
627a331257 Port CPU torch.orgqr to ATen (#50502)
Summary:
Now we can remove `_th_orgqr`!

Compared to the original TH-based `orgqr`, complex (https://github.com/pytorch/pytorch/issues/33152) and batched inputs are now supported.
CUDA support will be added in a follow-up PR.

Closes https://github.com/pytorch/pytorch/issues/24747

Ref. https://github.com/pytorch/pytorch/issues/49421, https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50502

Reviewed By: mrshenli

Differential Revision: D25953300

Pulled By: mruberry

fbshipit-source-id: f52a74e1c8f51b5e24f7b461430ca8fc96e4d149
2021-01-25 02:57:05 -08:00
48b6b9221a [BE] Make Vec256 header only library (#50708)
Summary:
Do it by removing extraneous header dependencies.
None of the at::vec256 primitives depend on the notion of Tensor, therefore none of the headers that vec256 depends on should include <ATen/Tensor.h>.

Implicitly test it by removing the c10 and tensor dependencies when building `vec256_test_all_types`.
Split affine_quantizer into affine_quantizer_base (which contains methods operating on raw types) and affine_quantizer (which contains Tensor-specific methods).

Fixes https://github.com/pytorch/pytorch/issues/50567

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50708

Reviewed By: walterddr

Differential Revision: D25949168

Pulled By: malfet

fbshipit-source-id: c3323be7252865a52c7d94026a5a39b494e44efb
2021-01-24 21:46:36 -08:00
186c3da037 Add cusolver gesvdj and gesvdjBatched to the backend of torch.svd (#48436)
Summary:
This PR adds cusolver `gesvdj` and `gesvdjBatched` to the backend of `torch.svd`.

I've tested the performance using CUDA 11.1 on a 2070, V100, and A100. The cusolver gesvdj and gesvdjBatched performances are better than magma in all square matrix cases, so the cusolver backend will replace the magma backend when available.

When both matrix dimensions are no greater than 32, `gesvdjBatched` is used. Otherwise, `gesvdj` is used.

Detailed benchmark is available at https://github.com/xwang233/code-snippet/tree/master/svd.

Some relevant code and discussions
- https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/linalg/svd_op_gpu.cu.cc
- https://github.com/google/jax/blob/master/jaxlib/cusolver.cc
- https://github.com/cupy/cupy/issues/3174
- https://github.com/tensorflow/tensorflow/issues/13603
- https://www.nvidia.com/en-us/on-demand/session/gtcsiliconvalley2019-s9226/

See also https://github.com/pytorch/pytorch/issues/42666 https://github.com/pytorch/pytorch/issues/47953

Close https://github.com/pytorch/pytorch/pull/50516

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48436

Reviewed By: ejguan

Differential Revision: D25977046

Pulled By: heitorschueroff

fbshipit-source-id: c27e705cd29b6fd7c8ac674c1f9f490fa26ee1bf
2021-01-24 15:47:05 -08:00
1f40f2a172 Add improved support for parallelization and related graph opts (#5257)
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/5257

- Add RescaleQuantized parallelization support to graph opts' parallelization code
- On NNPI, mirror Rescale parallelization for FC/Relus that come before it
- Sink Reshapes below Quantize and ConvertTo
- Remove unnecessary ConvertTo when following a Dequantize (i.e. just change the elem kind of the Dequantize instead)

Test Plan: Added unit tests

Reviewed By: hyuen, mjanderson09

Differential Revision: D25947824

fbshipit-source-id: 771abd36a1bc7270bf1f901d1ec6cb6d78e9fd1f
2021-01-23 17:20:30 -08:00
c9cae1446f fix unflatten_dense_tensor when there is empty tensor inside (#50321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50321

Quantization team reported that when two empty tensors are replicated among ranks, the two empty tensors start to share storage after resizing.

The root cause is that unflatten_dense_tensor unflattened the empty tensor as a view of the flat tensor, which thus shared storage with other tensors.

This PR avoids unflattening the empty tensor as a view of the flat tensor, so that an empty tensor will not share storage with other tensors.
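
A minimal sketch of the fix idea (not the actual torch._utils code):
```
import torch

# Give zero-element tensors fresh storage instead of a view into flat.
def unflatten_dense_tensors(flat, tensors):
    outputs, offset = [], 0
    for t in tensors:
        n = t.numel()
        if n == 0:
            outputs.append(flat.new_empty(t.shape))  # no shared storage
        else:
            outputs.append(flat.narrow(0, offset, n).view_as(t))
            offset += n
    return tuple(outputs)
```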

Test Plan: unit test

Reviewed By: pritamdamania87

Differential Revision: D25859503

fbshipit-source-id: 5b760b31af6ed2b66bb22954cba8d1514f389cca
2021-01-23 12:14:34 -08:00
e544d74c55 [CPU] Add torch.trace for complex tensors (#50380)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50380

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25949361

Pulled By: anjali411

fbshipit-source-id: 9910bc5b532c9bf3add530221d643b2c41c62d01
2021-01-23 09:04:31 -08:00
2c3c2a4b7a [dist_optim] add distributed functional AdamW optimizer (#50620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50620

Add TorchScript compatible AdamW functional optimizer to distributed optimizer

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D25932774

Pulled By: wanchaol

fbshipit-source-id: 64eb4aeaa3cab208d0ebbec7c4d91a9d43951947
2021-01-23 01:04:45 -08:00
3f982e56b1 [dist_optim] add distributed functional RMSprop optimizer (#50619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50619

Add TorchScript compatible RMSprop functional optimizer to distributed optimizer

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D25932775

Pulled By: wanchaol

fbshipit-source-id: bd4854f9f95a740e02a1bebe24f780488460ba4d
2021-01-23 01:04:41 -08:00
6c81b4d917 [dist_optim] add distributed functional Adadelta optimizer (#50623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50623

Add TorchScript compatible Adadelta functional optimizer to distributed optimizer

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D25932772

Pulled By: wanchaol

fbshipit-source-id: d59b04e5f0b6bab7e0d1c5f68e66249a65958e0b
2021-01-23 01:04:36 -08:00
cd2067539e [dist_optim] add distributed functional sgd optimizer (#50618)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50618

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D25932778

Pulled By: wanchaol

fbshipit-source-id: 8df3567b477bc5ba3556b8c5294cd3da5db963ad
2021-01-23 01:04:32 -08:00
5cbe1e4933 [dist_optim] add distributed functional Adam optimizer (#50624)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50624

Add TorchScript compatible Adam functional optimizer to distributed optimizer

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D25932770

Pulled By: wanchaol

fbshipit-source-id: cab3f1164c76186969c284a2c52481b79bbb7190
2021-01-23 01:01:37 -08:00
5a661e0171 [WIP][Grad Compression] Unittest to verify allreduce_hook parity (#50851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50851

Improves upon the previous unittest to ensure allreduce_hook results in the same gradients as vanilla allreduce in DDP.

ghstack-source-id: 120229103

Test Plan:
buck build mode/dev-nosan //caffe2/test/distributed:distributed_nccl_fork --keep-going
BACKEND=nccl WORLD_SIZE=2 ~/fbcode/buck-out/dev/gen/caffe2/test/distributed/distributed_nccl_fork#binary.par -r test_ddp_hook_parity

Reviewed By: SciPioneer

Differential Revision: D25963654

fbshipit-source-id: d55eee0aee9cf1da52aa0c4ba1066718aa8fd9a4
2021-01-23 00:47:08 -08:00
6aec1eba15 [aten] Make aten::flatten call native::reshape (#50859)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50859

Test Plan:
Unit test:
```
buck test //caffe2/test:torch
```
Benchmark:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 13 \
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
--scripted_model=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge_v2/traced_precomputation.pt \
--pt_inputs=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge_v2/container_precomputation_bs20.pt \
--iters=10000 --warmup_iters=10000 --num_threads=1 --pt_enable_static_runtime=true \
--pt_cleanup_activations=true --pt_enable_out_variant=true --do_profile=true
```

Reduces the total time spent on flatten from 1.22% to 0.97% (net 0.25% reduction).
```
Before:

Static runtime ms per iter: 0.0725054. Iters per second: 13792.1
    0.000857179 ms.    1.21862%. aten::flatten (1 nodes)

After:

Static runtime ms per iter: 0.0720371. Iters per second: 13881.7
    0.000686155 ms.    0.97151%. aten::flatten (1 nodes)
```

Reviewed By: ajyu

Differential Revision: D25986759

fbshipit-source-id: dc0f542c56a688d331d349845b78084577970476
2021-01-22 23:12:01 -08:00
069e68a2a4 Fix ScriptModule docstring (#48608)
Summary:
Fixes a typo in `ScriptModule`'s docstring and converts it to the raw format (`r"""...`).

Fixes https://github.com/pytorch/pytorch/issues/48634

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48608

Reviewed By: anjali411

Differential Revision: D25242022

Pulled By: gmagogsfm

fbshipit-source-id: 5199868af999c6c360c7dd5e2813659f1028acab
2021-01-22 22:32:18 -08:00
ce0f335515 [PyTorch Mobile] Add an overload for deserialize() that doesn't accept the extra_files map. (#50932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50932

After the change to split `_load_for_mobile()` into multiple methods, one which takes in the `extra_files` map, and one which doesn't, we can change the implementation of the `deserialize()` method with different overloads as well. Suggested by raziel on D25968216 (bb909d27d5).

ghstack-source-id: 120185089

Test Plan: Build/Sandcastle.

Reviewed By: JacobSzwejbka

Differential Revision: D26014084

fbshipit-source-id: 914142137346a6246def1acf38a3204dd4c4f52f
2021-01-22 21:54:24 -08:00
ab331da7ac Rewrite kron with broadcasting at::mul (#50927)
Summary:
Because it is shorter, faster, and does not have the TF32 issue.

Benchmark: https://github.com/zasdfgbnm/things/blob/master/2021Q1/kron.ipynb
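
The broadcasting formulation, sketched for 2-D inputs (the shipped implementation also handles batching and type promotion):
```
import torch

# kron(a, b)[i*p + k, j*q + l] == a[i, j] * b[k, l]
def kron2d(a, b):
    m, n = a.shape
    p, q = b.shape
    prod = a.unsqueeze(1).unsqueeze(3) * b.unsqueeze(0).unsqueeze(2)
    return prod.reshape(m * p, n * q)
```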

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50927

Reviewed By: glaringlee

Differential Revision: D26022385

Pulled By: ngimel

fbshipit-source-id: 513c9e9138c35c70d3a475a8407728af21321dae
2021-01-22 20:58:17 -08:00
789f6f1250 [FX] Minor docs changes (#50966)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50966

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D26029101

Pulled By: jamesr66a

fbshipit-source-id: 4374771be74d0a4d05fdd29107be5357130c2a76
2021-01-22 16:23:19 -08:00
5c1c858ca8 Revert D25977352: [pytorch][PR] Refactor mypy configs list into editor-friendly wrapper
Test Plan: revert-hammer

Differential Revision:
D25977352 (73dffc8452)

Original commit changeset: 4b3a5e8a9071

fbshipit-source-id: a0383ea4158f54be6f128b9ddb2cd12fc3a3ea53
2021-01-22 15:53:44 -08:00
ffc8a26991 philox_engine_inputs should also round increment to a multiple of 4 (#50916)
Summary:
`philox_engine_inputs()` is deprecated. Callers should refactor to use `philox_cuda_state()`, and as far as I know all call sites in aten have already been refactored. In the meantime, on behalf of other consumers (i.e., extensions, and possibly some lingering call sites in jit), `philox_engine_inputs` should handle the increment the same way `philox_cuda_state` does.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50916

Reviewed By: mrshenli

Differential Revision: D26022618

Pulled By: ngimel

fbshipit-source-id: 17178ad099ddc17d3596b9508ae4dce729b44f57
2021-01-22 15:51:15 -08:00
63838b9330 Turn on batched_grad testing for NewModuleTest (#50740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50740

This PR adds a `check_batched_grad=True` option to
NewModuleTest-generated NN tests.

Test Plan: - run tests (`pytest test/test_nn.py -v -rf`)

Reviewed By: ejguan

Differential Revision: D25997679

Pulled By: zou3519

fbshipit-source-id: b75e73d7e86fd3af9bad6efed7127b36551587b3
2021-01-22 15:33:09 -08:00
de8cd6b201 [BE] Replace M_PI with c10::pi constexpr variable (#50819)
Summary:
Also, get rid of the MSVC-specific `_USE_MATH_DEFINES`.

Test at compile time that c10::pi<double> == M_PI.
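
The idea, sketched (the names and exact c10 definition are assumptions):
```
// A templated constexpr pi usable on every compiler without
// _USE_MATH_DEFINES, checked against the usual literal at compile time.
template <typename T>
constexpr T pi = static_cast<T>(3.14159265358979323846L);

static_assert(pi<double> == 3.14159265358979323846,
              "compile-time check mirroring the M_PI comparison above");
```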

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50819

Reviewed By: albanD

Differential Revision: D25976330

Pulled By: malfet

fbshipit-source-id: 8f3ddfd58a5aa4bd382da64ad6ecc679706d1284
2021-01-22 15:15:31 -08:00
a66851a2ad [FX] torch.fx.symbolic_trace patching improvements and math.* support (#50793)
Summary:
This contains some improvements and refactoring to how patching is done in `torch.fx.symbolic_trace`.

1) Functions from `math.*` are now supported without needing to call `torch.fx.wrap()`.  `wrap()` actually errors on some of these functions because they are written in C and don't have `__code__`, requiring use of the string version.  `math` usage is relatively common; for example, [BERT uses math.sqrt here](6f79061bd1/torchbenchmark/models/BERT_pytorch/bert_pytorch/model/attention/single.py (L16)).  Both `math.sqrt()` and `from math import sqrt` (copying to module namespace) are supported.  When modules are called, FX now searches the module's global scope to find methods to patch.  (See the sketch after this list.)

2) [Guarded behind `env FX_PATCH_GETITEM=1`] Fixes a failed trace of [PositionalEmbedding from BERT](6f79061bd1/torchbenchmark/models/BERT_pytorch/bert_pytorch/model/embedding/position.py (L24)), which failed to trace with the error `TypeError: slice indices must be integers or None or have an __index__ method` (a Proxy() is getting passed into `Tensor.__getitem__`).  See https://github.com/pytorch/pytorch/issues/50710 for why this is disabled by default.

3) Support for automatically wrapping methods that may have been copied to a different module scope via an import like `from foo import wrapped_function`.  This also isn't exposed in `torch.fx.wrap`, but is used to implement `math.*` support.
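
A sketch of point (1) in action (module and argument names are illustrative):
```
import math
import torch
import torch.fx

class ScaledDotScore(torch.nn.Module):
    def forward(self, scores, d_k):
        # math.sqrt on a traced value now records a call_function node
        # instead of raising, with no torch.fx.wrap() needed.
        return scores / math.sqrt(d_k)

gm = torch.fx.symbolic_trace(ScaledDotScore())
print(gm.graph)
```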

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50793

Test Plan: Added unittests to check each feature

Reviewed By: jamesr66a

Differential Revision: D25999788

Pulled By: jansel

fbshipit-source-id: f1ce11a69b7d97f26c9e2741c6acf9c513a84467
2021-01-22 15:05:24 -08:00
dd1c2a06b7 refactor profiling optional (#47667)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47667

Test Plan: Imported from OSS

Reviewed By: anjali411, ngimel

Differential Revision: D25255572

Pulled By: Krovatkin

fbshipit-source-id: d0152c9ef5b1994e27be9888bcb123dca3ecd88f
2021-01-22 14:45:28 -08:00
f0e72e54cc Fix CUDA RPC Stream Synchronization (#50949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50949

When converting RPC Message into Python objects, we were not using
a CUDAFuture for the chained Future. As a result, the streams are
not synchronized when calling `rpc_async(...).wait()`. This commit
uses the `Future::then` API to create the chained Future, which
creates a CUDAFuture if the existing Future is a CUDA one.

fixes #50881
fixes #50839

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D26020458

Pulled By: mrshenli

fbshipit-source-id: 25195fbc10b99f4c401ec3ed7a382128464b5f08
2021-01-22 14:05:43 -08:00
78f30386c5 Implement Swish(SiLU) operator in FP16
Summary:
Used the Caffe2 Swish implementation to implement the operator. Will need
to resolve the error introduced.
```
test_quantized_swish_2D (tests.operators.testQuantizedSilu.TestSiLU) ... input:
 (tensor([[-6.0000, -5.9961, -5.9922,  ..., -5.7734, -5.7695, -5.7656],
        [-5.7617, -5.7539, -5.7500,  ..., -5.5352, -5.5312, -5.5234],
        [-5.5195, -5.5156, -5.5117,  ..., -5.2930, -5.2891, -5.2852],
        ...,
        [ 5.2852,  5.2891,  5.2930,  ...,  5.5117,  5.5156,  5.5195],
        [ 5.5234,  5.5312,  5.5352,  ...,  5.7500,  5.7539,  5.7617],
        [ 5.7656,  5.7695,  5.7734,  ...,  5.9922,  5.9961,  6.0000]]),)
base_res:
 tensor([[-0.0148, -0.0149, -0.0149,  ..., -0.0179, -0.0180, -0.0180],
        [-0.0181, -0.0182, -0.0182,  ..., -0.0218, -0.0218, -0.0220],
        [-0.0220, -0.0221, -0.0222,  ..., -0.0265, -0.0266, -0.0266],
        ...,
        [ 5.2585,  5.2625,  5.2665,  ...,  5.4895,  5.4935,  5.4975],
        [ 5.5015,  5.5094,  5.5134,  ...,  5.7318,  5.7357,  5.7437],
        [ 5.7476,  5.7516,  5.7555,  ...,  5.9773,  5.9812,  5.9852]])
tnco_res:
 tensor([[-0.0148, -0.0149, -0.0149,  ..., -0.0179, -0.0180, -0.0180],
        [-0.0181, -0.0182, -0.0182,  ..., -0.0218, -0.0218, -0.0220],
        [-0.0220, -0.0221, -0.0222,  ..., -0.0265, -0.0265, -0.0266],
        ...,
        [ 5.2578,  5.2617,  5.2656,  ...,  5.4922,  5.4922,  5.4961],
        [ 5.5000,  5.5078,  5.5156,  ...,  5.7305,  5.7383,  5.7422],
        [ 5.7461,  5.7500,  5.7539,  ...,  5.9766,  5.9805,  5.9844]])
nnpi_res:
 tensor([[-0.0148, -0.0149, -0.0149,  ..., -0.0179, -0.0180, -0.0180],
        [-0.0181, -0.0182, -0.0182,  ..., -0.0218, -0.0218, -0.0220],
        [-0.0220, -0.0221, -0.0222,  ..., -0.0265, -0.0266, -0.0266],
        ...,
        [ 5.2585,  5.2625,  5.2665,  ...,  5.4895,  5.4935,  5.4975],
        [ 5.5015,  5.5094,  5.5134,  ...,  5.7318,  5.7357,  5.7437],
        [ 5.7476,  5.7516,  5.7555,  ...,  5.9773,  5.9812,  5.9852]])
diff:
 tensor([[4.1956e-06, 9.8441e-07, 6.0154e-06,  ..., 4.2785e-06, 7.6480e-06,
         1.0842e-05],
        [1.3988e-06, 4.1034e-06, 6.5863e-06,  ..., 5.3961e-06, 2.9635e-06,
         1.0209e-05],
        [1.2219e-06, 7.9758e-06, 1.7386e-05,  ..., 3.0547e-07, 2.2141e-05,
         1.4316e-05],
        ...,
        [7.0286e-04, 7.8678e-04, 8.7023e-04,  ..., 2.6422e-03, 1.3347e-03,
         1.4052e-03],
        [1.4753e-03, 1.6141e-03, 2.2225e-03,  ..., 1.2884e-03, 2.5592e-03,
         1.4634e-03],
        [1.5216e-03, 1.5793e-03, 1.6365e-03,  ..., 6.9284e-04, 7.4100e-04,
         7.8964e-04]])
nnpi traced graph:
 graph(%self : __torch__.tests.operators.testQuantizedSilu.SiLUModel,
      %x : Float(*, *, requires_grad=0, device=cpu)):
  %3 : None = prim::Constant()
  %4 : bool = prim::Constant[value=0]() # /data/users/kaus/fbsource/fbcode/buck-out/dev/gen/glow/fb/torch_glow/custom_nnpi_ops/testQuantizedSilu#binary,link-tree/tests/operators/testQuantizedSilu.py:13:0
  %5 : Device = prim::Constant[value="cpu"]() # /data/users/kaus/fbsource/fbcode/buck-out/dev/gen/glow/fb/torch_glow/custom_nnpi_ops/testQuantizedSilu#binary,link-tree/tests/operators/testQuantizedSilu.py:13:0
  %6 : int = prim::Constant[value=0]() # /data/users/kaus/fbsource/fbcode/buck-out/dev/gen/glow/fb/torch_glow/custom_nnpi_ops/testQuantizedSilu#binary,link-tree/tests/operators/testQuantizedSilu.py:13:0
  %7 : int = prim::Constant[value=6]() # /data/users/kaus/fbsource/fbcode/buck-out/dev/gen/glow/fb/torch_glow/custom_nnpi_ops/testQuantizedSilu#binary,link-tree/tests/operators/testQuantizedSilu.py:13:0
  %8 : Float(*, *, requires_grad=0, device=cpu) = aten::zeros_like(%x, %7, %6, %5, %4, %3) # /data/users/kaus/fbsource/fbcode/buck-out/dev/gen/glow/fb/torch_glow/custom_nnpi_ops/testQuantizedSilu#binary,link-tree/tests/operators/testQuantizedSilu.py:13:0
  %input : Float(*, *, requires_grad=0, device=cpu) = glow::FusionGroup_0(%x, %8)
  %10 : Tensor = aten::silu(%input) # /data/users/kaus/fbsource/fbcode/buck-out/dev/gen/glow/fb/torch_glow/custom_nnpi_ops/testQuantizedSilu#binary,link-tree/torch/nn/functional.py:1804:0
  return (%10)
with glow::FusionGroup_0 = graph(%0 : Float(*, *, requires_grad=0, device=cpu),
      %1 : Float(*, *, requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]()
  %input : Float(*, *, requires_grad=0, device=cpu) = aten::add(%0, %1, %2) # /data/users/kaus/fbsource/fbcode/buck-out/dev/gen/glow/fb/torch_glow/custom_nnpi_ops/testQuantizedSilu#binary,link-tree/tests/operators/testQuantizedSilu.py:13:0
  %4 : int = prim::Constant[value=1]() # /data/users/kaus/fbsource/fbcode/buck-out/dev/gen/glow/fb/torch_glow/custom_nnpi_ops/testQuantizedSilu#binary,link-tree/tests/operators/testQuantizedSilu.py:13:0
  return (%input)

tnco traced graph:
 graph(%self : __torch__.tests.operators.testQuantizedSilu.___torch_mangle_0.SiLUModel,
      %x : Float(*, *, requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]() # /data/users/kaus/fbsource/fbcode/buck-out/dev/gen/glow/fb/torch_glow/custom_nnpi_ops/testQuantizedSilu#binary,link-tree/tests/operators/testQuantizedSilu.py:13:0
  %3 : None = prim::Constant()
  %4 : bool = prim::Constant[value=0]() # /data/users/kaus/fbsource/fbcode/buck-out/dev/gen/glow/fb/torch_glow/custom_nnpi_ops/testQuantizedSilu#binary,link-tree/tests/operators/testQuantizedSilu.py:13:0
  %5 : Device = prim::Constant[value="cpu"]() # /data/users/kaus/fbsource/fbcode/buck-out/dev/gen/glow/fb/torch_glow/custom_nnpi_ops/testQuantizedSilu#binary,link-tree/tests/operators/testQuantizedSilu.py:13:0
  %6 : int = prim::Constant[value=0]() # /data/users/kaus/fbsource/fbcode/buck-out/dev/gen/glow/fb/torch_glow/custom_nnpi_ops/testQuantizedSilu#binary,link-tree/tests/operators/testQuantizedSilu.py:13:0
  %7 : int = prim::Constant[value=6]() # /data/users/kaus/fbsource/fbcode/buck-out/dev/gen/glow/fb/torch_glow/custom_nnpi_ops/testQuantizedSilu#binary,link-tree/tests/operators/testQuantizedSilu.py:13:0
  %8 : Float(*, *, requires_grad=0, device=cpu) = aten::zeros_like(%x, %7, %6, %5, %4, %3) # /data/users/kaus/fbsource/fbcode/buck-out/dev/gen/glow/fb/torch_glow/custom_nnpi_ops/testQuantizedSilu#binary,link-tree/tests/operators/testQuantizedSilu.py:13:0
  %12 : Tensor = fakeNNPI::addFP16(%x, %8, %2)
  %11 : Tensor = fakeNNPI::siluFP16(%12)
  return (%11)

FAIL

======================================================================
FAIL: test_quantized_swish_2D (tests.operators.testQuantizedSilu.TestSiLU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/users/kaus/fbsource/fbcode/buck-out/dev/gen/glow/fb/torch_glow/custom_nnpi_ops/testQuantizedSilu#binary,link-tree/tests/operators/testQuantizedSilu.py", line 26, in test_quantized_swish_2D
    validate_nnpi_model(model, (x,), expected_ops, [])
  File "/data/users/kaus/fbsource/fbcode/buck-out/dev/gen/glow/fb/torch_glow/custom_nnpi_ops/testQuantizedSilu#binary,link-tree/tests/utils.py", line 73, in validate_nnpi_model
    assert is_equal
AssertionError
```

Test Plan:
Run the test with `buck test mode/dev //glow/fb/torch_glow/custom_nnpi_ops:testQuantizedSilu`

Reviewed By: hyuen

Differential Revision: D25981369

fbshipit-source-id: dd0f3686b3cbf6fc575c959c7661125ecbf0b0db
2021-01-22 13:57:54 -08:00
ca3ce77746 Dump torch::jit::AliasDb objects as Graphviz files (#50452)
Summary:
This PR adds a simple debugging helper which exports the AliasDb state as a [GraphViz](http://www.graphviz.org/) graph definition. The generated files can be viewed with any Graphviz viewer (including online based, for example http://viz-js.com)

Usage:

1. Call `AliasDb::dumpToGraphvizFile()` from a debugger. Using gdb for example:
`call aliasDb_->dumpToGraphvizFile("alias.dot")`

2. Add explicit calls to `AliasDb::dumpToGraphvizFile()`, which returns `true` if it succeeds.

An example output file is attached: [example.zip](https://github.com/pytorch/pytorch/files/5805840/example.zip)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50452

Reviewed By: ngimel

Differential Revision: D25980222

Pulled By: eellison

fbshipit-source-id: 47805a0a81ce73c6ba859340d37b9a806f9000d5
2021-01-22 13:38:47 -08:00
73dffc8452 Refactor mypy configs list into editor-friendly wrapper (#50826)
Summary:
Closes https://github.com/pytorch/pytorch/issues/50513 by resolving the first three checkboxes. If this PR is merged, I will also modify one or both of the following wiki pages to add instructions on how to use this `mypy` wrapper for VS Code editor integration:

- [Guide for adding type annotations to PyTorch](https://github.com/pytorch/pytorch/wiki/Guide-for-adding-type-annotations-to-PyTorch)
- [Lint as you type](https://github.com/pytorch/pytorch/wiki/Lint-as-you-type)

The test plan below is fairly manual, so let me know if I should add more automated tests to this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50826

Test Plan:
Unit tests for globbing function:
```
python test/test_testing.py TestMypyWrapper -v
```

Manual checks:

- Uninstall `mypy` and run `python test/test_type_hints.py` to verify that it still works when `mypy` is absent.
- Reinstall `mypy` and run `python test/test_type_hints.py` to verify that this didn't break the `TestTypeHints` suite.
- Run `python test/test_type_hints.py` again (should finish quickly) to verify that this didn't break `mypy` caching.
- Run `torch/testing/_internal/mypy_wrapper.py` on a few Python files in this repo to verify that it doesn't give any additional warnings when the `TestTypeHints` suite passes. Some examples (compare with the behavior of just running `mypy` on these files):
  ```sh
  torch/testing/_internal/mypy_wrapper.py README.md
  torch/testing/_internal/mypy_wrapper.py tools/fast_nvcc/fast_nvcc.py
  torch/testing/_internal/mypy_wrapper.py test/test_type_hints.py
  torch/testing/_internal/mypy_wrapper.py torch/random.py
  torch/testing/_internal/mypy_wrapper.py torch/testing/_internal/mypy_wrapper.py
  ```
- Remove type hints from `torch.testing._internal.mypy_wrapper` and verify that running `mypy_wrapper.py` on that file gives type errors.
- Remove the path to `mypy_wrapper.py` from the `files` setting in `mypy-strict.ini` and verify that running it again on itself no longer gives type errors.
- Add `test/test_type_hints.py` to the `files` setting in `mypy-strict.ini` and verify that running the `mypy` wrapper on it again now gives type errors.
- Remove type hints from `torch/random.py` and verify that running the `mypy` wrapper on it again now gives type errors.
- Add the suggested JSON from the docstring of `torch.testing._internal.mypy_wrapper.main` to your `.vscode/settings.json` and verify that VS Code gives the same results (inline, while editing any Python file in the repo) as running the `mypy` wrapper on the command line, in all the above cases.

Reviewed By: glaringlee, walterddr

Differential Revision: D25977352

Pulled By: samestep

fbshipit-source-id: 4b3a5e8a9071fcad65a19f193bf3dc7dc3ba1b96
2021-01-22 13:35:44 -08:00
7e10fbfb71 Add note about TCP init in RPC tests to contributing doc. (#50861)
Summary:
We added this option in https://github.com/pytorch/pytorch/pull/48248, but it would be good to document it somewhere as well, hence adding it to this contributing doc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50861

Reviewed By: mrshenli

Differential Revision: D26014505

Pulled By: rohan-varma

fbshipit-source-id: c1321679f01dd52038131ff571362ad36884510a
2021-01-22 13:28:03 -08:00
2ab497012f Add at::cpu namespace of functions for structured kernels (#49505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49505

I have a problem which is that static runtime needs a way to bypass
dispatch and call into kernels directly.  Previously, it used
native:: bindings to do this; but these bindings no longer exist
for structured kernels!  Enter at::cpu: a namespace of exactly
at:: compatible functions that assume all of their arguments are
CPU and non-autograd!  The header looks like this:

```
namespace at {
namespace cpu {

CAFFE2_API Tensor & add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
CAFFE2_API Tensor add(const Tensor & self, const Tensor & other, Scalar alpha=1);
CAFFE2_API Tensor & add_(Tensor & self, const Tensor & other, Scalar alpha=1);
CAFFE2_API Tensor & upsample_nearest1d_out(Tensor & out, const Tensor & self, IntArrayRef output_size, c10::optional<double> scales=c10::nullopt);
CAFFE2_API Tensor upsample_nearest1d(const Tensor & self, IntArrayRef output_size, c10::optional<double> scales=c10::nullopt);
CAFFE2_API Tensor & upsample_nearest1d_backward_out(Tensor & grad_input, const Tensor & grad_output, IntArrayRef output_size, IntArrayRef input_size, c10::optional<double> scales=c10::nullopt);
CAFFE2_API Tensor upsample_nearest1d_backward(const Tensor & grad_output, IntArrayRef output_size, IntArrayRef input_size, c10::optional<double> scales=c10::nullopt);

}}
```

This slows down static runtime because these are not the "allow
resize of nonzero tensor" variant binding (unlike the ones I had manually
written).  We can restore this: it's a matter of adding codegen smarts to
do this, but I haven't done it just yet since it's marginally more
complicated.

In principle, non-structured kernels could get this treatment too.
But, like an evil mastermind, I'm withholding it from this patch, as an extra
carrot to get people to migrate to structured muahahahaha.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25616105

Pulled By: ezyang

fbshipit-source-id: 84955ae09d0b373ca1ed05e0e4e0074a18d1a0b5
2021-01-22 13:11:59 -08:00
7b12893155 [BE] .gitignore adding test-reports/ folder (#50952)
Summary:
Can't think of a reason not to .gitignore the test-reports folder. This can be helpful when:
1. running `python test/test*.py` from the GitHub root directory, since it creates the folder at the root.
2. the CI test report path generated by `torch/testing/_internal/common_utils.py` creates the folder in the same path where the test Python file is located.

Creating a PR to make sure CI is happy. This is also needed by https://github.com/pytorch/pytorch/issues/50923

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50952

Reviewed By: samestep

Differential Revision: D26022436

Pulled By: walterddr

fbshipit-source-id: 83e6296de802bd1754b802b8c70502c317f078c9
2021-01-22 12:12:45 -08:00
a291b254ee Migrate masked_scatter_ CPU to ATen (#49732)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49541

Reference: https://github.com/pytorch/pytorch/issues/24507

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49732

Reviewed By: ejguan

Differential Revision: D25991438

Pulled By: ngimel

fbshipit-source-id: a43bd0bfe043d8e32a6cadbbf736a0eaa697e7ec
2021-01-22 12:05:56 -08:00
db079a9877 Padding: support complex dtypes (#50594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50594

Fixes #50234

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D25987316

Pulled By: anjali411

fbshipit-source-id: c298b771fe52b267a86938e886ea402badecfe3e
2021-01-22 11:57:42 -08:00
c908ebd4a1 [android] fix yuv conversion - remove define (#50951)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50951

Test Plan: Imported from OSS

Reviewed By: fmassa

Differential Revision: D26021488

Pulled By: IvanKobzarev

fbshipit-source-id: 6d295762bb1160a3ed8bafac08e03e1eeb07d688
2021-01-22 11:30:57 -08:00
8ab1a1495d Rename set_deterministic to use_deterministic_algorithms (#49904)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49100
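
A one-line sketch of the renamed API (hedged: the behavior note in the comment reflects current releases and may have differed slightly at the time of this commit):

```python
import torch

# New spelling introduced by this PR (previously torch.set_deterministic)
torch.use_deterministic_algorithms(True)
# Ops without a deterministic implementation will now raise an error
```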

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49904

Reviewed By: ezyang, mrshenli

Differential Revision: D25956761

Pulled By: mruberry

fbshipit-source-id: 86a59289d50825a0ebbd7c358b483c8d8039ffa6
2021-01-22 11:27:07 -08:00
7cb4712b38 count_nonzero with requires grad (#50866)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50792

Fixes `count_nonzero` for tensors with requires_grad, and also includes a test.
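
A minimal sketch of the fixed behavior (assuming, per the linked issue, that the call previously errored for tensors requiring grad):

```python
import torch

x = torch.randn(4, requires_grad=True)
# count_nonzero returns an integer count, so it is not differentiable,
# but calling it on a requires_grad tensor should not error.
n = torch.count_nonzero(x)
print(n)  # e.g. tensor(4)
```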

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50866

Reviewed By: ejguan

Differential Revision: D25996202

Pulled By: albanD

fbshipit-source-id: 61f2d7d62dd04e574a65ad03ef3a358b141fbae7
2021-01-22 11:19:59 -08:00
d5dc65a45c Document example of Proxy use (#50583)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50583

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26010501

Pulled By: ansley

fbshipit-source-id: 947121af7e57c16c96f849fbbb3fa83e97d003b2
2021-01-22 11:05:51 -08:00
89cafde8a4 Modernize for-loops (#50912)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50912

Test Plan: Sandcastle tests

Reviewed By: ansley

Differential Revision: D26001948

fbshipit-source-id: 3bfe6a8283a2b1882ed472f836ae1b6e720e519f
2021-01-22 10:53:24 -08:00
156da22566 [PyTorch] Eliminate static default_extra_files_mobile from header import.h (#50832)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50832

Please see the previous diff in this stack for the motivation to do so. This makes the same change but for the non-mobile codebase.
ghstack-source-id: 120184012

Test Plan: Sandcastle + Build

Reviewed By: raziel, iseeyuan

Differential Revision: D25979986

fbshipit-source-id: 7708f4f6a50cb16d7a23651e5655144d277d0a4f
2021-01-22 09:59:56 -08:00
d60d108280 [nnc] Expose fast tanh/sigmoid (#50736)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50736

Exposes tanh and sigmoid to other backends

Test Plan: buck test caffe2/test/cpp/tensorexpr:tensorexpr -- "ATen.fast"

Reviewed By: bertmaher

Differential Revision: D25884911

fbshipit-source-id: f9a5286450331f60935cfd40bb23f4a4f4c1d087
2021-01-22 09:56:02 -08:00
47f0bda3ef Improve complex support in common_nn test machinery (#50593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50593

There are no equivalents of torch.FloatTensor / torch.cuda.FloatTensor for complex
types, so `get_gpu_type` and `get_cpu_type` are broken for complex tensors.

Also found a few places that explicitly cast inputs to floating point types,
which would drop the imaginary component before running the test.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25954050

Pulled By: mruberry

fbshipit-source-id: 1fa8e5af233aa095c839d5e2f860564baaf92aef
2021-01-22 09:44:45 -08:00
9ac30d96aa Add complex IValues (#50883)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50883

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26003682

Pulled By: anjali411

fbshipit-source-id: f02967d2d236d740cd8647891f732f1d63098d3e
2021-01-22 09:44:40 -08:00
002d978428 Sparse benchmarking utils (#48397)
Summary:
This is benchmarking tooling for working with sparse tensors. To implement it, we extended the benchmarking util from PR [https://github.com/pytorch/pytorch/issues/38338](https://github.com/pytorch/pytorch/pull/38338) to sparse tensors. To extend the proposed utility library, the **FuzzedTensor** class was subclassed into a new **FuzzedSparseTensor** class. In addition, two new operator classes were added: `UnaryOpSparseFuzzer` and `BinaryOpSparseFuzzer`.

The class `FuzzedSparseTensor` adds new input parameters to the constructor:
1. `sparse_dim`: The number of sparse dimensions in a sparse tensor.
2. `nnz`:   Number of non-zero elements in the sparse tensor.
3. `density`: The density of the sparse tensor.
4. `coalesced`: Whether the tensor is coalesced (the sparse format permits both coalesced and uncoalesced tensors).

and removes `probability_contiguous`, `max_allocation_bytes`, `roll_parameter`, `tensor_constructor`, as they are dense-tensor-related parameters.
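
These parameters map onto standard sparse COO tensor properties; a small illustration with a plain sparse tensor (not the fuzzer API itself):

```python
import torch

# A 2x3 tensor stored in COO format with sparse_dim=2 and nnz=3
i = torch.tensor([[0, 1, 1],
                  [2, 0, 2]])
v = torch.tensor([3.0, 4.0, 5.0])
t = torch.sparse_coo_tensor(i, v, size=(2, 3))

print(t.sparse_dim())        # 2
print(t._nnz())              # 3
print(t._nnz() / t.numel())  # density: 0.5
print(t.is_coalesced())      # False until coalesce() is called
print(t.coalesce().is_coalesced())  # True
```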

In addition, I've extended the `torch.utils.benchmark.examples` to work with the new classes `FuzzedSparseTensor`, `UnaryOpSparseFuzzer` and `BinaryOpSparseFuzzer`.

Hopefully, this tooling and these examples will help to make other benchmarks in other PRs. Looking forward to your thoughts and feedback. cc robieta, mruberry,  ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48397

Reviewed By: ejguan

Differential Revision: D26008137

Pulled By: mruberry

fbshipit-source-id: 2f37811c7c3eaa3494a0f2500e519267f2186dfb
2021-01-22 09:40:59 -08:00
0436ea125b OpInfo: Remove promotes_integers_to_float and infer it instead (#50279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50279

This allows different sample inputs to have different behavior for the same
operator. For example, `div(..., rounding_mode='true')` promotes, but the other
rounding modes don't. The current boolean flag is too restrictive to allow this.
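
A sketch of the dtype difference (note: in later releases the 'true' mode is spelled `rounding_mode=None`):

```python
import torch

a = torch.tensor([7])
b = torch.tensor([2])
# True division promotes integer inputs to a floating point dtype...
print(torch.div(a, b).dtype)                         # torch.float32
# ...but the explicit rounding modes keep the integer dtype.
print(torch.div(a, b, rounding_mode="floor").dtype)  # torch.int64
print(torch.div(a, b, rounding_mode="trunc").dtype)  # torch.int64
```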

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25950011

Pulled By: mruberry

fbshipit-source-id: 7e82b82bedc626b2b6970d92d5b25676183ec384
2021-01-22 09:32:37 -08:00
4bbff92014 Refactor build targets for torch::deploy (#50288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50288

torch::deploy will bundle the objects contained in libtorch-python together with frozenpython into a shared library.  Therefore, the libtorch-python objs can't bring with them a dependency on system python.

Buck TARGETS are added throughout the caffe2 tree to make available objects or headers that will be needed by torch::deploy but would have brought unsuitable dependencies if accessed using existing targets.

CMakeLists are modified to separate a torch-python-objs object library which lets torch::deploy compile these objs with the same compile flags as libtorch_python used, but without some of the link-time dependencies such as python.

CudaIPCTypes is moved from libtorch_python to libtorch_cuda because it is really not a python binding, and it statically registers a cuda_ipc_callback which would be duplicated if included in each copy of torch::deploy.

Test Plan: no new functionality, just ensure existing tests continue to pass

Reviewed By: malfet

Differential Revision: D25850785

fbshipit-source-id: b0b81c050cbee04e9de96888f8a09d29238a9db8
2021-01-22 09:16:32 -08:00
5f07b53ec2 [TensorExpr] Add LoopNest::simplify. (#50850)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50850

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D25985085

Pulled By: ZolotukhinM

fbshipit-source-id: e51709423c2c12b37b449a9d7bb22be04cda7ef1
2021-01-22 08:43:34 -08:00
2ba2ab9e46 [packaging] add support for BytesIO (#50838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50838

Similar to `torch.save` and `torch.jit.save`, accept an IO-like object instead of just a file.
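
A round-trip sketch with `io.BytesIO` (hedged: written against the current `torch.package` names, which may have differed slightly when this landed; the extern call marks `torch` itself as an external dependency):

```python
import io

import torch
from torch.package import PackageExporter, PackageImporter

buf = io.BytesIO()
with PackageExporter(buf) as exporter:
    exporter.extern(["torch", "torch.**"])  # don't package torch itself
    exporter.save_pickle("my_pkg", "obj.pkl", {"w": torch.randn(2)})

buf.seek(0)
obj = PackageImporter(buf).load_pickle("my_pkg", "obj.pkl")
print(obj["w"].shape)  # torch.Size([2])
```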

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D25982719

Pulled By: suo

fbshipit-source-id: 42f3665932bbaa6897215002d116df6338edae50
2021-01-22 08:33:39 -08:00
c7d348fea6 Turn on batched grad testing for non-autogenerated tests in test_nn.py (#50739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50739

This does not turn on batched grad testing for autogenerated NewModuleTest
tests and CriterionTest tests. Those are coming later.

Test Plan: - run tests

Reviewed By: ejguan

Differential Revision: D25997677

Pulled By: zou3519

fbshipit-source-id: b4b2d68e0f99c3d573faf237e1e531d0b3fced40
2021-01-22 07:40:20 -08:00
b2e5617553 [ROCm] rename HIP_HCC_FLAGS to HIP_CLANG_FLAGS (#50917)
Summary:
ROCm 3.5 replaced hcc with hip-clang and deprecated HIP_HCC_FLAGS.
HIP_CLANG_FLAGS should be used moving forward. HIP_HCC_FLAGS will
be removed soon.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50917

Reviewed By: ejguan

Differential Revision: D26008094

Pulled By: walterddr

fbshipit-source-id: cfec4f96fbd9bd338834a841c37267f6a4703cab
2021-01-22 07:24:05 -08:00
8eb90d4865 Add Gaussian NLL Loss (#50886)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48520.

cc albanD (This is a clean retry PR https://github.com/pytorch/pytorch/issues/49807)
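
A usage sketch (`var` is the predicted, possibly heteroscedastic, variance):

```python
import torch
import torch.nn as nn

loss = nn.GaussianNLLLoss()
input = torch.randn(5, 2, requires_grad=True)  # predicted means
target = torch.randn(5, 2)                     # observed values
var = torch.ones(5, 2, requires_grad=True)     # predicted variance
out = loss(input, target, var)
out.backward()
```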

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50886

Reviewed By: ejguan

Differential Revision: D26007435

Pulled By: albanD

fbshipit-source-id: 88fe91b40dea6f72e093e6301f0f04fcc842d2f0
2021-01-22 06:56:49 -08:00
e34992ebee Set USE_KINETO=1 (#49897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49897

Resend of https://github.com/pytorch/pytorch/pull/49201

Test Plan: see 49201

Reviewed By: malfet

Differential Revision: D25717102

Pulled By: ilia-cher

fbshipit-source-id: 5e794a7f5fe160ca64ac9d190c4fd3e8f1e443e6
2021-01-22 00:09:21 -08:00
7494f0233a snake_case FX IR names (#50876)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50876

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D26002640

Pulled By: ansley

fbshipit-source-id: 4de8a63ef227ae3d46fab231f739c8472289ca4d
2021-01-21 22:25:57 -08:00
7f22af13b9 Add alternative prettyprinting method to Graph (#50878)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50878

Test Plan: Imported from OSS

Reviewed By: SplitInfinity, eellison

Differential Revision: D26009183

Pulled By: ansley

fbshipit-source-id: 300913ea634d9a0e5b00deb831154ef126ad4180
2021-01-21 22:15:56 -08:00
d33cc4c01b Use quiet_NaN() in calc_digamma, not NAN (#50412)
Summary:
This not only specifies the data types of these NaNs, but also indicate
that the function isn't signaling anything unusual.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50412

Reviewed By: mrshenli

Differential Revision: D25899828

Pulled By: ezyang

fbshipit-source-id: a8ded10954ad08cba3098aa473c6b77f2e03dc93
2021-01-21 22:02:00 -08:00
bb909d27d5 [PyTorch Mobile] Eliminate static default_extra_files_mobile from header import.h (#50795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50795

There's [a post](https://fb.workplace.com/groups/2148543255442743/permalink/2583012411995823/) about a customer having to pass in `-Wno-global-constructors` to disable warnings related to calling constructors for global objects. This is related to the initialization of `default_extra_files_mobile` in `import.h`.

It requires end users to pass in the compiler flag, since the definition is now in code (.cpp files) that they will be compiling.

In addition, it makes the API for `_load_for_mobile` non-re-entrant (i.e. can not be safely used concurrently from multiple threads without the caller taking a mutex/lock) if the `extra_files_mobile` argument is not explicitly passed in.

Instead, a better option would be to create different overloads; one which requires all 3 parameters, and one that can work with 1-2. This solves the problem without creating a static variable.

ghstack-source-id: 120127083

Test Plan: Build Lite Interpreter and sandcastle.

Reviewed By: raziel

Differential Revision: D25968216

fbshipit-source-id: fbd80dfcafb8ef7231aca301445c4a2ca9a08995
2021-01-21 21:22:48 -08:00
d46210958e Remove use_c10_dispatcher: full lines added in the last couple days (#50769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50769

A couple of new instances of these lines were added in the last couple of days, but they're not necessary anymore.
This PR removes them and also adds an assertion to make sure we don't add any more.
ghstack-source-id: 120133715

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D25961316

fbshipit-source-id: e2befc5b6215b42decb2acedcacfb50734857e2f
2021-01-21 20:35:26 -08:00
57fb2c0fcc [PPC] Add missing vec_[signed|neg|sldw] definitions (#50640)
Summary:
Based on quickwritereader's comment: https://github.com/pytorch/pytorch/issues/50439#issuecomment-760025933
Those builtins were only added in gcc-8 and newer.

Fixes https://github.com/pytorch/pytorch/issues/50439

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50640

Reviewed By: walterddr

Differential Revision: D25934384

Pulled By: malfet

fbshipit-source-id: b5dcfcf644ab92a78279c4dca5dbffbb8d8aae0c
2021-01-21 19:57:53 -08:00
533cb9530e Introducing TORCH_CUDA_CPP_API and TORCH_CUDA_CU_API to the code (#50627)
Summary:
Sub-step of my attempt to split up the torch_cuda library, as it is huge. Please look at https://github.com/pytorch/pytorch/issues/49050 for details on the split and which files are in which target.

This PR introduces two new macros for Windows DLL purposes, TORCH_CUDA_CPP_API and TORCH_CUDA_CU_API. Both are defined as TORCH_CUDA_API for the time being.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50627

Reviewed By: mruberry

Differential Revision: D25955441

Pulled By: janeyx99

fbshipit-source-id: ff226026833b8fb2fb7c77df6f2d6c824f006869
2021-01-21 19:09:11 -08:00
3aed177484 [PyTorch] inline Dispatcher::singleton (#50644)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50644

The dispatcher is a very hot code path; not inlining
`Dispatcher::singleton()` was hurting perf.

Test Plan:
Profiled our internal empty() benchmark. `perf stat` shows
about a 1.7% reduction in cycles spent; the benchmark's timing itself
shows a small reduction.

Reviewed By: dzhulgakov, bhosmer

Differential Revision: D25935275

fbshipit-source-id: a328f8ac8ea479bbe5c6ddb80f98838ae6058bbd
2021-01-21 19:01:16 -08:00
21c2542b6a Independent constraint (#50547)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/50496

This fixes a number of inconsistencies in torch.distributions.constraints as used for parameters and supports of probability distributions.
- Adds a `constraints.independent` and replaces `real_vector` with `independent(real, 1)`. (this pattern has long been used in Pyro)
- Adds an `.event_dim` attribute to all constraints.
- Tests that `constraint.check(data)` has the correct shape. (Previously the shapes were incorrect).
- Adds machinery to set static `.is_discrete` and `.event_dim` for `constraints.dependent`.
- Fixes constraints for a number of distributions.
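
A small illustrative sketch of the new constraint and its `event_dim`:

```python
import torch
from torch.distributions import constraints

# real_vector is now independent(real, 1): the last dim is an event dim
c = constraints.independent(constraints.real, 1)
print(c.event_dim)       # 1

x = torch.randn(3, 4)
print(c.check(x).shape)  # torch.Size([3]) -- one result per batch element
```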

## Tested
- added a new check to the constraints tests
- added a new check for `.event_dim`

cc fehiepsi feynmanliang stefanwebb

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50547

Reviewed By: VitalyFedyunin

Differential Revision: D25918330

Pulled By: neerajprad

fbshipit-source-id: a648c3de3e8704f70f445c0f1c39f2593c8c74db
2021-01-21 18:42:45 -08:00
5016637955 [FX] Update overview docstring (#50896)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50896

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D26002067

Pulled By: jamesr66a

fbshipit-source-id: 3b4d4b96017d16739a31f25a306f55b6f96324dc
2021-01-21 17:31:54 -08:00
eb0fe70680 [distributed_test]Enable disabled ROCm tests. (#50421)
Summary:
Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50421

Reviewed By: ejguan

Differential Revision: D26006844

Pulled By: zhaojuanmao

fbshipit-source-id: aa6ac5ee2d37f354d52328c72eb2cd23f5665f53
2021-01-21 17:22:40 -08:00
aa3c28a29e [static runtime] Shortcut resize_({0})
Summary:
We do a lot of resize_({0}) to force `out` operators to properly
resize their results, and `resize_` does a fair bit of extraneous work
(e.g. trip through dispatch, checks for memory_format and named tensors, etc.).
If we strip it down to the bare minimum it's just setting the sizes to 0, so
let's do that directly.

Test Plan:
Perf results suggest maybe a 1% win:
```
batch 20: P163138256 (large win, 1.7%, mostly in fb_fc_out)
batch 1: P163139591 (smaller win, 0.88%, mostly in resize_)
```

Reviewed By: swolchok

Differential Revision: D25932595

fbshipit-source-id: d306a0a15c0e1be12fde4a7f149e3ed35665e3c0
2021-01-21 17:08:47 -08:00
8e9ed27a53 install magma for cuda 11.2 in conda (#50559)
Summary:
This PR allows us to start adding CUDA 11.2 Linux tests onto CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50559

Reviewed By: ejguan

Differential Revision: D26007595

Pulled By: janeyx99

fbshipit-source-id: e179dbe54e9390899d556dd201a1a179b2399d20
2021-01-21 15:44:39 -08:00
137f2a385a [ONNX] Handle sequence output for models (#50599)
Summary:
Duplicate of https://github.com/pytorch/pytorch/issues/46542

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50599

Reviewed By: SplitInfinity

Differential Revision: D25928897

Pulled By: bzinodev

fbshipit-source-id: a898cef7b2d15a287aedd9798ce1423cebf378d4
2021-01-21 15:36:41 -08:00
c082e2184d Add autograd tests for complex matrix norm nuclear and +/-2 (#50746)
Summary:
Also upgrades `linalg.norm`'s autograd and jit tests to `OpInfo`

Fixes https://github.com/pytorch/pytorch/issues/48842
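
A sketch of the kind of case the new tests exercise (nuclear norm of a complex matrix, with a backward pass):

```python
import torch

A = torch.randn(3, 3, dtype=torch.complex128, requires_grad=True)
n = torch.linalg.norm(A, ord="nuc")  # nuclear norm; the result is real
n.backward()                         # exercises the complex autograd path
print(A.grad.shape)                  # torch.Size([3, 3])
```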

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50746

Reviewed By: mruberry

Differential Revision: D25968246

Pulled By: anjali411

fbshipit-source-id: d457069ddb4caf2a5caed1aa64c791ef0790952c
2021-01-21 15:33:08 -08:00
201f0c1fdf Automated submodule update: tensorpipe (#50895)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: ee15f7a7c5

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50895

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: ejguan

Differential Revision: D26001623

fbshipit-source-id: 680d182ba5a6ce1d9cb2467136e8b27fe8266d0f
2021-01-21 15:28:22 -08:00
3cd8ed972a add and adjust kernel launch checks under fbcode/caffe2/caffe2/utils (#50862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50862

Add all missing kernel launch checks for all .cu and .cuh files under caffe2/caffe2/utils.

Test Plan:
Building with `buck build //caffe2/caffe2:` gives no errors, and all tests pass with `buck test //caffe2/caffe2:`.

Ran the kernel-launch check to ensure that nothing shows up under `fbcode/caffe2/caffe2/utils`.

The PR on GitHub shows all tests passing: https://github.com/pytorch/pytorch/actions/runs/500036434

Reviewed By: r-barnes

Differential Revision: D25987367

fbshipit-source-id: 52add63a14f2da855c784ab24468f64056c93836
2021-01-21 15:20:55 -08:00
16691516a5 Add batched grad testing to OpInfo (#50818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50818

This PR does two things:
1. Add batched grad testing to OpInfo
2. Improve the error message from `gradcheck` if batched gradient
computation fails to include suggestions for workarounds.

To add batched grad testing to OpInfo, this PR:
- adds new `check_batched_grad=True` and `check_batched_gradgrad=True`
attributes to OpInfo. These are True by default because we expect most
operators to support batched gradient computation.
- If `check_batched_grad=True`, then `test_fn_grad` invokes gradcheck
with `check_batched_grad=True`.
- If `check_batched_gradgrad=True`, then `test_fn_gradgradgrad` invokes
gradgradcheck with `check_batched_grad=True`.
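
Roughly, the underlying call then looks like this (a sketch using a simple elementwise op):

```python
import torch
from torch.autograd import gradcheck

x = torch.randn(3, dtype=torch.double, requires_grad=True)
# Also checks batched (vmap-style) gradients, not just the numeric Jacobian
assert gradcheck(torch.sin, (x,), check_batched_grad=True)
```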

The improved gradcheck error message looks like the following when an
exception is thrown while computing batched gradients:
https://gist.github.com/zou3519/5a0f46f908ba036259ca5e3752fd642f

Future
- Sometime in the not-near future, we will separate out "batched grad
testing" from "gradcheck" for the purposes of OpInfo to make the
testing more granular and also so that we can test that the vmap
fallback doesn't get invoked (currently batched gradient testing only
tests that the output values are correct).

Test Plan: - run tests `pytest test/test_ops.py -v -k "Gradients"`

Reviewed By: ejguan

Differential Revision: D25997703

Pulled By: zou3519

fbshipit-source-id: 6d2d444d6348ae6cdc24c32c6c0622bd67b9eb7b
2021-01-21 15:13:06 -08:00
1cce4c5eee Update Kineto revision (#50855)
Summary:
Update Kineto revision

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50855

Test Plan:
build with USE_KINETO=1
test/test_profiler.py

Reviewed By: gdankel

Differential Revision: D25987298

Pulled By: ilia-cher

fbshipit-source-id: d3f22832df74b2d14c338715e601f6f4bae85d6a
2021-01-21 14:34:57 -08:00
884fb48794 Miscellaneous batched grad testing (#50738)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50738

This PR adds batched grad testing for:
- test_linalg.py
- test_unary_ufuncs.py

Future:
- add batched grad testing for test_nn
- enable option for batched grad testing in OpInfo

Test Plan: - run tests

Reviewed By: ejguan

Differential Revision: D25997678

Pulled By: zou3519

fbshipit-source-id: 9a9f6694c041580061bd52b5e45661c872b0b761
2021-01-21 14:26:46 -08:00
8ede828df7 [te] Speed up relu on cpu
Summary:
We were implementing it using ifThenElse, which creates conditional
branches that complicate llvm's vectorization.  Using CompareSelect directly
yields clean vectorized code with nothing but vmovups and vmaxps.

Test Plan: Trivial benchmark shows 33% speedup on large tensors (256k elements).

Reviewed By: eellison

Differential Revision: D25986637

fbshipit-source-id: 72dd7776924f73c036d46dca30dff22404d86b82
2021-01-21 14:16:23 -08:00
98e2914614 [android] Fix YUV camera image to tensor (#50871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50871

Issue: https://discuss.pytorch.org/t/trouble-with-yuv420-to-float-tensor-conversion/106721/3
Decoding was wrong and the result image had artifacts.

Testing:
Patch test_app with:
[input_tensor_to_bitmap.txt](https://github.com/pytorch/pytorch/files/5847553/input_tensor_to_bitmap.txt)

gradle -p android test_app:installMnetLocalCameraDebug -PABI_FILTERS=arm64-v8a

Before fix:
![before_yuv_fix](https://user-images.githubusercontent.com/6638825/105317604-63a35980-5b90-11eb-9609-2ed5818130bd.png)

After fix:
![after_yuv_fix](https://user-images.githubusercontent.com/6638825/105317643-70c04880-5b90-11eb-88b7-92dd90db8ed2.png)

Test Plan: Imported from OSS

Reviewed By: fmassa

Differential Revision: D25992519

Pulled By: IvanKobzarev

fbshipit-source-id: 4a46ed39c1cd70f8987fcc1023520e9659ae5d59
2021-01-21 13:53:57 -08:00
b5242d66b6 [quant][doc] Adding a table comparing eager and fx graph mode (#50413)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50413

Test Plan:
.

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D25886960

fbshipit-source-id: b99178d3900eedec920dbff28ab956f97be2661a
2021-01-21 13:43:42 -08:00
4d169258ef Revert D25976245: [pytorch][PR] Enable Skipped ROCM Tests in common_nn.py
Test Plan: revert-hammer

Differential Revision:
D25976245 (24a0272132)

Original commit changeset: 801032534f91

fbshipit-source-id: 561e6d761cb694451d5f87557b4f96f37d19dd90
2021-01-21 13:28:37 -08:00
4cca08368b Adds per-op microbenchmarks for NNC (#50845)
Summary:
Runs through the vast majority of the primitive ops that exist in NNC and benchmarks them against PyTorch ops on CPU. Dumps out a plot like this.

![nnc](https://user-images.githubusercontent.com/6355099/105247994-a854d380-5b43-11eb-9ac9-1ee779e5ab54.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50845

Reviewed By: ngimel

Differential Revision: D25989080

Pulled By: Chillee

fbshipit-source-id: 6d6a39eb06b3de9a999993224d5e718537c0c8c4
2021-01-21 13:21:01 -08:00
4ac489091a Improve call provenance during GraphModule scripting (#50538)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50538

Test Plan: Imported from OSS

Reviewed By: pbelevich, SplitInfinity

Differential Revision: D25935403

Pulled By: ansley

fbshipit-source-id: 2baf5e0ba0fa3918e645fc713a9e80d10bbc84e5
2021-01-21 12:03:19 -08:00
df96344968 [optimizer] refactor AdamW to use functional API (#50411)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50411

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D25932776

Pulled By: wanchaol

fbshipit-source-id: e8e1696b3390ba7909b36fd0107c58b892520432
2021-01-21 11:00:45 -08:00
ce1781d8db [optimizer] refactor RMSProp to use functional API (#50410)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50410

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D25932779

Pulled By: wanchaol

fbshipit-source-id: b0d6007ea83d77e2d70d04681163ea7e4632c5cd
2021-01-21 11:00:41 -08:00
d6fb27ce72 [optimizer] refactor Adadelta to use functional API (#50409)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50409

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D25932780

Pulled By: wanchaol

fbshipit-source-id: 2fc025f66a0e0863f21689892e19d8a5681f2f2f
2021-01-21 11:00:36 -08:00
a0cf5566d8 [optimizer] refactor SGD to use functional API (#45597)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45597

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D25932773

Pulled By: wanchaol

fbshipit-source-id: bc5f830d6812f847475b9bdcc67865d9968e3282
2021-01-21 10:57:08 -08:00
b96a6516a6 Add CPP Full Reduction Benchmarks. (#50193)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50193

* Supports aten, native reference, and NNC TE implementations.
* Supports functionality checks against aten, in addition to performance checks.

Test plans:

* After enabling "BUILD_TENSOREXPR_BENCHMARK" in CMakeLists.txt,
* bin/tensorexpr_bench --benchmark_filter=Reduce1D

Measurements:

On a Broadwell E5-2686 CPU,

Reduce1D/Torch/16777216            5638547 ns    5638444 ns        119 BYTES=11.902G/s
Reduce1D/Naive/16777216           19308235 ns   19308184 ns         36 BYTES=3.47567G/s
Reduce1D/NativeRfactor/16777216    8433348 ns    8433038 ns         85 BYTES=7.95785G/s
Reduce1D/NativeVector/16777216     5608836 ns    5608727 ns        124 BYTES=11.9651G/s
Reduce1D/NativeTiled/16777216      5550233 ns    5550221 ns        126 BYTES=12.0912G/s
Reduce1D/TeNaive/16777216         21451047 ns   21450752 ns         33 BYTES=3.12851G/s
Reduce1D/TeSplitTail/16777216     23701732 ns   23701229 ns         30 BYTES=2.83145G/s
Reduce1D/TeSplitMask/16777216     23683589 ns   23682978 ns         30 BYTES=2.83363G/s
Reduce1D/TeRfactorV2/16777216      5378019 ns    5377909 ns        131 BYTES=12.4786G/s

Result summary:

* The single-threaded performance with NNC TeRfactorV2 matches and exceeds the Aten and naive AVX2 counterparts.

Follow-up items:

* rfactor does not work well with split
* We don't have a multi-threaded implementation yet.
  * Missing "parallel" scheduling primitive, which is no different from what we need for pointwise ops.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25821880

Pulled By: zheng-xq

fbshipit-source-id: 8df3f40d1eed8749c8edcaacae5f0544dbf6bed3
2021-01-21 10:00:50 -08:00
88b36230f5 Add full reduction benchmark. (#50057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50057

As part of the effort to calibrate TE reduction performance, adding a full reduction benchmark.
Also add a "skip_input_transformation" option.
Fixed the other reduction benchmarks to accept the specific benchmarks that were listed.

Test plans:
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce_full
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce_full_fwd_cpu_16777216_s1
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce_full_fwd_cpu_16777216_s0
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce2d_inner
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce2d_inner_fwd_cpu_640_524288
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce2d_outer
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce2d_outer_fwd_cpu_640_524288

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25774138

Pulled By: zheng-xq

fbshipit-source-id: fd4598e5c29991be476e42235a059e8021d4f083
2021-01-21 09:56:46 -08:00
24a0272132 Enable Skipped ROCM Tests in common_nn.py (#50753)
Summary:
Removed test_cuda=(not TEST_WITH_ROCM)
in common_nn.py to enable the skipped tests
for ROCM.

Signed-off-by: Arindam Roy <rarindam@gmail.com>

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50753

Reviewed By: mrshenli

Differential Revision: D25976245

Pulled By: ngimel

fbshipit-source-id: 801032534f911d24d231bc9f0d3235a4506412c0
2021-01-21 09:48:47 -08:00
480bb7d356 Automated submodule update: tensorpipe (#50807)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: 9f84778d47

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50807

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: lw

Differential Revision: D25973473

fbshipit-source-id: 62a9808a6ce5e6c4b51fdf272b687118a8c116b8
2021-01-21 01:23:05 -08:00
439afda090 [Gradient Compression] Fix warm-start for PowerSGD layerwise compression (#50283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50283

Realized that for the layerwise compression, the previous warm-start implementation only skips memory allocations, but does not skip filling random values for Qs.

Also fixed the unit test in distributed_test.py. Previously the process group was not created correctly, and no communication occurred in test_DistributedDataParallel_powerSGD_ddp_comm_hook.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 120101220

Test Plan:
Verified the fix by adding some logging locally.

Also verified no NE diff on Ads 1x.

Reviewed By: rohan-varma

Differential Revision: D25846222

fbshipit-source-id: 1ebeeb55ceba64d4d904ea6ac1bb42b1b2241520
2021-01-20 22:31:44 -08:00
d0e942f9a7 [FX][docs] Add limitations of symbolic tracing (#50638)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50638

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D25933780

Pulled By: jamesr66a

fbshipit-source-id: 0aa97ea05203fbcb707b0e947a465e206104b7df
2021-01-20 21:42:16 -08:00
c88eed97c7 Make split_module results deterministic (#50470)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50470

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D25899130

Pulled By: ansley

fbshipit-source-id: 45d63992cbe17eb01f709d02800c2eef1bd2ad08
2021-01-20 21:35:04 -08:00
4954417163 CONTRIBUTING.md: add instructions on how to remote desktop into Windows CI (#50841)
Summary:
Adds a link to existing instructions in CONTRIBUTING.md, so those instructions are more visible to contributors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50841

Reviewed By: samestep

Differential Revision: D25983089

Pulled By: janeyx99

fbshipit-source-id: 0b777ec760765153c607515ab09441dd0cfddf3c
2021-01-20 18:46:56 -08:00
c945a5bb5e fix typo in quantized README.md (#50681)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50681

Reviewed By: ngimel

Differential Revision: D25978905

Pulled By: jerryzh168

fbshipit-source-id: e8bff59a7a6b2b6f79273c010c32480db0997e7d
2021-01-20 17:43:25 -08:00
7fdc6a27b8 Skip test_variant_consistency_eager_addr_cpu_bfloat16 (#50836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50836

Fixes the broken master

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D25981125

Pulled By: anjali411

fbshipit-source-id: 4043b6a7287700c7c9f0ce703eef53bb666ff655
2021-01-20 16:03:00 -08:00
c147aa306c Use doctest directly to get docstring examples (#50596)
Summary:
This PR addresses [a two-year-old TODO in `test/test_type_hints.py`](12942ea52b/test/test_type_hints.py (L21-L22)) by replacing most of the body of our custom `get_examples_from_docstring` function with [a function from Python's built-in `doctest.DocTestParser` class](https://docs.python.org/3/library/doctest.html#doctest.DocTestParser.get_examples). This mostly made the parser more strict, catching a few errors in existing doctests:

- missing `...` in multiline statements
- missing space after `>>>`
- unmatched closing parenthesis
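
For reference, a sketch of the standard-library parser this now delegates to:

```python
import doctest

docstring = '''
Example::

    >>> x = [1, 2]
    >>> x + [3]
    [1, 2, 3]
'''
# Each Example carries .source, .want, .lineno, .indent, ...
for ex in doctest.DocTestParser().get_examples(docstring):
    print(repr(ex.source), "->", repr(ex.want))
```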

Also, as shown by [the resulting diff of the untracked `test/generated_type_hints_smoketest.py` file](https://pastebin.com/vC5Wz6M0) (also linked from the test plan below), this introduces a few incidental changes as well:

- standalone comments are no longer preserved
- indentation is now visually correct
- [`example_torch_promote_types`](4da9ceb743/torch/_torch_docs.py (L6753-L6772)) is now present
- an example called `example_torch_tensor___array_priority__` is added, although I can't tell where it comes from
- the last nine lines of code from [`example_torch_tensor_align_as`](5d45140d68/torch/_tensor_docs.py (L386-L431)) are now present
- the previously-misformatted third line from [`example_torch_tensor_stride`](5d45140d68/torch/_tensor_docs.py (L3508-L3532)) is now present

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50596

Test Plan:
Checkout the base commit, typecheck the doctests, and save the generated file:
```
$ python test/test_type_hints.py TestTypeHints.test_doc_examples
$ cp test/generated_type_hints_smoketest.py /tmp
```
Then checkout this PR, do the same thing, and compare:
```
$ python test/test_type_hints.py TestTypeHints.test_doc_examples
$ git diff --no-index {/tmp,test}/generated_type_hints_smoketest.py
```
The test should succeed, and the diff should match [this paste](https://pastebin.com/vC5Wz6M0).

Reviewed By: walterddr

Differential Revision: D25926245

Pulled By: samestep

fbshipit-source-id: 23bc379ff438420e556263c19582dba06d8e42ec
2021-01-20 15:55:36 -08:00
1bde5a216f [TensorExpr] Use wider type for scalars (#50774)
Summary:
Scalars have to be double / 64-bit integers to match eager semantics.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50774

Test Plan: python test/test_jit_fuser_te.py -k TestTEFuser.test_clamp

Reviewed By: ngimel

Differential Revision: D25978214

Pulled By: asuhan

fbshipit-source-id: ba765b7d215239f2bf0f3d467e4dce876f7ccb91
2021-01-20 15:12:27 -08:00
24fd84313f [pytorch] fix ConstRefCType usage in codegen/api/native.py (#50742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50742

Fixed the other usage of `BaseCType('const ...&)` on #49138.

Checked byte-for-byte compatibility of the codegen output.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25955565

Pulled By: ljk53

fbshipit-source-id: 83ebd6b039892b805444867ed97a6e2fa6e72225
2021-01-20 15:01:37 -08:00
44922f26f5 Add support for NCCL alltoall (#44374)
Summary:
In https://github.com/pytorch/pytorch/issues/42514, NCCL `alltoall_single` was already added. This PR adds NCCL `alltoall`.

The difference between `alltoall_single` and `alltoall` is: `alltoall_single` works on a single tensor and sends/receives slices of that tensor, while `alltoall` works on a list of tensors and sends/receives the tensors in that list.
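
A per-rank sketch contrasting the two collectives (assumes `dist.init_process_group("nccl", ...)` has already run on every rank):

```python
import torch
import torch.distributed as dist

rank, world = dist.get_rank(), dist.get_world_size()
device = torch.device("cuda", rank)

# all_to_all: exchanges a *list* of tensors, one per peer
inputs = [torch.full((2,), float(rank), device=device) for _ in range(world)]
outputs = [torch.empty(2, device=device) for _ in range(world)]
dist.all_to_all(outputs, inputs)

# all_to_all_single: exchanges equal slices of a *single* tensor
inp = torch.full((2 * world,), float(rank), device=device)
out = torch.empty_like(inp)
dist.all_to_all_single(out, inp)
```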

cc: ptrblck ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44374

Reviewed By: zhangguanheng66, mrshenli

Differential Revision: D24455427

Pulled By: srinivas212

fbshipit-source-id: 42fdebdd14f8340098e2c34ef645bd40603552b1
2021-01-20 14:57:12 -08:00
87fb3707d9 ZeroRedundancyOptimizer: an implementation of a standalone sharded optimizer wrapper (#46750)
Summary:
Implement the first stage of ZeRO, sharding of the optimizer state, as described in [this blog post](https://www.microsoft.com/en-us/research/blog/zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/) and [this paper](https://arxiv.org/abs/1910.02054). This implementation is completely independent from the [DeepSpeed](https://github.com/microsoft/DeepSpeed) framework, and aims at providing ZeRO-compliant building blocks within the PyTorch scheme of things.

This works by:
- acting as a wrapper to a pytorch optimizer. ZeROptimizer does not optimize anything by itself, it only shards optimizers for distributed jobs
- each rank distributes parameters according to a given partitioning scheme (could be updated), and owns the update of a given shard only
- the .step() is called on each rank as expected, the fact that the optimizer actually works on a shard of the model is not visible from the outside
- when the update is completed, each rank broadcasts the updated model shard to all the other ranks
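
As a rough per-rank usage sketch (hedged: assumes a process group is already initialized, and uses the `optimizer_class` kwarg name from the eventual public API, which may have differed when this first landed):

```python
import torch
from torch.distributed.optim import ZeroRedundancyOptimizer

# assumes dist.init_process_group(...) has already run on every rank
model = torch.nn.Linear(8, 4)
optimizer = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.SGD,  # the wrapped, per-shard optimizer
    lr=0.01,
)
loss = model(torch.randn(2, 8)).sum()
loss.backward()
optimizer.step()  # updates this rank's shard, then broadcasts the new params
```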

This can be used with DDP, although some communications are wasted in that case (gradients are all-reduced to all ranks). This implementation was initially developed in [Fairscale](https://github.com/facebookresearch/fairscale), and can also be used with an optimized DDP which only reduces to the relevant ranks. More context on ZeRO and PyTorch can be found in [this RFC](https://github.com/pytorch/pytorch/issues/42849)

The API with respect to loading and saving the state is a known pain point and should probably be discussed and updated. Other possible follow ups include integrating more closely with a [modularized DDP](https://github.com/pytorch/pytorch/issues/37002), [making the checkpoints partition-agnostic](https://github.com/facebookresearch/fairscale/issues/164), [exposing a gradient clipping option](https://github.com/facebookresearch/fairscale/issues/98) and making sure that mixed precision states are properly handled.

original authors include msbaines, min-xu-ai and myself

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46750

Reviewed By: mruberry

Differential Revision: D25958918

Pulled By: blefaudeux

fbshipit-source-id: 14280f2fd90cf251eee8ef9ac0f1fa6025ae9c50
2021-01-20 14:36:16 -08:00
c3e3e60657 Add cloud-tpu-client to xla CI. (#50823)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50823

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D25976931

Pulled By: ailzhang

fbshipit-source-id: f29c24c232944a103b59d9fea9b1c19969a7821b
2021-01-20 13:44:49 -08:00
be7e9845a1 Remove gtest_prod.h from TP agent. (#50766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50766

This header breaks certain builds since it causes PyTorch to depend
on gtest.
ghstack-source-id: 119991167

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D25960810

fbshipit-source-id: ceaaad499f6f363ef35c6623475ae8f191d86171
2021-01-20 13:15:48 -08:00
ac8e90fa6d quantization: Linear + BatchNorm1d fusion (#50748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50748

Adds support for Linear + BatchNorm1d fusion to quantization.
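
A user-side sketch of the new fusion (eval mode, since folding BatchNorm statistics requires them to be fixed; the module names "0"/"1" come from the `nn.Sequential` container):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4)).eval()
# Folds the BatchNorm1d statistics into the Linear weight/bias
fused = torch.quantization.fuse_modules(model, [["0", "1"]])

x = torch.randn(2, 4)
assert torch.allclose(model(x), fused(x), atol=1e-5)
```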

This is a redo of dreiss's https://github.com/pytorch/pytorch/pull/37467, faster
to copy-paste it than rebase and deal with conflicts.

Test Plan:
```
python test/test_quantization.py TestFusion.test_fusion_linear_bn_eval
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D25957432

fbshipit-source-id: 24e5b760f70186aa953ef65ab0182770e89495e4
2021-01-20 12:59:02 -08:00
db86dd8ad7 Fix replication_pad for cuda launch configuration (#50565)
Summary:
Fix https://github.com/pytorch/pytorch/issues/49601

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50565

Reviewed By: mruberry

Differential Revision: D25968843

Pulled By: ngimel

fbshipit-source-id: 6d2d543132b501765e69b52caaa283fb816db276
2021-01-20 11:52:12 -08:00
cf1882adeb Fix indexing for overrides. (#49324)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46277

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49324

Reviewed By: mruberry

Differential Revision: D25959334

Pulled By: ezyang

fbshipit-source-id: bac48b8ffee89d10aa04c004de2b53b4e54a96c2
2021-01-20 11:34:02 -08:00
16faabe7f0 [ROCm] re-enable tests (#50691)
Summary:
Signed-off-by: Kyle Chen <kylechen@amd.com>

cc: jeffdaily

re-enable test_torch.py and test_unary_ufuncs.py tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50691

Reviewed By: mruberry

Differential Revision: D25967842

Pulled By: ngimel

fbshipit-source-id: dc0f6cb68fe4d151c2719bdf67ead96e1396acf2
2021-01-20 11:23:39 -08:00
fbf7eec86d Update JIT_OPT macro for easier use (#50602)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50602

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25931371

fbshipit-source-id: cf6bc58c419a1dc0018639596b304a3a05e38360
2021-01-20 11:15:20 -08:00
112a583467 Enable TensorPipe's CMA channel (#50759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50759

ghstack-source-id: 120032288

Test Plan: Exported to CircleCI and tested

Reviewed By: mrshenli

Differential Revision: D25959326

fbshipit-source-id: be6df209ff3a79a8961acbda64ee7805a5c434a9
2021-01-20 10:53:47 -08:00
c18403a693 [metal] Use MPSCNN kernels for binary elementwise ops
Summary: Previously, binary elementwise kernels such as add, sub, and mul were implemented with custom shaders. However, MPSCNN has kernels for these operations for iOS >=11.3. Update these ops to use MPSCNN kernels instead of shader implementations.

Test Plan:
Test on device:
`arc focus2 pp-ios`

Test on mac
`buck test pp-macos`

Reviewed By: xta0

Differential Revision: D25953986

fbshipit-source-id: 3acac3fa7dbe70f92572c21c0f0cfcdedfcdcf23
2021-01-20 10:41:35 -08:00
1e0809dbf9 [PyTorch] Remove CAFFE2_FB_LIMITED_MOBILE_CAPABILITY (#50385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50385

We no longer use this flag internally, and it's not referenced externally either, so let's clean up.
ghstack-source-id: 119676743

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25852220

fbshipit-source-id: a4427edff6cbb241340f9f6ae6db4e74832949c2
2021-01-20 10:26:54 -08:00
4f3cdd971c Fix test_dispatch.py when running with TORCH_SHOW_CPP_STACKTRACES=1 (#50509)
Summary:
`test_dispatch.py` has many asserts about the error message. When running with `TORCH_SHOW_CPP_STACKTRACES=1`, the error message is different from when `TORCH_SHOW_CPP_STACKTRACES=0`, which makes many tests in `test_dispatch.py` fail. This PR fixes these failures when running with `TORCH_SHOW_CPP_STACKTRACES=1`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50509

Reviewed By: ngimel

Differential Revision: D25956853

Pulled By: ezyang

fbshipit-source-id: 3b3696742a7dfb8f52f23a364838ec96945c5662
2021-01-20 10:15:01 -08:00
f1c578594b JIT Testing: Improve assertAutodiffNode error message (#50626)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50626

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25932184

Pulled By: Lilyjjo

fbshipit-source-id: 6fa5a652eb1a0c10bb9d9040b9a708fdf93aaf46
2021-01-20 10:05:52 -08:00
1cc8f8a750 Add complex autograd support and OpInfo based test for torch.addr (#50667)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50667

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D25957584

Pulled By: anjali411

fbshipit-source-id: a6b2880971027389721f4e051009b7d9694f979b
2021-01-20 09:43:13 -08:00
66adfcd258 tools: Move sha check to else statement (#50773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50773

Moves the sha check for version generation to the else clause
since it was causing issues for users building pytorch when the .git
directory was not present and PYTORCH_BUILD_VERSION was already set

Test Plan:
CI

Closes https://github.com/pytorch/pytorch/issues/50730
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Reviewed By: janeyx99

Differential Revision: D25963486

Pulled By: seemethere

fbshipit-source-id: ce1b315f878d074f2ffb6b658d59cbd13150f27f
2021-01-20 09:34:43 -08:00
e1bb476980 Issue #48724. Only set the CMake IMPORTED_LOCATION property in static… (#49173)
Summary:
… library builds, as it is already set in shared library builds from the target that was imported from Caffe2.

This was identified on Windows builds when PyTorch was built in shared Release mode, and a testapp was built with RelWithDebInfo in CMake.
The problem appeared to be that because IMPORTED_LOCATION (in TorchConfig.cmake) and IMPORTED_LOCATION_RELEASE (in Caffe2Targets.cmake) were both set, the build became confused about which one was correct. The symptom is the error:

ninja: error: 'torch-NOTFOUND', needed by 'test_pytorch.exe', missing and no known rule to make it

in a noddy consuming test application.

Fixes https://github.com/pytorch/pytorch/issues/48724

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49173

Reviewed By: malfet

Differential Revision: D25974151

Pulled By: ezyang

fbshipit-source-id: 3454c0d29cbbe7a37608beedaae3efbb624b0479
2021-01-20 09:23:27 -08:00
22902b9242 [WIP] JIT Static Hooks: cpp tests (#49547)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49547

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D25771118

Pulled By: Lilyjjo

fbshipit-source-id: cd8a58ff008a1c5d65ccbfbcbcb0214781ece16f
2021-01-20 09:12:57 -08:00
3b88e1b0e7 [WIP] JIT Static Hooks: python tests (#49546)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49546

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D25771119

Pulled By: Lilyjjo

fbshipit-source-id: bf8a8e20f790691d3ff58fa9c8d0d9ab3e8322c4
2021-01-20 09:12:53 -08:00
0eb41e67fe [WIP] JIT Static Hooks: serialization logic (#49545)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49545

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D25771121

Pulled By: Lilyjjo

fbshipit-source-id: fe08936d601618010b9c64e2bb769e0b67cb7187
2021-01-20 09:12:49 -08:00
9c49457233 [WIP] JIT Static Hooks: schema checking logic (#49975)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49975

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D25771120

Pulled By: Lilyjjo

fbshipit-source-id: 262892cec45b6894bd8c0c20b9cfee43065abc7c
2021-01-20 09:12:45 -08:00
a722d28ef0 [WIP] JIT Static Hooks: adding hooks to class type and adding logic for hook running/compilation (#49544)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49544

Implementation of design laid out in: https://fb.quip.com/MY9gAqlroo0Z

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D25771122

Pulled By: Lilyjjo

fbshipit-source-id: dc4a8461f71c58ae75144ca1477cd1c0e9f0f325
2021-01-20 09:09:30 -08:00
1f5c3b3aae Revert D25958987: [pytorch][PR] Add type annotations to torch.overrides
Test Plan: revert-hammer

Differential Revision:
D25958987 (2ace4fc01e)

Original commit changeset: aadc065c489b

fbshipit-source-id: efd8b7c3cbe03d5ab0afa0d7c695182623285a3a
2021-01-20 08:59:44 -08:00
4a8ef4525e Add new backend type for Intel heterogeneous computation platform. (#49786)
Summary:
Add a new device type 'XPU' ('xpu' for lower case) to PyTorch. Changes are needed for code related to device model and kernel dispatch, e.g. DeviceType, Backend and DispatchKey etc.

https://github.com/pytorch/pytorch/issues/48246
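After this change the device type parses like any other; a minimal illustration (allocating tensors on it still requires a build with the XPU backend):
```python
import torch

d = torch.device("xpu")  # 'xpu' is now a recognized device type
print(d.type)            # xpu
# x = torch.ones(2, 2, device=d)  # needs an XPU-enabled build
```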

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49786

Reviewed By: mrshenli

Differential Revision: D25893962

Pulled By: ezyang

fbshipit-source-id: 7ff0a316ee34cf0ed6fc7ead08ecdeb7df4b0052
2021-01-20 08:15:18 -08:00
a3b8cbcdfc Let TensorPipe detect peer access (#50676)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50676

Test Plan: Imported from OSS

Reviewed By: beauby

Differential Revision: D25941962

Pulled By: mrshenli

fbshipit-source-id: 7d4fd3b4fbd5ae5a0c50ad65605ced9db10ede4a
2021-01-20 08:04:51 -08:00
4803eaf502 Implement NumPy-like function torch.fmax() & torch.fmin() (#49312)
Summary:
- Implements the NumPy-like functions `torch.fmax()` and `torch.fmin()` recommended in https://github.com/pytorch/pytorch/issues/48440
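A quick illustration of the NaN handling these ops share with their NumPy counterparts:
```python
import torch

a = torch.tensor([1.0, float("nan"), 3.0])
b = torch.tensor([2.0, 0.0, float("nan")])
torch.fmax(a, b)  # tensor([2., 0., 3.]) -- NaN is ignored when the other operand is a number
torch.fmin(a, b)  # tensor([1., 0., 3.])
```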

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49312

Reviewed By: izdeby

Differential Revision: D25887246

Pulled By: heitorschueroff

fbshipit-source-id: d762eeff8b328bfcbe7d48b7ee9d2da72c249691
2021-01-20 06:45:25 -08:00
2ace4fc01e Add type annotations to torch.overrides (#48493)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48492

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48493

Reviewed By: mruberry

Differential Revision: D25958987

Pulled By: ezyang

fbshipit-source-id: aadc065c489bf1a8c6258de14c930e396df763bc
2021-01-20 06:32:22 -08:00
4aea007351 [JIT] Fix archive file extension in examples and docs (#50649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50649

**Summary**
Tutorials, documentation, and comments are inconsistent in the file
extension they use for JIT archives. This commit replaces certain
instances of `*.pth` in `torch.jit.save` calls with `*.pt`.

**Test Plan**
Continuous integration.

**Fixes**
This commit fixes #49660.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25961628

Pulled By: SplitInfinity

fbshipit-source-id: a40c97954adc7c255569fcec1f389aa78f026d47
2021-01-20 02:04:46 -08:00
e00966501b [quant] Add non-fbgemm fallback implementation for embedding lookup ops (#50706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50706

Add a default CPU implementation for quantized embedding lookup operators.
This should enable the ops to execute on mobile as well, where we don't have fbgemm.

Test Plan:
python test/test_quantization.py
and CI tests

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D25956842

fbshipit-source-id: 07694888e5e1423b496af1a51494a49558e82152
2021-01-19 23:56:26 -08:00
5205cc1c62 [FX] Fix NoneType annotation in generated code (#50777)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50777

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D25966026

Pulled By: jamesr66a

fbshipit-source-id: 8e36521eee03eade7e1b602e801229c085b03488
2021-01-19 23:16:58 -08:00
8f5ad00e13 [JIT] Print out CU address in ClassType::repr_str() (#50194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50194

**Summary**
`ClassType::repr_str()` prints out only the name of a `ClassType`, which
is not always enough to disambiguate it. In some situations, two
`ClassTypes` are compared and do not match despite having identical
names because they are in separate compilation units. In such cases, the
error message can seem nonsensical (e.g. `expected type T but found type
T`). This commit modifies `ClassType::repr_str()` so that it prints out
the address of the type's compilation unit to make these messages less
puzzling (e.g. `expected type T (0x239023) but found type T (0x230223)`).

**Test Plan**
This commit adds a unit test, `ClassTypeTest.IdenticalTypesDifferentCus`
that reproduces this situation.

**Fixes**
This commit fixes #46212.

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D25933082

Pulled By: SplitInfinity

fbshipit-source-id: ec71b6728be816edd6a9c2b2d5075ead98d8bc88
2021-01-19 23:04:30 -08:00
dea9af5c06 Cat benchmark: use mobile feed tensor shapes and torch.cat out-variant (#50778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50778

- use tensor shapes from the ctr_mobilefeed merge net
- use the PyTorch cat out-variant for a fairer comparison; otherwise the benchmark includes the time to construct the result tensor
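A minimal sketch of the out-variant being benchmarked (shapes borrowed from the first PyTorch benchmark case below):
```python
import torch

xs = [torch.randn(1, 160), torch.randn(1, 14)]
out = torch.empty(1, 174)
# Writing into `out` keeps result-tensor allocation out of the timed region.
torch.cat(xs, dim=1, out=out)
```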

Test Plan:
turbo off, devbig machine
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 buck-out/gen/caffe2/benchmarks/operator_benchmark/c2/concat_test.par --tag_filter=static_runtime
```
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : static_runtime

# Benchmarking Caffe2: concat
# Name: concat_sizes(1,40)_N5_axis1_add_axis0_devicecpu_dtypefloat
# Input: sizes: (1, 40), N: 5, axis: 1, add_axis: 0, device: cpu, dtype: float
Forward Execution Time (us) : 0.619

# Benchmarking Caffe2: concat
# Name: concat_sizes[(1,160),(1,14)]_N-1_axis1_add_axis0_devicecpu_dtypefloat
# Input: sizes: [(1, 160), (1, 14)], N: -1, axis: 1, add_axis: 0, device: cpu, dtype: float
Forward Execution Time (us) : 0.369

# Benchmarking Caffe2: concat
# Name: concat_sizes[(1,20,40),(1,4,40),(1,5,40)]_N-1_axis1_add_axis0_devicecpu_dtypefloat
# Input: sizes: [(1, 20, 40), (1, 4, 40), (1, 5, 40)], N: -1, axis: 1, add_axis: 0, device: cpu, dtype: float
Forward Execution Time (us) : 0.590

# Benchmarking Caffe2: concat
# Name: concat_sizes[(1,580),(1,174)]_N-1_axis1_add_axis0_devicecpu_dtypefloat
# Input: sizes: [(1, 580), (1, 174)], N: -1, axis: 1, add_axis: 0, device: cpu, dtype: float
Forward Execution Time (us) : 0.412

# Benchmarking Caffe2: concat
# Name: concat_sizes(20,40)_N5_axis1_add_axis0_devicecpu_dtypefloat
# Input: sizes: (20, 40), N: 5, axis: 1, add_axis: 0, device: cpu, dtype: float
Forward Execution Time (us) : 2.464

# Benchmarking Caffe2: concat
# Name: concat_sizes[(20,160),(20,14)]_N-1_axis1_add_axis0_devicecpu_dtypefloat
# Input: sizes: [(20, 160), (20, 14)], N: -1, axis: 1, add_axis: 0, device: cpu, dtype: float
Forward Execution Time (us) : 1.652

# Benchmarking Caffe2: concat
# Name: concat_sizes[(20,20,40),(20,4,40),(20,5,40)]_N-1_axis1_add_axis0_devicecpu_dtypefloat
# Input: sizes: [(20, 20, 40), (20, 4, 40), (20, 5, 40)], N: -1, axis: 1, add_axis: 0, device: cpu, dtype: float
Forward Execution Time (us) : 9.312

# Benchmarking Caffe2: concat
# Name: concat_sizes[(20,580),(20,174)]_N-1_axis1_add_axis0_devicecpu_dtypefloat
# Input: sizes: [(20, 580), (20, 174)], N: -1, axis: 1, add_axis: 0, device: cpu, dtype: float
Forward Execution Time (us) : 6.532
```
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 buck-out/gen/caffe2/benchmarks/operator_benchmark/pt/cat_test.par --tag_filter=static_runtime
```
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : static_runtime

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[(1,160),(1,14)]_N-1_dim1_cpu
# Input: sizes: [(1, 160), (1, 14)], N: -1, dim: 1, device: cpu
Forward Execution Time (us) : 3.313

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[(1,20,40),(1,4,40),(1,5,40)]_N-1_dim1_cpu
# Input: sizes: [(1, 20, 40), (1, 4, 40), (1, 5, 40)], N: -1, dim: 1, device: cpu
Forward Execution Time (us) : 3.680

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[(1,580),(1,174)]_N-1_dim1_cpu
# Input: sizes: [(1, 580), (1, 174)], N: -1, dim: 1, device: cpu
Forward Execution Time (us) : 3.452

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[(20,160),(20,14)]_N-1_dim1_cpu
# Input: sizes: [(20, 160), (20, 14)], N: -1, dim: 1, device: cpu
Forward Execution Time (us) : 4.653

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[(20,20,40),(20,4,40),(20,5,40)]_N-1_dim1_cpu
# Input: sizes: [(20, 20, 40), (20, 4, 40), (20, 5, 40)], N: -1, dim: 1, device: cpu
Forward Execution Time (us) : 7.364

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[(20,580),(20,174)]_N-1_dim1_cpu
# Input: sizes: [(20, 580), (20, 174)], N: -1, dim: 1, device: cpu
Forward Execution Time (us) : 7.055
```

Reviewed By: hlu1

Differential Revision: D25839036

fbshipit-source-id: 7a6a234f41dfcc56246a80141fe0c84f769a5a85
2021-01-19 22:50:28 -08:00
06c734d8c7 Generalize sum_intlist and prod_intlist, clean up dimensionality functions (#50495)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50495

Test Plan:
```
buck test mode/opt //caffe2/c10:c10_test_0
```

Reviewed By: ngimel

Differential Revision: D25902853

fbshipit-source-id: a7d30251ca443df57dd8005ed77dba7b2f1002d4
2021-01-19 22:35:55 -08:00
47c57b8836 Fix Native signature for optional Tensor arguments (#50767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50767

The native signature for optional tensor arguments wrongly produced "Tensor" instead of "optional<Tensor>". We didn't notice this because all internal ops currently use hacky_wrapper, and for hacky_wrapper, "Tensor" is correct.

This PR fixes that and ports one op (batch_norm) to not use hacky_wrapper anymore as a proof of fix.
ghstack-source-id: 120017543

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D25960941

fbshipit-source-id: ca6fe133109b5d85cff52390792cf552f12d9590
2021-01-19 21:55:46 -08:00
cebab83d3f Fix USE_MKLDN defaults (#50782)
Summary:
Fixes a regression introduced by https://github.com/pytorch/pytorch/pull/50400.
The semantics of `cmake_dependent_option` are as follows (see https://cmake.org/cmake/help/v3.19/module/CMakeDependentOption.html):
`cmake_dependent_option(<option> "<help_text>" <value> <depends> <force>)`
I.e. `<depends>` should be true for CPU_INTEL or CPU_AARCH64, but the default `<value>` should be ON only if CPU_INTEL is true.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50782

Reviewed By: xuzhao9

Differential Revision: D25966509

Pulled By: malfet

fbshipit-source-id: c891cd9234311875762403f7125d0c3803bb0e65
2021-01-19 21:41:53 -08:00
4ff1823fac Add Sparse support for torch.sqrt (#50088)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50088

Reviewed By: mrshenli

Differential Revision: D25894003

Pulled By: ezyang

fbshipit-source-id: 93688c33b2f9a355c331d6edb3e402935223f75b
2021-01-19 20:19:07 -08:00
38c45bdd2d [FX] Fix tracing a free function with embedded constant (#50639)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50639

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D25934142

Pulled By: jamesr66a

fbshipit-source-id: de9053d4f92a7a2f4f573378837ff5ae78e539b1
2021-01-19 19:20:34 -08:00
7526e38cd3 Revert "Stable sort for CPU (#50052)" (#50752)
Summary:
This reverts commit c99f35605105f7366bcf4709df534da3ceab9a15.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50752

Reviewed By: zou3519

Differential Revision: D25958146

Pulled By: glaringlee

fbshipit-source-id: f4068d038f9bd337bac8b673eaeb46a4646f6c77
2021-01-19 18:21:25 -08:00
08c90d9e55 Automated submodule update: tensorpipe (#50765)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: 6c8ed2e6f7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50765

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: mrshenli

Differential Revision: D25960813

fbshipit-source-id: 80b4e48ef04f22f750a2eb049f5f7114715c0a1e
2021-01-19 17:29:00 -08:00
327539ca79 Fix bug in hipify if include_dirs is not specified in setup.py (#50703)
Summary:
Bugs:
1) it would introduce `-I*` in compile commands
2) it wouldn't hipify source code directly in build_dir, only one level down or more

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50703

Reviewed By: mrshenli

Differential Revision: D25949070

Pulled By: ngimel

fbshipit-source-id: 018c2a056b68019a922e20e5db2eb8435ad147fe
2021-01-19 16:30:17 -08:00
526659db20 whitelist ops we can build shapes for (#49125)
Summary:
Whitelist ops we can build shapes for.
Otherwise, `buildShapeExpressions` assumes that `aten::unsqueeze` is just a regular op.

```
[DUMP tensorexpr_fuser.cpp:329] buildShapeExpressions for
[DUMP tensorexpr_fuser.cpp:329] graph(%1 : float,
[DUMP tensorexpr_fuser.cpp:329]       %3 : Float(50, 28, strides=[28, 1], requires_grad=0, device=cuda:0),
[DUMP tensorexpr_fuser.cpp:329]       %8 : float,
[DUMP tensorexpr_fuser.cpp:329]       %10 : Float(50, strides=[1], requires_grad=0, device=cuda:0)):
[DUMP tensorexpr_fuser.cpp:329]   %11 : int = prim::Constant[value=1]()
[DUMP tensorexpr_fuser.cpp:329]   %12 : Float(50, 1, strides=[1, 1], requires_grad=0, device=cuda:0) = aten::unsqueeze(%10, %11)
[DUMP tensorexpr_fuser.cpp:329]   %9 : Float(50, 1, strides=[1, 1], requires_grad=0, device=cuda:0) = aten::mul(%12, %8)
[DUMP tensorexpr_fuser.cpp:329]   %6 : Float(50, 28, strides=[28, 1], requires_grad=0, device=cuda:0) = aten::add(%3, %9, %11)
[DUMP tensorexpr_fuser.cpp:329]   %2 : Float(50, 28, strides=[28, 1], requires_grad=0, device=cuda:0) = aten::div(%6, %1)
[DUMP tensorexpr_fuser.cpp:329]   return (%2, %6, %9)
[DEBUG tensorexpr_fuser.cpp:347] Adding a mapping for %3 %162 : int[] = aten::size(%27)
[DEBUG tensorexpr_fuser.cpp:347] Adding a mapping for %10 %163 : int[] = aten::size(%23)
[DEBUG tensorexpr_fuser.cpp:402] Building sizes for %12 : Float(50, 1, strides=[1, 1], requires_grad=0, device=cuda:0) = aten::unsqueeze(%10, %11)
[DEBUG tensorexpr_fuser.cpp:405] Getting aten::size for %10
[DEBUG tensorexpr_fuser.cpp:402] Building sizes for %9 : Float(50, 1, strides=[1, 1], requires_grad=0, device=cuda:0) = aten::mul(%12, %8)
[DEBUG tensorexpr_fuser.cpp:405] Getting aten::size for %12
[DEBUG tensorexpr_fuser.cpp:402] Building sizes for %6 : Float(50, 28, strides=[28, 1], requires_grad=0, device=cuda:0) = aten::add(%3, %9, %11)
[DEBUG tensorexpr_fuser.cpp:405] Getting aten::size for %3
[DEBUG tensorexpr_fuser.cpp:405] Getting aten::size for %9
[DEBUG tensorexpr_fuser.cpp:402] Building sizes for %2 : Float(50, 28, strides=[28, 1], requires_grad=0, device=cuda:0) = aten::div(%6, %1)
[DEBUG tensorexpr_fuser.cpp:405] Getting aten::size for %6
[DEBUG tensorexpr_fuser.cpp:907] Inserting a typecheck guard for a node%156 : Float(50, 28, strides=[28, 1], requires_grad=0, device=cuda:0) = prim::TensorExprGroup[Subgraph=<Graph>](%3, %27, %16, %23)
[DUMP tensorexpr_fuser.cpp:463] After guarding fusion groups:
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49125

Reviewed By: albanD

Differential Revision: D25926997

Pulled By: Krovatkin

fbshipit-source-id: f8041bbfc12be16c329754c6d16911d12aa352ef
2021-01-19 16:17:21 -08:00
4816bf62d6 Fix nvcc function signature causing assert in TypeIndex.h (#49778)
Summary:
Adding NVCC function signature to fully_qualified_type_name_impl()

Fixes https://github.com/pytorch/pytorch/issues/48568

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49778

Reviewed By: albanD

Differential Revision: D25848006

Pulled By: ezyang

fbshipit-source-id: 5afa73ecbb1a3f3b7b68a69b2dcdc27ad38dc44d
2021-01-19 15:25:32 -08:00
a9e46f1413 add type annotations to torch.nn.modules.container (#48969)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48968

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48969

Reviewed By: mrshenli

Differential Revision: D25728987

Pulled By: walterddr

fbshipit-source-id: 02c3aa2078f4ed6cc6edd90ffe1177d789c328a9
2021-01-19 15:12:17 -08:00
a1b1d0cdc0 Better split of the windows test jobs (#50660)
Summary:
See discussion in https://github.com/pytorch/pytorch/pull/50320#discussion_r554447365.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50660

Reviewed By: xuzhao9, samestep

Differential Revision: D25959021

Pulled By: seemethere

fbshipit-source-id: 7623bddc09e7d55208b8a1af4b5a23fba2cdeb14
2021-01-19 15:07:33 -08:00
ebd142e94b initial commit to enable fast_nvcc (#49773)
Summary:
Draft PR to enable fast_nvcc.
* cleaned up some non-standard usages
* added a fallback to wrap_nvcc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49773

Test Plan:
Configuration to enable fast nvcc:
  - install and enable `ccache` but delete `.ccache/` folder before each build.
  - `TORCH_CUDA_ARCH_LIST=6.0;6.1;6.2;7.0;7.5`
  - Toggle the `USE_FAST_NVCC=ON/OFF` cmake config and run `cmake --build` to measure the build time.

Initial statistics for a full compilation:
* `cmake --build . -- -j $(nproc)`:
  - fast NVCC
```
        real    48m55.706s
        user    1559m14.218s
        sys     318m41.138s
```
  - normal NVCC:
```
        real    43m38.723s
        user    1470m28.131s
        sys     90m46.879s
```
* `cmake --build . -- -j $(nproc/4)`:
  - fast NVCC:
```
        real    53m44.173s
        user    1130m18.323s
        sys     71m32.385s
```
  - normal  NVCC:
```
        real    81m53.768s
        user    858m45.402s
        sys     61m15.539s
```
* Conclusion: fast NVCC doesn't provide much gain when the compiler is set to use full CPU utilization; in fact it is **even worse** because of thread switching.

Initial statistics for a partial recompile (editing .cu files):

* `cmake --build . -- -j $(nproc)`
  - fast NVCC:
```
[2021-01-13 18:10:24] [ 86%] Building NVCC (Device) object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMiscOpsKernels.cu.o
[2021-01-13 18:11:08] [ 86%] Linking CXX shared library ../lib/libtorch_cuda.so
```
  - normal NVCC:
```
[2021-01-13 17:35:40] [ 86%] Building NVCC (Device) object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMiscOpsKernels.cu.o
[2021-01-13 17:38:08] [ 86%] Linking CXX shared library ../lib/libtorch_cuda.so
```
* Conclusion: effective compilation time for a single CU file modification is reduced from 2min30sec to only 40sec when compiling multiple architectures. This shows a **4X** speedup using fast NVCC -- approaching the theoretical limit of 5X when compiling 5 gencode architectures at the same time.

Follow-up PRs:
- should have a better fallback mechanism to detect whether a build is supported by fast_nvcc, instead of dry-running and then failing into the fallback.
- performance measurement instrumentation to measure the total compile time vs. the critical-path time of the parallel tasks.
- figure out why `-j $(nproc)` gives significant sys overhead (`sys 318m41.138s` vs `sys 90m46.879s`) over normal nvcc; the guess is context switching, but this is not certain.

Reviewed By: malfet

Differential Revision: D25692758

Pulled By: walterddr

fbshipit-source-id: c244d07b9b71f146e972b6b3682ca792b38c4457
2021-01-19 14:50:54 -08:00
f7b2b22b64 Remove instance of blacklist (#50478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50478

See task for context

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25893912

fbshipit-source-id: 761120e4999fddd256bbf855ce49bfd93472b062
2021-01-19 14:42:01 -08:00
0c9fb4aff0 Disable tracer warning for slicing indices. (#50414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50414

If the index supplied from Python is an integral type, it converts everything to int64_t, which is traced correctly.

Test Plan:
new test case

Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25930773

fbshipit-source-id: a3dfeb49df1394c5c8bea0de46038d2c549a0dc6
2021-01-19 14:15:50 -08:00
3344f06130 [FX] Fix using fx.wrap as a decorator (#50677)
Summary:
`torch.fx.wrap()` could not be used as a decorator, as its docstring claimed, because it returned `None`.
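A minimal sketch of the decorator usage this fix enables (module and function names are made up):
```python
import torch
import torch.fx

@torch.fx.wrap  # previously returned None, which broke this pattern
def add_one(x):
    return x + 1

class M(torch.nn.Module):
    def forward(self, x):
        return add_one(x) * 2

# add_one is recorded as a call_function node instead of being traced through
gm = torch.fx.symbolic_trace(M())
```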

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50677

Test Plan: Added `test_wrapped_via_decorator` which used to fail with `'NoneType' object is not callable` and now passes

Reviewed By: jamesr66a

Differential Revision: D25949313

Pulled By: jansel

fbshipit-source-id: 02d0f9adeed812f58ec94c94dd4adc43578f21ce
2021-01-19 13:42:15 -08:00
05036564cf Remove workaround for TensorPipe failing to get device of CUDA ptr (#50580)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50580

Due to what looked like a bug in CUDA, TensorPipe was sometimes failing to auto-detect the device of a CUDA pointer. A workaround, on the PyTorch side, was to always initialize a CUDA context on device 0. Now that TensorPipe has fixed that, we can undo the workaround.

Reviewed By: mrshenli

Differential Revision: D25952929

fbshipit-source-id: 57a5f73241f7371661855c767e44a64ca3b84a74
2021-01-19 12:18:00 -08:00
5f33f22324 Fix caffe2 import tools.codegen (#50353)
Summary:
Using `insert` instead of `append` to add the torch root directory to `sys.path`, to fix `ModuleNotFoundError: No module named 'tools.codegen'`, as mentioned in https://github.com/pytorch/pytorch/issues/47553.
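A small illustration of why `insert` matters here (the checkout path is hypothetical):
```python
import sys

REPO_ROOT = "/path/to/pytorch"  # hypothetical checkout location
# insert(0, ...) puts the checkout ahead of any installed `tools` package;
# append() would leave an installed copy shadowing it.
sys.path.insert(0, REPO_ROOT)
import tools.codegen  # resolves from the checkout once REPO_ROOT is real
```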

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50353

Reviewed By: mrshenli

Differential Revision: D25893827

Pulled By: ezyang

fbshipit-source-id: 841e28898fee5502495f3890801b49d9b442f9d6
2021-01-19 12:13:15 -08:00
a9deaf3659 Shouldn't need user local install for ROCm build (#50299)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50299

Reviewed By: zhangguanheng66

Differential Revision: D25865423

Pulled By: ezyang

fbshipit-source-id: e2af5f00f99de3c0d38b6b6fedfd9b0027ed9b0b
2021-01-19 12:08:01 -08:00
cad4753115 Update TensorPipe submodule (#50733)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50733

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: beauby

Differential Revision: D25954026

fbshipit-source-id: 44d21768379b301144518aafc9c68147db49d931
2021-01-19 12:05:25 -08:00
4511f2cc9d Clean up complex autograd test list (#50615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50615

The method tests for some of the ops have been ported to the new OpInfo based tests. This PR removes those op names from `complex_list` in `test_autograd.py`

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25931268

Pulled By: anjali411

fbshipit-source-id: 4d08626431c61c34cdca18044933e4f5b9b25232
2021-01-19 11:00:13 -08:00
937eff5853 Consolidate mypy tests and args (#50631)
Summary:
This PR helps with https://github.com/pytorch/pytorch/issues/50513 by reducing the complexity of our `mypy` test suite and making it easier to reproduce on the command line. Previously, to reproduce how `mypy` was actually run on tracked source files (ignoring the doctest typechecking) in CI, you technically needed to run 9 different commands with various arguments:
```
$ mypy --cache-dir=.mypy_cache/normal --check-untyped-defs --follow-imports silent
$ mypy --cache-dir=.mypy_cache/examples --follow-imports silent --check-untyped-defs test/type_hint_tests/module_list.py
$ mypy --cache-dir=.mypy_cache/examples --follow-imports silent --check-untyped-defs test/type_hint_tests/namedtuple.py
$ mypy --cache-dir=.mypy_cache/examples --follow-imports silent --check-untyped-defs test/type_hint_tests/opt_size.py
$ mypy --cache-dir=.mypy_cache/examples --follow-imports silent --check-untyped-defs test/type_hint_tests/size.py
$ mypy --cache-dir=.mypy_cache/examples --follow-imports silent --check-untyped-defs test/type_hint_tests/tensor_copy.py
$ mypy --cache-dir=.mypy_cache/examples --follow-imports silent --check-untyped-defs test/type_hint_tests/torch_cuda_random.py
$ mypy --cache-dir=.mypy_cache/examples --follow-imports silent --check-untyped-defs test/type_hint_tests/torch_optim.py
$ mypy --cache-dir=.mypy_cache/strict --config mypy-strict.ini
```
Now you only have to run 2 much simpler commands:
```
$ mypy
$ mypy --config mypy-strict.ini
```
One reason this is useful is because it will make it easier to integrate PyTorch's `mypy` setup into editors (remaining work on this to be done in a followup PR).

Also, as shown in the test plan, this also reduces the time it takes to run `test/test_type_hints.py` incrementally, by reducing the number of times `mypy` is invoked while still checking the same set of files with the same configs.

(Because this PR merges `test_type_hint_examples` (added in https://github.com/pytorch/pytorch/issues/34595) into `test_run_mypy` (added in https://github.com/pytorch/pytorch/issues/36584), I've added some people involved in those PRs as reviewers, in case there's a specific reason they weren't combined in the first place.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50631

Test Plan:
Run this twice (the first time is to warm the cache):
```
$ python test/test_type_hints.py -v
```

- *Before:*
  ```
  test_doc_examples (__main__.TestTypeHints)
  Run documentation examples through mypy. ... ok
  test_run_mypy (__main__.TestTypeHints)
  Runs mypy over all files specified in mypy.ini ... ok
  test_run_mypy_strict (__main__.TestTypeHints)
  Runs mypy over all files specified in mypy-strict.ini ... ok
  test_type_hint_examples (__main__.TestTypeHints)
  Runs mypy over all the test examples present in ... ok

  ----------------------------------------------------------------------
  Ran 4 tests in 5.090s

  OK
  ```
  You can also just run `mypy` to see how many files it checks:
  ```
  $ mypy --cache-dir=.mypy_cache/normal --check-untyped-defs --follow-imports silent
  Success: no issues found in 1192 source files
  ```
- *After:*
  ```
  test_doc_examples (__main__.TestTypeHints)
  Run documentation examples through mypy. ... ok
  test_run_mypy (__main__.TestTypeHints)
  Runs mypy over all files specified in mypy.ini ... ok
  test_run_mypy_strict (__main__.TestTypeHints)
  Runs mypy over all files specified in mypy-strict.ini ... ok

  ----------------------------------------------------------------------
  Ran 3 tests in 2.404s

  OK
  ```
  Now `mypy` checks 7 more files, which is the number in `test/type_hint_tests`:
  ```
  $ mypy
  Success: no issues found in 1199 source files
  ```

Reviewed By: zou3519

Differential Revision: D25932660

Pulled By: samestep

fbshipit-source-id: 26c6f00f338e7b44954e5ed89522ce24e2fdc5f0
2021-01-19 10:05:39 -08:00
1a38fa9930 Striding for lists Part 1 (#48719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48719

Attempt to break this PR (https://github.com/pytorch/pytorch/pull/33019) into two parts. As per our discussion with eellison, the first part is to make sure our aten::slice operator takes optional parameters for begin/step/end. This will help with refactoring ir_emitter.cpp for generic handling of list and slice striding. Once this PR is merged, we will submit a second PR with the compiler change.

Test Plan:
None for this PR, but new tests will be added for the second part.

Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D25929902

fbshipit-source-id: 5385df04e6d61ded0699b09bbfec6691396b56c3
2021-01-19 09:30:01 -08:00
1154a8594e Add instructional error message for cudnn RNN double backward workaround (#33884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33884

Mitigates https://github.com/pytorch/pytorch/issues/5261.

It's not possible for us to support cudnn RNN double backwards due to
limitations in the cudnn API. This PR makes it so that we raise an error
message if users try to get the double backward on a cudnn RNN; in the
error message we suggest using the non-cudnn RNN.
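A minimal sketch of the suggested workaround (assumes a CUDA build; disabling cudnn falls back to the native RNN, which does support double backward):
```python
import torch

rnn = torch.nn.LSTM(4, 8).cuda()
x = torch.randn(5, 2, 4, device="cuda", requires_grad=True)

with torch.backends.cudnn.flags(enabled=False):
    out, _ = rnn(x)
    # create_graph=True requests double backward, which cudnn RNNs reject
    (grad,) = torch.autograd.grad(out.sum(), x, create_graph=True)
```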

Test Plan: - added some tests to check the error message

Reviewed By: albanD

Differential Revision: D20143544

Pulled By: zou3519

fbshipit-source-id: c2e49b3d8bdb9b34b561f006150e4c7551a78fac
2021-01-19 09:05:36 -08:00
5d64658ce8 Add complex support for torch.{acosh, asinh, atanh} (#50387)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50387

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D25947496

Pulled By: anjali411

fbshipit-source-id: c70886a73378501421ff94cdc0dc737f1738bf6f
2021-01-19 08:18:22 -08:00
1000403f66 Adding missing decorator for test_device_map_gpu_mixed_self_4 (#50732)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50732

Test Plan: Imported from OSS

Reviewed By: beauby

Differential Revision: D25954041

Pulled By: mrshenli

fbshipit-source-id: b2eeb1a77753cb8696613bfdc7bbc5001ae4c972
2021-01-19 07:53:11 -08:00
f9a5ba7398 Added linalg.slogdet (#49194)
Summary:
This PR adds `torch.linalg.slogdet`.

Changes compared to the original torch.slogdet:

- Complex input now works as in NumPy
- Added out= variant (allocates temporary and makes a copy for now)
- Updated `slogdet_backward` to work with complex input

Ref. https://github.com/pytorch/pytorch/issues/42666
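A short usage sketch, including the complex support added here:
```python
import torch

a = torch.randn(3, 3, dtype=torch.complex128)
sign, logabsdet = torch.linalg.slogdet(a)  # sign is complex for complex input
det = sign * torch.exp(logabsdet)          # reconstructs the determinant
```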

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49194

Reviewed By: VitalyFedyunin

Differential Revision: D25916959

Pulled By: mruberry

fbshipit-source-id: cf9be8c5c044870200dcce38be48cd0d10e61a48
2021-01-19 07:28:12 -08:00
f7a8bfd0a1 Add batched grad testing to gradcheck, turn it on in test_autograd (#50592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50592

This adds a `check_batched_grad=False` option to gradcheck and gradgradcheck.
It defaults to False because gradcheck is a public API and I don't want
to break any existing non-pytorch users of gradcheck.
This:
- runs grad twice with two grad outputs, a & b
- runs a vmapped grad with torch.stack([a, b])
- compares the results of the above against each other.

Furthermore:
- `check_batched_grad=True` is set to be the default for
gradcheck/gradgradcheck inside of test_autograd.py. This is done by
reassigning to the gradcheck object inside test_autograd
- I manually added `check_batched_grad=False` to gradcheck instances
that don't support batched grad.
- I added a denylist for operations that don't support batched grad.
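A minimal usage sketch of the new flag (the op is chosen arbitrarily):
```python
import torch
from torch.autograd import gradcheck

x = torch.randn(3, dtype=torch.double, requires_grad=True)
# check_batched_grad defaults to False so existing gradcheck users are
# unaffected; opting in additionally compares a vmapped grad against
# two stacked regular grads.
assert gradcheck(torch.sin, (x,), check_batched_grad=True)
```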

Question:
- Should we have a testing only gradcheck (e.g.,
torch.testing.gradcheck) that has different defaults from our public
API, torch.autograd.gradcheck?

Future:
- The future plan for this is to repeat the above for test_nn.py (the
autogenerated test will require a denylist)
- Finally, we can repeat the above for all pytorch test files that use
gradcheck.

Test Plan: - run tests

Reviewed By: albanD

Differential Revision: D25925942

Pulled By: zou3519

fbshipit-source-id: 4803c389953469d0bacb285774c895009059522f
2021-01-19 06:48:28 -08:00
316f0b89c3 [testing] Port torch.{repeat, tile} tests to use OpInfo machinery (#50199)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50013

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50199

Reviewed By: ngimel

Differential Revision: D25949791

Pulled By: mruberry

fbshipit-source-id: 10eaf2d749fac8c08847f50461e72ad1c75c61e3
2021-01-19 06:02:27 -08:00
5f13cc861c Automated submodule update: tensorpipe (#50684)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: eabfe52867

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50684

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: lw

Differential Revision: D25944553

fbshipit-source-id: e2bbcc48472cd79df89d87a0e61dcffa783c659d
2021-01-19 04:53:45 -08:00
c458558334 kill multinomial_alias_setup/draw (#50489)
Summary:
As per title. Partially fixes https://github.com/pytorch/pytorch/issues/49421.
These functions appear to be dead code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50489

Reviewed By: mruberry

Differential Revision: D25948912

Pulled By: ngimel

fbshipit-source-id: 108723bd4c76cbc3535eba902d6f74597bfdfa58
2021-01-19 00:23:58 -08:00
5252e9857a [pytorch] clean up unused util srcs under tools/autograd (#50611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50611

Removed the unused old-style code to prevent it from being used.
Added all autograd/gen_pyi sources to mypy-strict.ini config.

Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Confirmed clean mypy-strict run:
```
mypy --config mypy-strict.ini
```

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25929730

Pulled By: ljk53

fbshipit-source-id: 1fc94436fd4a6b9b368ee0736e99bfb3c01d38ef
2021-01-18 23:54:02 -08:00
b75cdceb44 [package] Properly demangle all accesses of __name__ in importer.py (#50711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50711

As title, missed a few of these.

Test Plan: Imported from OSS

Reviewed By: yf225

Differential Revision: D25949363

Pulled By: suo

fbshipit-source-id: 197743fe7097d2ac894421a99c072696c3b8cd70
2021-01-18 23:43:46 -08:00
d5e5c5455a [ROCm] re-enable test_sparse.py tests (#50557)
Summary:
Signed-off-by: Kyle Chen <kylechen@amd.com>

cc: jeffdaily

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50557

Reviewed By: mruberry

Differential Revision: D25941432

Pulled By: ngimel

fbshipit-source-id: 534fc8a91a48fa8b3b397e63423cd8347b41bbe2
2021-01-18 23:36:39 -08:00
e9b369c25f Add SELU Activation to calculate_gain (#50664)
Summary:
Fixes [#24991](https://github.com/pytorch/pytorch/issues/24991)

I used a value of 0.75 as suggested in the forums by Thomas: https://discuss.pytorch.org/t/calculate-gain-tanh/20854/6

I verified that the value keeps the gradient stable for a 100-layer network.

Code to reproduce (from [jpeg729](https://discuss.pytorch.org/t/calculate-gain-tanh/20854/4)):
```python
import torch
import torch.nn.functional as F
import sys

a = torch.randn(1000,1000, requires_grad=True)
b = a
print (f"in: {a.std().item():.4f}")
for i in range(100):
    l = torch.nn.Linear(1000,1000, bias=False)
    torch.nn.init.xavier_normal_(l.weight, torch.nn.init.calculate_gain("selu"))
    b = getattr(F, 'selu')(l(b))
    if i % 10 == 0:
        print (f"out: {b.std().item():.4f}", end=" ")
        a.grad = None
        b.sum().backward(retain_graph=True)
        print (f"grad: {a.grad.abs().mean().item():.4f}")
```
Output:
```
in: 1.0008
out: 0.7968 grad: 0.6509
out: 0.3127 grad: 0.2760
out: 0.2404 grad: 0.2337
out: 0.2062 grad: 0.2039
out: 0.2056 grad: 0.1795
out: 0.2044 grad: 0.1977
out: 0.2005 grad: 0.2045
out: 0.2042 grad: 0.2273
out: 0.1944 grad: 0.2034
out: 0.2085 grad: 0.2464
```
I included the necessary documentation change, and it passes the _test_calculate_gain_nonlinear_ unittest.
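A one-line usage sketch of the new gain value:
```python
import torch

w = torch.empty(100, 100)
gain = torch.nn.init.calculate_gain("selu")  # 0.75 after this change
torch.nn.init.xavier_normal_(w, gain=gain)
```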

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50664

Reviewed By: mruberry

Differential Revision: D25942217

Pulled By: ngimel

fbshipit-source-id: 29ff1be25713484fa7c516df71b12fdaecfb9af8
2021-01-18 23:01:18 -08:00
ce30dba36f Enable TensorPipe CUDA fallback channel (#50675)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50675

Test Plan: Imported from OSS

Reviewed By: beauby

Differential Revision: D25941963

Pulled By: mrshenli

fbshipit-source-id: 205786d7366f36d659a3a3374081a458cfcb4dd1
2021-01-18 19:38:40 -08:00
94d9a7e8ac Enable TensorPipe CUDA sending to self (#50674)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50674

Test Plan: Imported from OSS

Reviewed By: beauby

Differential Revision: D25941964

Pulled By: mrshenli

fbshipit-source-id: b53454efdce01f7c06f67dfb890d3c3bdc2c648f
2021-01-18 19:35:40 -08:00
8b501dfd98 Fix memory leak in TensorPipeAgent. (#50564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50564

When an RPC was sent, the associated future was stored in two maps:
pendingResponseMessage_ and timeoutMap_. Once the response was received, the
entry was only removed from pendingResponseMessage_ and not timeoutMap_. The
pollTimedoutRpcs method then eventually removed the entry from timeoutMap_
after the timeout duration had passed.

However, in scenarios with a large timeout and a large number of RPCs in use,
it is very easy for timeoutMap_ to grow without bound. This was discovered in
https://github.com/pytorch/pytorch/issues/50522.

To fix this issue, I've added some code to clean up timeoutMap_ as well once we
receive a response.
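A language-agnostic sketch of the bookkeeping change, written in Python for brevity (names mirror the summary; the real structures are C++ maps inside TensorPipeAgent, and the keying is simplified here):
```python
def on_response(message_id, pending_response_message, timeout_map):
    # Previously only pendingResponseMessage_ was cleaned up here; the
    # timeoutMap_ entry lingered until pollTimedoutRpcs swept it, which
    # let the map grow unboundedly under large timeouts.
    future = pending_response_message.pop(message_id, None)
    timeout_map.pop(message_id, None)
    return future
```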
ghstack-source-id: 119925182

Test Plan:
1) Unit test added.
2) Tested with repro in https://github.com/pytorch/pytorch/issues/50522

#Closes: https://github.com/pytorch/pytorch/issues/50522

Reviewed By: mrshenli

Differential Revision: D25919650

fbshipit-source-id: a0a42647e706d598fce2ca2c92963e540b9d9dbb
2021-01-18 16:34:28 -08:00
f32b10e564 [BE] Fix the broken test caffe2/caffe2/python:lazy_dyndep_test - test_allcompare (#50696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50696

Set no deadline for test_allcompare.

Test Plan: buck test mode/dev //caffe2/caffe2/python:lazy_dyndep_test -- --exact 'caffe2/caffe2/python:lazy_dyndep_test - test_allcompare (caffe2.caffe2.python.lazy_dyndep_test.TestLazyDynDepAllCompare)' --run-disabled

Reviewed By: hl475

Differential Revision: D25947800

fbshipit-source-id: d2043f97128e257ef06ebca9b68262bb1c0c5e6b
2021-01-18 16:21:06 -08:00
d140ca8b69 Optimize implementation of torch.pow (#46830)
Summary:
- Related with https://github.com/pytorch/pytorch/issues/44937
- Use `resize_output` instead of `resize_as`
- Tuning the `native_functions.yaml`, move the inplace variant `pow_` next to the other `pow` entries

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46830

Reviewed By: mrshenli

Differential Revision: D24567702

Pulled By: anjali411

fbshipit-source-id: a352422c9d4e356574dbfdf21fb57f7ca7c6075d
2021-01-18 14:19:35 -08:00
227acc2e51 Complex autograd support for torch.{baddbmm, addbmm, addmm, addmv} (#50632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50632

I'll port the following method tests in follow-up PRs:
`'baddbmm', 'addbmm', 'addmv', 'addr'`
After the tests are ported to OpInfo based tests, it would also be much easier to add tests with complex alpha and beta values.
Edit- it seems like it's hard to port the broadcasting variant tests because one ends up skipping `test_inplace_grad` and `test_variant_consistency_eager` even for the case when inputs are not required to be broadcasted.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D25947471

Pulled By: anjali411

fbshipit-source-id: 9faa7f1fd55a1269bad282adac2b39d19bfa4591
2021-01-18 14:05:02 -08:00
7f3a407225 Multi label margin loss (#50007)
Summary:
Reopen PR for https://github.com/pytorch/pytorch/pull/46975

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50007

Reviewed By: mruberry

Differential Revision: D25850808

Pulled By: ngimel

fbshipit-source-id: a232e02949182b7d3799448d24ad54a9e0bcf95c
2021-01-18 01:48:05 -08:00
eae1b40400 Introduced operator variant to OpInfo (#50370)
Summary:
Introduced operator variant to OpInfo

Context: Split of https://github.com/pytorch/pytorch/issues/49158

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50370

Reviewed By: mrshenli

Differential Revision: D25897821

Pulled By: mruberry

fbshipit-source-id: 4387ea10607dbd7209842b685f1794bcb31f434e
2021-01-18 00:05:01 -08:00
3f052ba07b Remove unnecessary dtype checks for complex types & disable complex dispatch for CPU min/max pointwise ops (#50465)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50064

**PROBLEM DESCRIPTION:**
1. dtype checks for complex types had not been removed in the previous PR (https://github.com/pytorch/pytorch/issues/50347) for this issue.
These type-checks were added in https://github.com/pytorch/pytorch/issues/36377, but are no longer necessary,
as we now rely upon dispatch macros to produce error messages.
2. dtype checks in `clamp_max()` and `clamp_min()` for complex inputs had not been removed either.
3. For min/max pointwise ops in TensorCompareKernel.cpp, complex dispatch had not been removed for min/max functions.

### **FIX DESCRIPTION:**
**FIX SUMMARY:**
1. Removed dtype checks added in https://github.com/pytorch/pytorch/issues/36377, and added 3 more in TensorCompare.cpp.
2. Removed dtype checks for complex inputs in `clamp_max()` and `clamp_min()`.
3. Disabled complex dispatch for min/max pointwise ops in TensorCompareKernel.cpp.
4. Error messages in the exceptions raised when min/max ops are not implemented are now checked for containing the text _not support_ (which is also contained in _not supported_) or _not implemented_, so that at least one of these phrases appears in the error message, keeping it informative.

**REASON FOR NOT CHANGING DISPATCH FOR CUDA AND CLAMP OPS**:

As for the CUDA min/max operations, their kernels do not seem to be compiled & dispatched for complex types anyway, so no further changes seem to be required. Basically, the dispatch macros currently being used don't have cases for complex types.

For example,

1. the reduce CUDA ops use [AT_DISPATCH_ALL_TYPES_AND2](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Dispatch.h#L548-L575) in [ReduceMinMaxKernel.cu](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/ReduceMinMaxKernel.cu), and that macro doesn't allow complex types.

2. In [MinMaxElementwiseKernel.cu](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/MaxMinElementwiseKernel.cu), the CUDA pointwise ops use [`AT_DISPATCH_FLOATING_TYPES_AND2`](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Dispatch.h#L240-L263) for non-integral & non-boolean types, and this macro doesn't have a case for complex types either.

3. [clamp CUDA ops](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/UnaryOpsKernel.cu#L170-L211) use `AT_DISPATCH_ALL_TYPES_AND2`, which doesn't have a case for complex types.

Similarly, [CPU clamp min/max ops](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp#L428-L458) use the `AT_DISPATCH_ALL_TYPES_AND` dispatch macro, which doesn't have a case for complex types.

**REASON FOR ADDING 3 dtype CHECKS:**
There are a few cases in which the methods corresponding to `min_stub()` or `max_stub()` are not called, so dispatch macros don't get invoked, resulting in no exceptions being raised. Hence, `dtype` checks are necessary at 3 places to raise exceptions:

1. 52dcc72999/aten/src/ATen/native/TensorCompare.cpp (L342)
2. 52dcc72999/aten/src/ATen/native/TensorCompare.cpp (L422)
3. 52dcc72999/aten/src/ATen/native/TensorCompare.cpp (L389)

The first dtype check requirement can be verified from the following example Python code based on `test_complex_unsupported()`:
```
import unittest
import torch

class MyTestCase(unittest.TestCase):

   def test_1(self):
      t = torch.tensor((1 + 1j), device='cpu', dtype=torch.complex128)
      with self.assertRaises(Exception):
         torch.max(t, dim=0)

if __name__ == '__main__':
    unittest.main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50465

Reviewed By: mruberry

Differential Revision: D25938106

Pulled By: ngimel

fbshipit-source-id: 95e2df02ba8583fa3ce87d4a2fdcd60b912dda46
2021-01-17 22:00:05 -08:00
1fdc35da2c [BE] Fix the broken test -- caffe2/caffe2/python:hypothesis_test - test_recurrent (#50668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50668

GPU initialization is sometimes slow.

Test Plan: buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --exact 'caffe2/caffe2/python:hypothesis_test - test_recurrent (caffe2.caffe2.python.hypothesis_test.TestOperators)' --run-disabled

Reviewed By: hl475

Differential Revision: D25939037

fbshipit-source-id: 832700cf42ece848cda66dd629a06ecda207f086
2021-01-17 21:21:38 -08:00
534c82153e fix bn channels_last contiguity check (#50659)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42588
The contiguity check used to be for the memory format suggested by `grad_output->suggest_memory_format()`, but the invariant guaranteed by derivatives.yaml is `input->suggest_memory_format()`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50659

Reviewed By: mruberry

Differential Revision: D25938921

Pulled By: ngimel

fbshipit-source-id: a945bfef6ce3d91b17e7ff96babe89ffd508939a
2021-01-17 21:10:12 -08:00
7e05d07ca7 [distributed_test_c10d]Enable disabled ROCm tests. (#50629)
Summary:
Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50629

Reviewed By: albanD

Differential Revision: D25935005

Pulled By: rohan-varma

fbshipit-source-id: e0969afecac2f319833189a7a8897d78068a2cda
2021-01-16 23:32:30 -08:00
2001f3a2c9 Finished fleshing out the tensor expr bindings in expr.cpp (#50643)
Summary:
Adds the rest of the ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50643

Reviewed By: pbelevich

Differential Revision: D25936346

Pulled By: Chillee

fbshipit-source-id: 4e2a7afbeabde51991c39d187a8c35e766950ffe
2021-01-16 13:37:51 -08:00
a469336292 Fix pytorch-doc build (#50651)
Summary:
Fixes `docstring of torch.distributed.rpc.RRef.remote:14: WARNING: Field list ends without a blank line; unexpected unindent.` by indenting the multiline field list.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50651

Reviewed By: SplitInfinity

Differential Revision: D25935839

Pulled By: malfet

fbshipit-source-id: e2613ae75334d01ab57f4b071cb0fddf80c6bd78
2021-01-15 23:39:34 -08:00
da5d4396c5 remove duplicate newlines (#50648)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50648

Reviewed By: malfet

Differential Revision: D25935513

Pulled By: walterddr

fbshipit-source-id: 1a8419b4fdb25368975ac8e72181c2c4b6295278
2021-01-15 22:26:47 -08:00
0ea1abe07b [PyTorch] Add missing Dispatcher.h include in quantized_ops.cpp (#50646)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50646

Master build broke (see https://app.circleci.com/pipelines/github/pytorch/pytorch/260715/workflows/948c9235-8844-4747-b40d-c14ed33f8dbb/jobs/10195595)
ghstack-source-id: 119906225

(Note: this ignores all push blocking failures!)

Test Plan: CI?

Reviewed By: malfet

Differential Revision: D25935300

fbshipit-source-id: 549eba1af24305728a5a0a84cb84142ec4807d95
2021-01-15 19:44:46 -08:00
c99f356051 Stable sort for CPU (#50052)
Summary:
Fixes [https://github.com/pytorch/pytorch/issues/38681](https://github.com/pytorch/pytorch/issues/38681) for the CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50052

Reviewed By: mrshenli

Differential Revision: D25900823

Pulled By: glaringlee

fbshipit-source-id: 1a3fa336037d0aa2344d79f46dcacfd478a353d1
2021-01-15 19:34:27 -08:00
3df5f9c3b2 Revert D25843351: [pytorch][PR] Clarify, make consistent, and test the behavior of logspace when dtype is integral
Test Plan: revert-hammer

Differential Revision:
D25843351 (0ae0fac1bb)

Original commit changeset: 45237574d04c

fbshipit-source-id: fb5343d509b277158b14d1b61e10433793889842
2021-01-15 18:47:37 -08:00
0291f35b37 [FX] Make len traceable and scriptable with wrap (#50184)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50184

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25819832

Pulled By: jamesr66a

fbshipit-source-id: ab16138ee26ef2f92f3478c56f0db1873fcc5dd0
2021-01-15 17:46:53 -08:00
585ee119cf Updated codecov config settings (#50601)
Summary:
- Do not generate inline comments on PRs
- Increase the number of signals to wait for before generating a comment to 5 (2 for codecov configs, 2 for onnx, and 1 for windows_test1)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50601

Reviewed By: albanD

Differential Revision: D25928920

Pulled By: malfet

fbshipit-source-id: 8a4ff70024c948cb65a4bdf31d269080d2cff945
2021-01-15 17:41:24 -08:00
b832604ffb Fix caffe2 for llvm trunk
Summary: Fix build with llvm-trunk. With D25877605 (cb37709bee), we need to explicitly include `llvm/Support/Host.h` in `llvm_jit.cpp`.

Test Plan: `buck build mode/opt-clang -j 56 sigrid/predictor/v2:sigrid_remote_predictor -c cxx.extra_cxxflags="-Wforce-no-error" -c cxx.modules=False -c cxx.use_default_autofdo_profile=False`

Reviewed By: bertmaher

Differential Revision: D25920968

fbshipit-source-id: 4b80d5072907f50d01e8fbef41cda8a89dd66a96
2021-01-15 17:12:39 -08:00
2569dc71e1 Reapply D25859132: [te] Optimize allocation of kernel outputs (#50546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50546

And fix the ROCm build
ghstack-source-id: 119837166

Test Plan: CI

Reviewed By: ZolotukhinM

Differential Revision: D25912464

fbshipit-source-id: 023e1f6c9fc131815c5a7a31f4860dfe271f7ae1
2021-01-15 17:02:49 -08:00
8e60bf9034 add RequiresGradCheck (#50392)
Summary:
This change improves perf by 3-4% on fastrnns.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50392

Reviewed By: izdeby

Differential Revision: D25891392

Pulled By: Krovatkin

fbshipit-source-id: 44d9b6907d3975742c9d77102fe6a85aab2c08c0
2021-01-15 16:50:42 -08:00
6e3e57095c Add complex support for torch.nn.L1Loss (#49912)
Summary:
Building on top of the work of anjali411 (https://github.com/pytorch/pytorch/issues/46640)

Things added in this PR:
1. Modify backward and double-backward formulas
2. Add complex support for `new module tests` and criterion tests (and add complex tests for L1)
3. Modify some existing tests to support complex
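A minimal usage sketch of the new complex path (shapes and dtype chosen arbitrarily):
```python
import torch

loss = torch.nn.L1Loss()
x = torch.randn(4, dtype=torch.complex64, requires_grad=True)
t = torch.randn(4, dtype=torch.complex64)
out = loss(x, t)  # |x - t| is real-valued, so the loss itself is real
out.backward()    # exercises the complex backward formula added here
```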

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49912

Reviewed By: zhangguanheng66

Differential Revision: D25853036

Pulled By: soulitzer

fbshipit-source-id: df619f1b71c450ab2818eb17804e0c55990aa8ad
2021-01-15 15:53:15 -08:00
d64184ef4c [RPC] Support timeout for RRef proxy functions (#50499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50499

Adds a timeout API to the following functions:
```
rref.rpc_sync()
rref.rpc_async()
rref.remote()
```
so that RPCs initiated by these proxy calls can be appropriately timed out similar to the regular RPC APIs. Timeouts are supported in the following use cases:

1. rpc.remote finishes in time and successfully, but the function run by rref.rpc_async() is slow and times out. A timeout error will be raised.
2. The rref.rpc_async() function is fast, but rpc.remote() is slow or hanging. When rref.rpc_async() is called, it will still time out with the passed-in timeout (and won't block waiting for rpc.remote() to succeed, which is what happens currently). Note that the timeout will occur during the future creation itself (and not the wait), since it calls `rref._get_type`, which blocks. We could consider making this nonblocking by modifying rref._get_type to return a future, although that is likely a larger change.
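A hedged usage sketch of the proxy timeouts (the worker name and the use of `sum` are made up; assumes RPC has already been initialized):
```python
import torch
from torch.distributed import rpc

rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
value = rref.rpc_sync(timeout=5).sum()  # proxied RPC raises a timeout error after 5s
fut = rref.rpc_async(timeout=5).sum()   # async variant returns a future
result = fut.wait()
```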

Test Plan: Added UT

Reviewed By: wanchaol

Differential Revision: D25897495

fbshipit-source-id: f9ad5b8f75121f50537677056a5ab16cf262847e
2021-01-15 13:23:23 -08:00
ab1ba8f433 [RPC] Support timeout in rref._get_type() (#50498)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50498

This change is mostly needed for the next diff in this stack, where
rref._get_type() is called in the rpc_async/rpc_sync RRef proxy function and
can block indefinitely if there is no timeout. It will also be useful to have a
timeout argument when we publicize this API to keep it consistent with other
RPC APIs.
ghstack-source-id: 119859767

Test Plan: Added UT

Reviewed By: pritamdamania87

Differential Revision: D25897588

fbshipit-source-id: 2e84aaf7e4faecf80005c78ee2ac8710f387503e
2021-01-15 13:18:39 -08:00
c78e7db7ee [PyTorch] Remove unnecessary dispatcher.h include in mobile/interpreter.h (#50316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50316

It's unused.
ghstack-source-id: 119798799

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D25858961

fbshipit-source-id: 0f214f93dcdf99d0c22e6d8032ed7a10604c714a
2021-01-15 13:10:30 -08:00
60a1831e61 [PyTorch] Remove unnecessary dispatcher.h include in op_registration.h (#50315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50315

It's unused.
ghstack-source-id: 119798801

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25858937

fbshipit-source-id: fe4fdb33c1a443fdd17644c3f7f34c897abf383f
2021-01-15 13:10:28 -08:00
687f6a513a [PyTorch] Remove unnecessary dispatcher.h include in builtin_function.h (#50314)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50314

It's unused.
ghstack-source-id: 119798800

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25858900

fbshipit-source-id: 16107acb3df0de18ed16d92f1e2c1b0a72e3e43d
2021-01-15 13:05:47 -08:00
0ae0fac1bb Clarify, make consistent, and test the behavior of logspace when dtype is integral (#47647)
Summary:
The torch.logspace documentation doesn't explain how integral dtypes are handled. Add some clarification and tests for when dtype is integral.

The CUDA implementation is also updated to be consistent with the CPU implementation.
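A hedged illustration of the documented behavior (values are computed in floating point and then cast to the integral dtype, truncating; exact semantics per this PR):
```python
import torch

torch.logspace(0, 2, steps=3, dtype=torch.int64)  # tensor([  1,  10, 100])
# Non-integer powers truncate under the cast: 10**0.5 == 3.16... -> 3
torch.logspace(0, 1, steps=3, dtype=torch.int64)  # tensor([ 1,  3, 10])
```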

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47647

Reviewed By: gchanan

Differential Revision: D25843351

Pulled By: walterddr

fbshipit-source-id: 45237574d04c56992c18766667ff1ed71be77ac3
2021-01-15 12:31:20 -08:00
8e7402441d Move irange to c10 (#46414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46414

For loops are often written with mismatched data types which causes silent type and sign coercion in the absence of integer conversion warnings. Getting around this in templated code requires convoluted patterns such as
```
for(auto i=decltype(var){0};i<var;i++)
```
with this diff we can instead write
```
for(const auto i : c10::irange(var))
```
Note that this loop is type-safe and const-safe.

The function introduced here (`c10::irange`) allows for type-safety and const-ness within for loops, which prevents the accidental truncation or modification of integers and other types, improving code safety.

Test Plan:
```
buck test //caffe2/c10:c10_test_0
```

Reviewed By: ngimel

Differential Revision: D24334732

fbshipit-source-id: fec5ebda3643ec5589f7ea3a8e7bbea4432ed771
2021-01-15 11:44:55 -08:00
296e4a0b7f .circleci: Set +u for all conda install commands (#50505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50505

Even with +u set for the conda install, it still seems to fail with an
unbound variable error. Let's try giving it a default value instead.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D25913692

Pulled By: seemethere

fbshipit-source-id: 4b898f56bff25c7523f10b4933ea6cd17a57df80
2021-01-15 11:36:58 -08:00
0d981eea6c add type annotations to torch.nn.modules.conv (#49564)
Summary:
closes gh-49563

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49564

Reviewed By: albanD

Differential Revision: D25917441

Pulled By: walterddr

fbshipit-source-id: 491dc06cfc1bbf694dfd9ccefca4f55488a931b2
2021-01-15 11:16:11 -08:00
00d432a1ed Remove optional for view_fn during View Tracking (#50067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50067

Fixes #49257

Using `Callgrind` to test the performance.
```python
import torch
import timeit
from torch.utils.benchmark import Timer

timer = Timer("x.view({100, 5, 20});", setup="torch::Tensor x = torch::ones({10, 10, 100});", language="c++", timer=timeit.default_timer)
res = timer.collect_callgrind(number=10)
```
### Nightly
```python
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f7949138c40>
x.view({100, 5, 20});
setup: torch::Tensor x = torch::ones({10, 10, 100});
                           All          Noisy symbols removed
    Instructions:        42310                      42310
    Baseline:                0                          0
10 runs per measurement, 1 thread
Warning: PyTorch was not built with debug symbols.
         Source information may be limited. Rebuild with
         REL_WITH_DEB_INFO=1 for more detailed results.
```
### Current
```python
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f78f271a580>
x.view({100, 5, 20});
setup: torch::Tensor x = torch::ones({10, 10, 100});
                           All          Noisy symbols removed
    Instructions:        42480                      42480
    Baseline:                0                          0
10 runs per measurement, 1 thread
Warning: PyTorch was not built with debug symbols.
         Source information may be limited. Rebuild with
         REL_WITH_DEB_INFO=1 for more detailed results.
```
### Compare
The instruction count is reduced by 170
```python
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f7941b7a7c0>
    970  ???:torch::autograd::as_view(at::Tensor const&, at::Tensor const&, bool, bool, std::function<at::Tensor (at::Tensor const&)>, torch::autograd::CreationMeta, bool)
    240  ???:torch::autograd::ViewInfo::~ViewInfo()
    180  ???:torch::autograd::ViewInfo::ViewInfo(at::Tensor, std::function<at::Tensor (at::Tensor const&)>)
    130  ???:torch::autograd::make_variable_differentiable_view(at::Tensor const&, c10::optional<torch::autograd::ViewInfo>, c10::optional<torch::autograd::ViewInfo>, torch::autograd::CreationMeta, bool)
    105  /tmp/benchmark_utils_jit_build_69e2f1710544485588feeca0719a3a57/timer_cpp_4435526292782672407/timer_src.cpp:main
    100  ???:std::function<at::Tensor (at::Tensor const&)>::function(std::function<at::Tensor (at::Tensor const&)> const&)
     70  ???:torch::autograd::DifferentiableViewMeta::~DifferentiableViewMeta()
     70  ???:torch::autograd::DifferentiableViewMeta::DifferentiableViewMeta(c10::TensorImpl*, c10::optional<torch::autograd::ViewInfo>, c10::optional<torch::autograd::ViewInfo>, torch::autograd::CreationMeta)
   -100  ???:c10::optional_base<torch::autograd::ViewInfo>::optional_base(c10::optional_base<torch::autograd::ViewInfo>&&)
   -105  /tmp/benchmark_utils_jit_build_2e75f38b553e42eba00523a86ad9aa05/timer_cpp_3360771523810516633/timer_src.cpp:main
   -120  ???:torch::autograd::ViewInfo::ViewInfo(at::Tensor, c10::optional<std::function<at::Tensor (at::Tensor const&)> >)
   -210  ???:c10::optional_base<std::function<at::Tensor (at::Tensor const&)> >::~optional_base()
   -240  ???:c10::optional_base<torch::autograd::ViewInfo>::~optional_base()
   -920  ???:torch::autograd::as_view(at::Tensor const&, at::Tensor const&, bool, bool, c10::optional<std::function<at::Tensor (at::Tensor const&)> >, torch::autograd::CreationMeta, bool)
```

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D25900495

Pulled By: ejguan

fbshipit-source-id: dedd30e69db6b48601a18ae98d6b28faeae30d90
2021-01-15 08:29:28 -08:00
070a30b265 [BE] add warning message to cmake against env var "-std=c++xx" (#50491)
Summary:
this was discovered when working on https://github.com/pytorch/pytorch/issues/50230.

Environment variables such as CXXFLAGS="-std=c++17" will not work because we use CMAKE_CXX_STANDARD 14.
This warning alerts users when such an environment variable is set.

See: [CMake env var usage](https://cmake.org/cmake/help/latest/manual/cmake-env-variables.7.html#id4) and [CXXFLAGS usage](https://cmake.org/cmake/help/latest/envvar/CXXFLAGS.html) for more details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50491

Reviewed By: mrshenli

Differential Revision: D25907851

Pulled By: walterddr

fbshipit-source-id: 5af5eec76f79f9d35456af1f2663cafbc54e7dc8
2021-01-15 07:12:56 -08:00
a9db2f8e7a Revert D24924236: [pytorch][PR] [ONNX] Handle sequence output shape and type inference
Test Plan: revert-hammer

Differential Revision:
D24924236 (adc65e7c8d)

Original commit changeset: 506e70a38cfe

fbshipit-source-id: 78069a33fb3df825af1cb482da06a07f7b26ab48
2021-01-15 05:58:35 -08:00
366b00ab7b [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D25921551

fbshipit-source-id: df0445864751c18eaa240deff6a142dd791d32ff
2021-01-15 04:16:07 -08:00
ffefa44e20 Automated submodule update: tensorpipe (#50572)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: 161500fb09

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50572

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: lw

Differential Revision: D25920888

fbshipit-source-id: fa73ba50a2d9429ea1e0beaac6edc2fd8d3ce244
2021-01-15 02:12:54 -08:00
d9f71b5868 [WIP][FX] new sections in docs (#50562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50562

Adding new top-level sections to the docs to be filled out

![image](https://user-images.githubusercontent.com/4685384/104666703-5b778580-5689-11eb-80ab-7df07f816b5b.png)

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D25919592

Pulled By: jamesr66a

fbshipit-source-id: 45f564eb8fddc7a42abb5501e160cca0dd0745c8
2021-01-14 21:34:36 -08:00
6882f9cc1c [FX] Add wrap() docstring to docs and add decorator example (#50555)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50555

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D25917564

Pulled By: jamesr66a

fbshipit-source-id: 20c7c8b1192fa80c6a0bb9e18910791bd7167232
2021-01-14 21:31:51 -08:00
adc65e7c8d [ONNX] Handle sequence output shape and type inference (#46542)
Summary:
Handle sequence output shape and type inference.

This PR fixes the value type of sequence outputs. Prior to this, all sequence-type model outputs were unfolded in exported ONNX models.
This PR also enables shape inference for sequence outputs to represent the dynamic shape of these values.
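
A hedged sketch of the kind of model this affects; the module, file name, and use of scripting are illustrative assumptions, not code from the PR (ONNX sequence ops need opset >= 11, and recent PyTorch accepts a ScriptModule directly):

```python
import torch

class SplitModel(torch.nn.Module):
    def forward(self, x):
        # returns a List[Tensor] whose length depends on the input shape;
        # previously such a sequence output was unfolded into fixed outputs
        return list(torch.split(x, 2))

torch.onnx.export(torch.jit.script(SplitModel()), (torch.randn(6, 4),),
                  "split.onnx", opset_version=11)
```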

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46542

Reviewed By: ezyang

Differential Revision: D24924236

Pulled By: bzinodev

fbshipit-source-id: 506e70a38cfe31069191d7f40fc6375239c6aafe
2021-01-14 21:12:35 -08:00
e9dc8fc162 [TensorExpr] Add python bindings. (#49698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49698

Reincarnation of #47620 by jamesr66a.

This is just an initial set of things that we're exposing to Python; more
is expected to come in the future. Some things can probably be done better,
but I'm putting this out anyway, since some other people were interested
in using and/or developing this.

Differential Revision: D25668694

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: fb0fd1b31e851ef9ab724686b9ac2d172fa4905a
2021-01-14 21:02:47 -08:00
9efe15313a Revert D25563542: Add batched grad testing to gradcheck, turn it on in test_autograd
Test Plan: revert-hammer

Differential Revision:
D25563542 (443412e682)

Original commit changeset: 125dea554abe

fbshipit-source-id: 0564735f977431350b75147ef209e56620dbab64
2021-01-14 19:19:02 -08:00
be51de4047 Minor doc improvement(?) on ArrayRef::slice (#50541)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50541

I found the current phrasing to be confusing

Test Plan: N/A

Reviewed By: ngimel

Differential Revision: D25909205

fbshipit-source-id: 483151d01848ab41d57b3f3b3775ef69f1451dcf
2021-01-14 18:09:34 -08:00
4de9d04f03 [TensorExpr] Hook Fuser Pass to JIT opt-limit utility. (#50518)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50518

This new feature makes it easy to bisect the pass by hard-stopping it
after a given number of hits.

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D25908597

Pulled By: ZolotukhinM

fbshipit-source-id: 8ee547989078c7b1747a4b02ce6e71027cb3055f
2021-01-14 17:08:50 -08:00
08baffa8aa Drop blacklist from glow (#50480)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50480

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25893858

fbshipit-source-id: 297440997473c037e8f59a460306569d0a4aa67c
2021-01-14 16:06:34 -08:00
2ceaec704d Fix warnings in TensorShape (#50486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50486

Compiling currently gives:
```
Jan 13 16:46:39 In file included from ../aten/src/ATen/native/TensorShape.cpp:12:
Jan 13 16:46:39 ../aten/src/ATen/native/Resize.h:37:24: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:39     if (new_size_bytes > self->storage().nbytes()) {
Jan 13 16:46:39         ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:39 ../aten/src/ATen/native/TensorShape.cpp:32:24: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int64_t' (aka 'long long') [-Wsign-compare]
Jan 13 16:46:39   for (size_t i = 0; i < shape_tensor.numel(); ++i) {
Jan 13 16:46:39                      ~ ^ ~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:39 ../aten/src/ATen/native/TensorShape.cpp:122:25: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:39   for (int64_t i = 0; i < tensors.size(); i++) {
Jan 13 16:46:39                       ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:39 ../aten/src/ATen/native/TensorShape.cpp:162:21: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:39   for (int i = 0; i < tensors.size(); i++) {
Jan 13 16:46:39                   ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:39 ../aten/src/ATen/native/TensorShape.cpp:300:25: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:39   for (int64_t i = 0; i < s1.size(); ++i) {
Jan 13 16:46:39                       ~ ^ ~~~~~~~~~
Jan 13 16:46:39 ../aten/src/ATen/native/TensorShape.cpp:807:21: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:39     TORCH_CHECK(dim < self_sizes.size());
Jan 13 16:46:39                 ~~~ ^ ~~~~~~~~~~~~~~~~~
Jan 13 16:46:39 ../c10/util/Exception.h:361:31: note: expanded from macro 'TORCH_CHECK'
Jan 13 16:46:39   if (C10_UNLIKELY_OR_CONST(!(cond))) {                                 \
Jan 13 16:46:39                               ^~~~
Jan 13 16:46:39 ../c10/util/Exception.h:244:47: note: expanded from macro 'C10_UNLIKELY_OR_CONST'
Jan 13 16:46:39 #define C10_UNLIKELY_OR_CONST(e) C10_UNLIKELY(e)
Jan 13 16:46:39                                               ^
Jan 13 16:46:39 ../c10/macros/Macros.h:173:65: note: expanded from macro 'C10_UNLIKELY'
Jan 13 16:46:39 #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
Jan 13 16:46:39                                                                 ^~~~
Jan 13 16:46:39 ../aten/src/ATen/native/TensorShape.cpp:855:24: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'const int64_t' (aka 'const long long') [-Wsign-compare]
Jan 13 16:46:39   for (size_t i = 0; i < num_blocks; ++i) {
Jan 13 16:46:39                      ~ ^ ~~~~~~~~~~
Jan 13 16:46:39 ../aten/src/ATen/native/TensorShape.cpp:2055:23: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:39     for (int i = 0; i < vec.size(); i++) {
Jan 13 16:46:39                     ~ ^ ~~~~~~~~~~
Jan 13 16:46:39 ../aten/src/ATen/native/TensorShape.cpp:2100:25: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:39   for (int64_t i = 0; i < src.size(); ++i) {
```
This fixes issues with loop iteration variable types.

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25901799

fbshipit-source-id: c68d9ab93ab0142b5057ce4ca9e75c620a1425f0
2021-01-14 15:24:46 -08:00
1908f56b3a Fix warnings in "ForeachOpsKernels" (#50482)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50482

Compiling currently shows:
```
Jan 13 16:46:28 In file included from ../aten/src/ATen/native/ForeachOpsKernels.cpp:2:
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachUtils.h:28:21: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachUtils.h:44:21: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachUtils.h:149:25: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28   for (int64_t i = 0; i < tensors1.size(); i++) {
Jan 13 16:46:28                       ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachUtils.h:164:25: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28   for (int64_t i = 0; i < tensors1.size(); i++) {
Jan 13 16:46:28                       ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachUtils.h:183:25: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28   for (int64_t i = 0; i < tensors1.size(); i++) {
Jan 13 16:46:28                       ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachUtils.h:198:25: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28   for (int64_t i = 0; i < tensors1.size(); i++) {
Jan 13 16:46:28                       ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:150:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_LIST_ALPHA(add);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:74:21: note: expanded from macro 'FOREACH_BINARY_OP_LIST_ALPHA'
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {                                                                           \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:150:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_LIST_ALPHA(add);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:84:21: note: expanded from macro 'FOREACH_BINARY_OP_LIST_ALPHA'
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {                                                                           \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:151:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_LIST_ALPHA(sub);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:74:21: note: expanded from macro 'FOREACH_BINARY_OP_LIST_ALPHA'
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {                                                                           \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:151:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_LIST_ALPHA(sub);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:84:21: note: expanded from macro 'FOREACH_BINARY_OP_LIST_ALPHA'
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {                                                                           \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:158:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_SCALARLIST(add);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:31:21: note: expanded from macro 'FOREACH_BINARY_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < tensors.size(); i++) {                                                                            \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:158:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_SCALARLIST(add);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:40:21: note: expanded from macro 'FOREACH_BINARY_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < tensors.size(); i++) {                                                                            \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:159:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_SCALARLIST(sub);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:31:21: note: expanded from macro 'FOREACH_BINARY_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < tensors.size(); i++) {                                                                            \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:159:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_SCALARLIST(sub);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:40:21: note: expanded from macro 'FOREACH_BINARY_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < tensors.size(); i++) {                                                                            \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:160:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_SCALARLIST(mul);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:31:21: note: expanded from macro 'FOREACH_BINARY_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < tensors.size(); i++) {                                                                            \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:160:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_SCALARLIST(mul);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:40:21: note: expanded from macro 'FOREACH_BINARY_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < tensors.size(); i++) {                                                                            \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:161:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_SCALARLIST(div);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:31:21: note: expanded from macro 'FOREACH_BINARY_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < tensors.size(); i++) {                                                                            \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:161:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_SCALARLIST(div);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:40:21: note: expanded from macro 'FOREACH_BINARY_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < tensors.size(); i++) {                                                                            \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:163:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_LIST(mul);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:53:21: note: expanded from macro 'FOREACH_BINARY_OP_LIST'
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {                                                             \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:163:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_LIST(mul);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:63:21: note: expanded from macro 'FOREACH_BINARY_OP_LIST'
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {                                                             \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:164:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_LIST(div);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:53:21: note: expanded from macro 'FOREACH_BINARY_OP_LIST'
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {                                                             \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:164:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_BINARY_OP_LIST(div);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:63:21: note: expanded from macro 'FOREACH_BINARY_OP_LIST'
Jan 13 16:46:28   for (int i = 0; i < tensors1.size(); i++) {                                                             \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:195:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_POINTWISE_OP_SCALAR(addcdiv);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:115:21: note: expanded from macro 'FOREACH_POINTWISE_OP_SCALAR'
Jan 13 16:46:28   for (int i = 0; i < input.size(); i++) {                                                                                           \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:195:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_POINTWISE_OP_SCALAR(addcdiv);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:125:21: note: expanded from macro 'FOREACH_POINTWISE_OP_SCALAR'
Jan 13 16:46:28   for (int i = 0; i < input.size(); i++) {                                                                                           \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:196:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_POINTWISE_OP_SCALAR(addcmul);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:115:21: note: expanded from macro 'FOREACH_POINTWISE_OP_SCALAR'
Jan 13 16:46:28   for (int i = 0; i < input.size(); i++) {                                                                                           \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:196:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_POINTWISE_OP_SCALAR(addcmul);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:125:21: note: expanded from macro 'FOREACH_POINTWISE_OP_SCALAR'
Jan 13 16:46:28   for (int i = 0; i < input.size(); i++) {                                                                                           \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:198:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_POINTWISE_OP_SCALARLIST(addcdiv);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:135:21: note: expanded from macro 'FOREACH_POINTWISE_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < input.size(); i++) {                                                                                                              \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:198:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_POINTWISE_OP_SCALARLIST(addcdiv);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:145:21: note: expanded from macro 'FOREACH_POINTWISE_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < input.size(); i++) {                                                                                                              \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:199:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_POINTWISE_OP_SCALARLIST(addcmul);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:135:21: note: expanded from macro 'FOREACH_POINTWISE_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < input.size(); i++) {                                                                                                              \
Jan 13 16:46:28                   ~ ^ ~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:199:1: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:28 FOREACH_POINTWISE_OP_SCALARLIST(addcmul);
Jan 13 16:46:28 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:28 ../aten/src/ATen/native/ForeachOpsKernels.cpp:145:21: note: expanded from macro 'FOREACH_POINTWISE_OP_SCALARLIST'
Jan 13 16:46:28   for (int i = 0; i < input.size(); i++) {
```
This diff fixes that.

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25901744

fbshipit-source-id: 2cb665358a103d85e07c690d73b3f4a557d4c135
2021-01-14 15:21:39 -08:00
171f265d80 Back out "Revert D25717510: Clean up some type annotations in benchmarks/fastrnns" (#50556)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50556

Original commit changeset: 2bcc19cd4340

Test Plan: Soft revert hammer

Reviewed By: walterddr, seemethere

Differential Revision: D25917129

fbshipit-source-id: e5caad77655789d607b84eee820aa7c960e00f51
2021-01-14 15:15:03 -08:00
51157e802f Use separate mypy caches for TestTypeHints cases (#50539)
Summary:
Addresses one of the speed points in https://github.com/pytorch/pytorch/issues/50513 by making the `TestTypeHints` suite much faster when run incrementally. Also fixes an issue (at least on 5834438090a1b3206347e30968e48f44251a53a1) where running that suite repeatedly results in a failure every other run (see the test plan below).
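
A hedged sketch of the approach: give each mypy configuration its own cache directory (`--cache-dir` is a standard mypy flag; the config file names match the repo's, but the cache paths are illustrative):

```python
import subprocess

# Separate caches keep one config's incremental state from
# invalidating the other's on every alternate run.
subprocess.run(["mypy", "--config-file", "mypy.ini",
                "--cache-dir", ".mypy_cache/normal"], check=False)
subprocess.run(["mypy", "--config-file", "mypy-strict.ini",
                "--cache-dir", ".mypy_cache/strict"], check=False)
```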

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50539

Test Plan:
First clear your [`mypy` cache](https://mypy.readthedocs.io/en/stable/command_line.html#incremental-mode):
```
$ rm -r .mypy_cache
```
Then run this twice:
```
$ python test/test_type_hints.py
```

- *Before:*
  ```
  ....
  ----------------------------------------------------------------------
  Ran 4 tests in 212.340s

  OK
  ```
  ```
  .F..
  ======================================================================
  FAIL: test_run_mypy (__main__.TestTypeHints)
  Runs mypy over all files specified in mypy.ini
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "test/test_type_hints.py", line 214, in test_run_mypy
      self.fail(f"mypy failed: {stdout} {stderr}")
  AssertionError: mypy failed: torch/quantization/fx/quantize.py:138: error: "Tensor" not callable  [operator]
  Found 1 error in 1 file (checked 1189 source files)

  ----------------------------------------------------------------------
  Ran 4 tests in 199.331s

  FAILED (failures=1)
  ```
- *After:*
  ```
  ....
  ----------------------------------------------------------------------
  Ran 4 tests in 212.815s

  OK
  ```
  ```
  ....
  ----------------------------------------------------------------------
  Ran 4 tests in 5.491s

  OK
  ```

Reviewed By: xuzhao9

Differential Revision: D25912363

Pulled By: samestep

fbshipit-source-id: dac38c890399193699c57b6c9fa8df06a88aee5d
2021-01-14 14:44:31 -08:00
468c99fba4 Reapply D25856891: [te] Benchmark comparing fused overhead to unfused (#50543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50543

Original commit changeset: 2d2f07f79986

Was part of a stack that got reverted.  This is just a benchmark.
ghstack-source-id: 119825594

Test Plan: CI

Reviewed By: navahgar

Differential Revision: D25912439

fbshipit-source-id: 5d9ca45810fff8931a3cfbd03965e11050180676
2021-01-14 14:17:45 -08:00
30e45bb133 Enable GPU-to-GPU comm in TensorPipeAgent (#44418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44418

This commit uses TensorPipe's cuda_ipc channel to conduct
cross-process same-machine GPU-to-GPU communication. On the sender
side, `TensorPipeAgent` grabs a stream to each device used by the
message, let these streams wait for current streams, and passes
the streams to TensorPipe `CudaBuffer`. On the receiver side, it
also grabs a stream for each device used in the message, and uses
these streams to receive tensors and run user functions. After that,
these streams are then used for sending the response back to the
sender. When receiving the response, the sender will grab a new set
of streams and use them for TensorPipe's `CudaBuffer`.

If device maps are provided, `TensorPipeAgent::send` will return a
derived class of `CUDAFuture`, which is specifically tailored for
RPC Messages.
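
A hedged sketch of opting in from user code; the worker names and device indices are illustrative, and `set_device_map` is assumed to be the configuration entry point:

```python
import torch.distributed.rpc as rpc

opts = rpc.TensorPipeRpcBackendOptions()
# the caller's cuda:0 is delivered to the callee's cuda:1
opts.set_device_map("worker1", {0: 1})
rpc.init_rpc("worker0", rank=0, world_size=2, rpc_backend_options=opts)
```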

TODOs:
1. Enable sending CUDA RPC to the same process.
2. Add a custom CUDA stream pool.
3. When TensorPipe addresses the error for `cudaPointerGetAttributes()`,
remove the `cuda:0` context initialization code in `backend_registry.py`.
4. When TensorPipe can detect availability of peer access, enable all
tests on platforms without peer access.

Differential Revision: D23626207

Test Plan: Imported from OSS

Reviewed By: lw

Pulled By: mrshenli

fbshipit-source-id: d30e89e8a98bc44b8d237807b84e78475c2763f0
2021-01-14 13:55:41 -08:00
554a1a70c7 [quant] update embedding module to not store qweight (#50418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50418

Previously we were storing the quantized weight as a module attribute, which
resulted in the weight getting stored as part of the model.
We don't need this, since we already store the unpacked weights as part of the model.
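
A hedged sketch of how the before/after listings below can be reproduced; the module construction and file name are illustrative, not taken from the diff:

```python
import zipfile

import torch

emb = torch.nn.quantized.Embedding(num_embeddings=10, embedding_dim=12)
torch.jit.save(torch.jit.script(emb), "tmp.pt")
# After this change the archive should no longer carry the packed qweight.
print(zipfile.ZipFile("tmp.pt").namelist())
```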

Test Plan:
Before
```
Archive:  tmp.pt
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
     586  Stored      586   0% 00-00-1980 00:00 5fefdda0  tmp/extra/producer_info.json
 1588700  Stored  1588700   0% 00-00-1980 00:00 04e0da4c  tmp/data/0
   63548  Stored    63548   0% 00-00-1980 00:00 0ceb1f45  tmp/data/1
   63548  Stored    63548   0% 00-00-1980 00:00 517bc3ab  tmp/data/2
 1588700  Stored  1588700   0% 00-00-1980 00:00 dbe88c73  tmp/data/3
   63548  Stored    63548   0% 00-00-1980 00:00 d8dc47c4  tmp/data/4
   63548  Stored    63548   0% 00-00-1980 00:00 b9e0c20f  tmp/data/5
    1071  Stored     1071   0% 00-00-1980 00:00 10dc9350  tmp/data.pkl
     327  Defl:N      203  38% 00-00-1980 00:00 dfddb661  tmp/code/__torch__/___torch_mangle_0.py
     185  Stored      185   0% 00-00-1980 00:00 308f580b  tmp/code/__torch__/___torch_mangle_0.py.debug_pkl
    1730  Defl:N      515  70% 00-00-1980 00:00 aa11f799  tmp/code/__torch__/torch/nn/quantized/modules/embedding_ops.py
    1468  Defl:N      636  57% 00-00-1980 00:00 779609a6  tmp/code/__torch__/torch/nn/quantized/modules/embedding_ops.py.debug_pkl
       0  Stored        0   0% 00-00-1980 00:00 00000000  tmp/code/__torch__/torch/classes/quantized.py
       6  Stored        6   0% 00-00-1980 00:00 816d0907  tmp/code/__torch__/torch/classes/quantized.py.debug_pkl
       4  Stored        4   0% 00-00-1980 00:00 57092f6d  tmp/constants.pkl
       2  Stored        2   0% 00-00-1980 00:00 55679ed1  tmp/version
--------          -------  ---                            -------
 3436971          3434800   0%                            16 files
```
After
```
Archive:  tmp.pt
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
 1588700  Stored  1588700   0% 00-00-1980 00:00 a4da6981  tmp/data/0
   63548  Stored    63548   0% 00-00-1980 00:00 74d9b607  tmp/data/1
   63548  Stored    63548   0% 00-00-1980 00:00 e346a0c2  tmp/data/2
     952  Stored      952   0% 00-00-1980 00:00 eff8706e  tmp/data.pkl
     375  Defl:N      227  40% 00-00-1980 00:00 96c77b68  tmp/code/__torch__/quantization/test_quantize/___torch_mangle_23.py
     228  Defl:N      162  29% 00-00-1980 00:00 6a378113  tmp/code/__torch__/quantization/test_quantize/___torch_mangle_23.py.debug_pkl
    1711  Defl:N      509  70% 00-00-1980 00:00 66d8fd61  tmp/code/__torch__/torch/nn/quantized/modules/embedding_ops.py
    1473  Defl:N      634  57% 00-00-1980 00:00 beb2323b  tmp/code/__torch__/torch/nn/quantized/modules/embedding_ops.py.debug_pkl
       0  Stored        0   0% 00-00-1980 00:00 00000000  tmp/code/__torch__/torch/classes/quantized.py
       6  Stored        6   0% 00-00-1980 00:00 816d0907  tmp/code/__torch__/torch/classes/quantized.py.debug_pkl
       4  Stored        4   0% 00-00-1980 00:00 57092f6d  tmp/constants.pkl
       2  Stored        2   0% 00-00-1980 00:00 55679ed1  tmp/version
--------          -------  ---                            -------
 1720547          1718292   0%                            12 files
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25879879

fbshipit-source-id: e09427a60d4c44dd1a190575e75f3ed9cde6358f
2021-01-14 10:38:06 -08:00
3dcf126c31 Validate args in HalfCauchy and HalfNormal (#50492)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50404
Complementary to https://github.com/pytorch/pytorch/issues/50403

This also fixes `HalfCauchy.cdf()`, `HalfNormal.log_prob()`, `HalfNormal.cdf()` and ensures validation is not done twice.
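
A hedged sketch of the validated behavior after this fix; the concrete values are illustrative:

```python
import torch
from torch.distributions import HalfCauchy

d = HalfCauchy(torch.tensor(1.0), validate_args=True)
try:
    d.log_prob(torch.tensor(-1.0))  # outside the support
except ValueError as e:
    print("rejected:", e)
```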

cc feynmanliang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50492

Reviewed By: mrshenli

Differential Revision: D25909541

Pulled By: neerajprad

fbshipit-source-id: 35859633bf5c4fd20995182c599cbcaeb863cf29
2021-01-14 10:16:56 -08:00
7fb935806d enable CPU tests back (#50490)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50490

Right now CPU tests are skipped because the check 'torch.cuda.device_count() < int(self.world_size)' always fails;
re-enable CPU tests by checking the device count only when CUDA is available.
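
A minimal sketch of the guard, written as a hypothetical helper (the real check lives in the distributed test fixture):

```python
import unittest

import torch

def require_gpus(world_size: int) -> None:
    # Only consult device_count() when CUDA is available, so CPU-only
    # machines no longer skip the test unconditionally.
    if torch.cuda.is_available() and torch.cuda.device_count() < world_size:
        raise unittest.SkipTest("not enough GPUs for this test")
```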

Test Plan: unit tests, CPU tests are not skipped with this diff

Reviewed By: rohan-varma

Differential Revision: D25901980

fbshipit-source-id: e6e8afe217604c5f5b3784096509240703813d94
2021-01-14 10:13:55 -08:00
1ea39094a8 Link to mypy wiki page from CONTRIBUTING.md (#50540)
Summary:
Addresses one of the documentation points in https://github.com/pytorch/pytorch/issues/50513 by making it easier to find our `mypy` wiki page. Also updates the `CONTRIBUTING.md` table of contents and removes some trailing whitespace.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50540

Reviewed By: janeyx99

Differential Revision: D25912366

Pulled By: samestep

fbshipit-source-id: b305f974700a9d9ebedc0c2cb75c92e72d84882a
2021-01-14 10:05:48 -08:00
e05882d2a4 Back out "reuse consant from jit" (#50521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50521

Original commit changeset: 9731ec1e0c1d

Test Plan:
- run `arc focus2 -b pp-ios //xplat/arfx/tracking/segmentation:segmentationApple -a ModelRunner --force-with-bad-commit `
- build via Xcode, run it on an iOS device
- Click "Person Segmentation"
- Crash observed without the diff patched, and the segmentation image is able to be loaded with this diff patched

Reviewed By: husthyc

Differential Revision: D25908493

fbshipit-source-id: eef072a8a3434b932cfd0646ee78159f72be5536
2021-01-14 09:50:40 -08:00
0be1a24b48 Drop unused imports from caffe2/quantization (#50493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50493

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49974

From
```
./python/libcst/libcst codemod remove_unused_imports.RemoveUnusedImportsWithGlean --no-format caffe2/
```

Test Plan: Sandcastle Tests

Reviewed By: xush6528

Differential Revision: D25902417

fbshipit-source-id: aeebafce2c4fb649cdce5cf4fd4c5b3ee19923c0
2021-01-14 09:15:19 -08:00
ef6be0ec50 Revert D25903846: [pytorch][PR] Structured kernel definition for upsample_nearest2d
Test Plan: revert-hammer

Differential Revision:
D25903846 (19a8e68d8c)

Original commit changeset: 0059fda9b7d8

fbshipit-source-id: b4a7948088c0329a3605c32b64ed77e060e63fca
2021-01-14 08:44:48 -08:00
443412e682 Add batched grad testing to gradcheck, turn it on in test_autograd (#49120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49120

This adds a `check_batched_grad=False` option to gradcheck and gradgradcheck.
It defaults to False because gradcheck is a public API and I don't want
to break any existing non-pytorch users of gradcheck.
This:
- runs grad twice with two grad outputs, a & b
- runs a vmapped grad with torch.stack([a, b])
- compares the results of the above against each other.
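
A minimal sketch of turning the new check on for a single op; the use of double precision follows standard gradcheck practice:

```python
import torch
from torch.autograd import gradcheck

x = torch.randn(3, dtype=torch.double, requires_grad=True)
# Raises (or returns False) if the vmapped grad disagrees with the
# results of running grad separately on each grad output.
assert gradcheck(torch.sin, (x,), check_batched_grad=True)
```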

Furthermore:
- `check_batched_grad=True` is set to be the default for
gradcheck/gradgradcheck inside of test_autograd.py. This is done by
reassigning to the gradcheck object inside test_autograd
- I manually added `check_batched_grad=False` to gradcheck instances
that don't support batched grad.
- I added a denylist for operations that don't support batched grad.

Question:
- Should we have a testing only gradcheck (e.g.,
torch.testing.gradcheck) that has different defaults from our public
API, torch.autograd.gradcheck?

Future:
- The future plan for this is to repeat the above for test_nn.py (the
autogenerated test will require a denylist)
- Finally, we can repeat the above for all pytorch test files that use
gradcheck.

Test Plan: - run tests

Reviewed By: albanD

Differential Revision: D25563542

Pulled By: zou3519

fbshipit-source-id: 125dea554abefcef0cb7b487d5400cd50b77c52c
2021-01-14 08:13:23 -08:00
0abe7f5ef6 [BE] fix subprocess wrapped test cases reported as failure (#50515)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49901.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50515

Reviewed By: janeyx99

Differential Revision: D25907836

Pulled By: walterddr

fbshipit-source-id: f6f3aa4c1222bf866077275d28ba637eeaef10c5
2021-01-14 08:05:40 -08:00
d2c3733ca1 Reorder torch.distributed.rpc.init_rpc docstring arguments (#50419)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50419

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D25911561

Pulled By: pbelevich

fbshipit-source-id: 62c9a5c3f5ec5eddcbd149821ebdf484ff392158
2021-01-14 07:58:09 -08:00
2639f1d4a6 Revert D25717510: Clean up some type annotations in benchmarks/fastrnns
Test Plan: revert-hammer

Differential Revision:
D25717510 (7d0eecc666)

Original commit changeset: 4f6431d140e3

fbshipit-source-id: 2bcc19cd434047f3857e0d7e804d34f72e566c30
2021-01-14 07:23:45 -08:00
934805bc49 cleaned up ModuleAttributeError (#50298)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49726
Just cleaned up the unnecessary `ModuleAttributeError`.

BC-breaking note:
`ModuleAttributeError` was added in the previous unsuccessful [PR](https://github.com/pytorch/pytorch/pull/49879) and removed here. If a user catches `ModuleAttributeError` specifically, this will no longer work. They should catch `AttributeError` instead.
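
A short sketch of the migration described in the BC-breaking note:

```python
import torch.nn as nn

m = nn.Linear(2, 2)
try:
    _ = m.nonexistent_attribute
except AttributeError:  # formerly nn.modules.module.ModuleAttributeError
    print("missing attributes now surface as plain AttributeError")
```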

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50298

Reviewed By: mrshenli

Differential Revision: D25907620

Pulled By: jbschlosser

fbshipit-source-id: cdfa6b1ea76ff080cd243287c10a9d749a3f3d0a
2021-01-14 06:58:01 -08:00
4ee631cdf0 Revert D25856891: [te] Benchmark comparing fused overhead to unfused
Test Plan: revert-hammer

Differential Revision:
D25856891 (36ae3feb22)

Original commit changeset: 0e99515ec2e7

fbshipit-source-id: 2d2f07f79986ca7815b9eae63e734db76bdfc0c8
2021-01-14 04:33:35 -08:00
269193f5f5 Revert D25859132: [te] Optimize allocation of kernel outputs
Test Plan: revert-hammer

Differential Revision:
D25859132 (62f676f543)

Original commit changeset: 8753289339e3

fbshipit-source-id: 580069c7fa7565643d3204f3740e64ac94c4db39
2021-01-14 04:28:29 -08:00
19a8e68d8c Structured kernel definition for upsample_nearest2d (#50189)
Summary:
See the structured kernel definition [RFC](https://github.com/pytorch/rfcs/pull/9) for context.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50189

Reviewed By: mrshenli

Differential Revision: D25903846

Pulled By: soulitzer

fbshipit-source-id: 0059fda9b7d86f596ca35d830562dd4b859293a0
2021-01-13 22:48:23 -08:00
fc9f013cea HalfCauchy should ValueError if _validate_args (#50403)
Summary:
**Expected**: When I run `torch.distributions.HalfCauchy(torch.tensor(1.0), validate_args=True).log_prob(-1)`, I expect a `ValueError` because that is the behavior of other distributions (e.g. Beta, Bernoulli).

**Actual**: No run-time error is thrown, but a `-inf` log prob is returned.

Fixes https://github.com/pytorch/pytorch/issues/50404

 ---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/pytorch/pytorch/50403)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50403

Reviewed By: mrshenli

Differential Revision: D25907131

Pulled By: neerajprad

fbshipit-source-id: ceb63537e5850809c8b32cf9db0c99043f381edf
2021-01-13 22:07:49 -08:00
52ea372fcb [tools] Update clang-format linux hash (#50520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50520

**Summary**
The new version of `clang-format` for linux64 that was uploaded to S3
earlier this week was dynamically linked to fbcode's custom platform.
A new binary has been uploaded that statically links against `libgcc`
and `libstdc++`, which seems to have fixed this issue. Ideally, all
libraries would be statically linked.

**Test Plan**
`clang-format` workflow passes on this PR and output shows that it
successfully downloaded, verified and ran.

```
Created directory /home/runner/work/pytorch/pytorch/.clang-format-bin for clang-format binary
Downloading clang-format to /home/runner/work/pytorch/pytorch/.clang-format-bin

Reference Hash: 9073602de1c4e1748f2feea5a0782417b20e3043
Actual Hash: 9073602de1c4e1748f2feea5a0782417b20e3043
Using clang-format located at /home/runner/work/pytorch/pytorch/.clang-format-bin/clang-format
no modified files to format
```

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D25908868

Pulled By: SplitInfinity

fbshipit-source-id: 5667fc5546e5ed0bbf9f36570935d245eb26629b
2021-01-13 20:50:56 -08:00
5ea9584400 Assemble technical overview of FX (#50291)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50291

Test Plan: Imported from OSS

Reviewed By: pbelevich, SplitInfinity

Differential Revision: D25908444

Pulled By: ansley

fbshipit-source-id: 9860143a0b6aacbed3207228183829c18d10bfdb
2021-01-13 19:31:58 -08:00
a3f9cf9497 Fix fastrnn benchmark regression introduced by 49946 (#50517)
Summary:
Simply add missing `from typing import List, Tuple` and `from torch import Tensor`

Fixes regression introduced by https://github.com/pytorch/pytorch/pull/49946

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50517

Reviewed By: gchanan

Differential Revision: D25908379

Pulled By: malfet

fbshipit-source-id: a44b96681b6121e61b69f960f81c0cad3f2a8d20
2021-01-13 19:10:11 -08:00
0b49778666 [package] mangle imported module names (#50049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50049

Rationale and implementation are immortalized in a big comment in
`torch/package/mangling.md`.

This change also allows imported modules to be TorchScripted

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D25758625

Pulled By: suo

fbshipit-source-id: 77a99dd2024c76716cfa6e59c3855ed590efda8b
2021-01-13 16:32:36 -08:00
4a0d17ba2d [PyTorch][codemod] Replace immediately-dereferenced expect calls w/expectRef (#50228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50228

`fastmod -m 'expect(<((at|c10)::)?\w+Type>\(\)\s*)->'
'expectRef${1}.'`
Presuming it builds, this is a safe change: the result of `expect()`
wasn't being saved anywhere, so we didn't need it and can take a
reference instead of a new `shared_ptr`.
ghstack-source-id: 119782961

Test Plan: CI

Reviewed By: SplitInfinity

Differential Revision: D25837374

fbshipit-source-id: 86757b70b1520e3dbaa141001e7976400cdd3b08
2021-01-13 16:13:55 -08:00
c6cb632c63 [PyTorch] Make SROpFunctor a raw function pointer (#50395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50395

There's no need for these to be `std::function`.
ghstack-source-id: 119684828

Test Plan: CI

Reviewed By: hlu1

Differential Revision: D25874187

fbshipit-source-id: e9fa3fbc0dca1219ed13904ca704670ce24f7cc3
2021-01-13 15:51:14 -08:00
50256710a0 [PyTorch] Make TensorImpl::empty_tensor_restride non-virtual (#50301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50301

I'm not sure why this is virtual. We don't seem to override it anywhere, and GitHub code search doesn't turn up anything either.
ghstack-source-id: 119622058

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25856434

fbshipit-source-id: a95a8d738b109b34f2aadf8db5d4b733d679344f
2021-01-13 15:44:21 -08:00
9ebea77299 [PyTorch] Reapply D25687465: Devirtualize TensorImpl::dim() with macro (#50290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50290

This was reverted because it landed after D24772023 (b73c018598), which
changed the implementation of `dim()`,  without rebasing on top of it,
and thus broke the build.
ghstack-source-id: 119608505

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25852810

fbshipit-source-id: 9735a095d539a3a6dc530b7b3bb758d4872d05a8
2021-01-13 15:15:32 -08:00
21542b43a8 [FX] Update docstring code/graph printout (#50396)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50396

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D25874253

Pulled By: jamesr66a

fbshipit-source-id: 6217eadbcbe823db14df25070eef411e184c2273
2021-01-13 15:08:20 -08:00
08b6b78c51 [FX] Make FX stability warning reference beta (#50394)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50394

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D25874188

Pulled By: jamesr66a

fbshipit-source-id: 4fc4e72fec1f3fab770d870fe78cd4ad0f1d6888
2021-01-13 15:06:39 -08:00
aeefe2ce31 [ONNX] ONNX dev branch merge 01-06-2021 (#50163)
Summary:
[ONNX] ONNX dev branch merge 01-06-2021
- [ONNX] Support onnx if/loop sequence output in opset 13 - (https://github.com/pytorch/pytorch/issues/49270)
- Symbolic function for torch.square (https://github.com/pytorch/pytorch/issues/49446)
- [ONNX] Add checks in ONNXSetDynamicInputShape (https://github.com/pytorch/pytorch/issues/49783) …
- [ONNX] Enable export of aten::__derive_index (https://github.com/pytorch/pytorch/issues/49514) …
- [ONNX] Update symbolic for unfold (https://github.com/pytorch/pytorch/issues/49378) …
- [ONNX] Update the sequence of initializers in the exported graph so that it is the same as the inputs. (https://github.com/pytorch/pytorch/issues/49798)
- [ONNX] Enable opset 13 ops (https://github.com/pytorch/pytorch/issues/49612) …
- [ONNX] Improve error message for supported model input types in ONNX export API. (https://github.com/pytorch/pytorch/issues/50119)
- [ONNX] Add a post-pass for If folding (https://github.com/pytorch/pytorch/issues/49410)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50163

Reviewed By: pbelevich

Differential Revision: D25821059

Pulled By: SplitInfinity

fbshipit-source-id: 9f511a93d9d5812d0ab0a49d61ed0fa5f8066948
2021-01-13 13:51:21 -08:00
30a8ba93b1 Remove a blacklist reference (#50477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50477

See task for context

Test Plan: Sandcastle+OSS tests

Reviewed By: xush6528

Differential Revision: D25893906

fbshipit-source-id: c9b86d0292aa751597d75e8d1b53f99b99c924b9
2021-01-13 13:39:06 -08:00
7426878981 Exclude test/generated_type_hints_smoketest.py from flake8 (#50497)
Summary:
Similar to https://github.com/pytorch/pytorch/issues/48201, this PR excludes a file that is auto-generated by [`test/test_type_hints.py`](5834438090/test/test_type_hints.py (L109-L111)), which doesn't happen to be run before the Flake8 check is done in CI. Also, because the `exclude` list in `.flake8` has gotten fairly long, this PR splits it across multiple lines.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50497

Test Plan:
Run this in your shell:

```sh
python test/test_type_hints.py TestTypeHints.test_doc_examples
flake8
```

- _Before:_ `flake8` prints [these 169 false positives](https://pastebin.com/qPJY24g8) and returns exit code 1
- _After:_ `flake8` prints no output and returns exit code 0

Reviewed By: mrshenli

Differential Revision: D25903177

Pulled By: samestep

fbshipit-source-id: 21f757ac8bfa626bb56ece2ecc55668912b71234
2021-01-13 12:30:19 -08:00
b89827b73f Drop unused imports (#49972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49972

From
```
./python/libcst/libcst codemod remove_unused_imports.RemoveUnusedImportsWithGlean --no-format caffe2/
```

Test Plan: Standard sandcastle tests

Reviewed By: xush6528

Differential Revision: D25727352

fbshipit-source-id: 6b90717e161aeb1da8df30e67d586101d35d7d5f
2021-01-13 12:26:17 -08:00
62f676f543 [te] Optimize allocation of kernel outputs (#50318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50318

We can skip the dispatcher and go to the device-specific
`at::native::empty_strided` implementation.

Also, unpacking the TensorOptions struct at kernel launch time actually takes a
bit of work, since the optionals are encoded in a bitfield.  Do this upfront
and use the optionals directly at runtime.
ghstack-source-id: 119735738

Test Plan:
Before:
```
-------------------------------------------------------
Benchmark                Time           CPU Iterations
-------------------------------------------------------
FusedOverhead         2143 ns       2142 ns     332946
UnfusedOverhead       2277 ns       2276 ns     315130
```

After:
```
-------------------------------------------------------
Benchmark                Time           CPU Iterations
-------------------------------------------------------
FusedOverhead         2175 ns       2173 ns     321877
UnfusedOverhead       2394 ns       2394 ns     307360
```

(The noise in the baseline makes this really hard to read; it seemed to be
about 3-5% faster in my local testing.)

Reviewed By: eellison

Differential Revision: D25859132

fbshipit-source-id: 8753289339e365f78c790bee076026cd649b8509
2021-01-13 12:12:43 -08:00
36ae3feb22 [te] Benchmark comparing fused overhead to unfused (#50305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50305

That's it
ghstack-source-id: 119631533

Test Plan:
```
buck run //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench -- --benchmark_filter=Overhead
```
```
Run on (24 X 2394.67 MHz CPU s)
2021-01-08 16:06:17
-------------------------------------------------------
Benchmark                Time           CPU Iterations
-------------------------------------------------------
FusedOverhead         2157 ns       2157 ns     311314
UnfusedOverhead       2443 ns       2443 ns     311221
```

Reviewed By: ZolotukhinM

Differential Revision: D25856891

fbshipit-source-id: 0e99515ec2e769a04929157d46903759c03182a3
2021-01-13 12:09:37 -08:00
48318eba40 Fix TestOpInfoCUDA.test_unsupported_dtypes_addmm_cuda_bfloat16 on ampere (#50440)
Summary:
The `TestOpInfoCUDA.test_unsupported_dtypes_addmm_cuda_bfloat16` test in `test_ops.py` is failing on Ampere. This is because addmm with bfloat16 is supported on Ampere, but the test asserts that it is not supported.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50440

Reviewed By: mrshenli

Differential Revision: D25893326

Pulled By: ngimel

fbshipit-source-id: afeec25fdd76e7336d84eb53ea36319ade1ab421
2021-01-13 11:25:43 -08:00
d2e96fcf17 Update loss module doc (#48596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48596

Reviewed By: izdeby

Differential Revision: D25889748

Pulled By: zou3519

fbshipit-source-id: 9f6e77ba2af4030c8b9ae4afcea6d002a4dae423
2021-01-13 10:41:20 -08:00
fc5db4265b [BE] replace unittest.main with run_tests (#50451)
Summary:
fix https://github.com/pytorch/pytorch/issues/50448.

This replaces `unittest.main()` with `run_tests()` in all `test/*.py` files. This PR does not address test files in the subdirectories because they seem unrelated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50451

Reviewed By: janeyx99

Differential Revision: D25899924

Pulled By: walterddr

fbshipit-source-id: f7c861f0096624b2791ad6ef6a16b1c4895cce71
2021-01-13 10:33:08 -08:00
a4383a69d4 Clean up some type annotations in caffe2/test (#49943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49943

Upgrades type annotations from Python2 to Python3

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25717534

fbshipit-source-id: 5aedea4db07efca126ffb6daee79617c30a67146
2021-01-13 10:01:55 -08:00
7d0eecc666 Clean up some type annotations in benchmarks/fastrnns (#49946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49946

Upgrades type annotations from Python2 to Python3

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25717510

fbshipit-source-id: 4f6431d140e3032b4ca55587f9602aa0ea38c671
2021-01-13 09:57:14 -08:00
05542f6222 EMA op (#50393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50393

Exponential Moving Average

Usage:

Add ema_options to the adagrad optimizer. For details, please refer to the test workflow setting.

If ema_end == -1, EMA never ends.
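
For reference, the core update rule of an exponential moving average, as a minimal PyTorch sketch (the decay value here is an illustrative assumption, not the op's exact semantics):

```python
import torch

def ema_update(ema_param: torch.Tensor, param: torch.Tensor, decay: float = 0.999):
    # ema <- decay * ema + (1 - decay) * param
    ema_param.mul_(decay).add_(param, alpha=1.0 - decay)

ema_w = torch.zeros(4)
w = torch.ones(4)
for step in range(5000):
    ema_update(ema_w, w)
print(ema_w)  # approaches w as updates accumulate
```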

Test Plan:
buck test caffe2/caffe2/fb/optimizers:ema_op_optimizer_test

buck test caffe2/caffe2/fb/optimizers:ema_op_test

f240459719

Differential Revision: D25416056

fbshipit-source-id: a25e676a364969e3be2bc47750011c812fc3a62f
2021-01-13 08:58:01 -08:00
4a2d3d1cfd MAINT: char class regex simplify (#50294)
Summary:
* remove some cases of single characters in
character classes -- these can incur the overhead
of a character class with none of the benefits
of a multi-character character class (see the
sketch after this list)

* for more details, see Chapter 6 of:
Friedl, Jeffrey. Mastering Regular Expressions. 3rd ed.,
O’Reilly Media, 2009.
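
A small illustration of the rewrite, using plain Python `re` just to show the equivalence being exploited:

```python
import re

text = "ab cb ab"

# "[a]" is a single-character class: it matches exactly what plain "a"
# matches, but pays the character-class overhead for nothing.
assert re.findall(r"[a]b", text) == re.findall(r"ab", text)

# Multi-character classes are the case that actually earns the syntax.
assert re.findall(r"[ac]b", text) == ["ab", "cb", "ab"]
```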

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50294

Reviewed By: zhangguanheng66

Differential Revision: D25870912

Pulled By: malfet

fbshipit-source-id: 9be5be9ed11fd49876213f0be8121b24739f1c13
2021-01-13 08:48:17 -08:00
664126bab5 Enables build with oneDNN (MKL-DNN) on AArch64 (#50400)
Summary:
Since version 1.6, oneDNN has provided limited support for AArch64 builds.

This minor change is to detect an AArch64 CPU and permit the use of
`USE_MKLDNN` in that case.

Build flags for oneDNN are also modified accordingly.

Note: oneDNN on AArch64, by default, will use oneDNN's reference C++ kernels.
These are not optimised for AArch64, but oneDNN v1.7 onwards provides support
for a limited set of primitives based on the Arm Compute Library.
See: https://github.com/oneapi-src/oneDNN/pull/795
and: https://github.com/oneapi-src/oneDNN/pull/820
for more details. Support for ACL-based oneDNN primitives in PyTorch
will require some further modification.
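
Once built this way, a quick sanity check from Python (using the existing `torch.backends.mkldnn` API) shows whether oneDNN was linked in:

```python
import platform
import torch

print(platform.machine())                    # e.g. 'aarch64'
print(torch.backends.mkldnn.is_available())  # True if the build picked up oneDNN
```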

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50400

Reviewed By: izdeby

Differential Revision: D25886589

Pulled By: malfet

fbshipit-source-id: 2c81277a28ad4528c2d2211381e7c6692d952bc1
2021-01-13 08:41:44 -08:00
deba3bd1d0 Fix TORCH_LIBRARIES variables when do static build (#49458)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21737

With this fix, the TORCH_LIBRARIES variable can provide all necessary static libraries built from the PyTorch repo.
A user program (if doing a static build) can now just link against ${TORCH_LIBRARIES} + MKL + the CUDA runtime.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49458

Reviewed By: mrshenli

Differential Revision: D25895354

Pulled By: malfet

fbshipit-source-id: 8ff47d14ae1f90036522654d4354256ed5151e5c
2021-01-13 07:56:27 -08:00
2a603145d7 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D25896704

fbshipit-source-id: c6b112db889aaf31996929829e4989f9562964da
2021-01-13 04:22:15 -08:00
4a3a37886c Fix fft slow tests (#50435)
Summary:
The failure is:
```
______________________________________________________________________________________________________ TestCommonCUDA.test_variant_consistency_jit_fft_rfft_cuda_float64 _______________________________________________________________________________________________________
../.local/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py:889: in wrapper
    method(*args, **kwargs)
../.local/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py:889: in wrapper
    method(*args, **kwargs)
../.local/lib/python3.9/site-packages/torch/testing/_internal/common_device_type.py:267: in instantiated_test
    if op is not None and op.should_skip(generic_cls.__name__, name,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <torch.testing._internal.common_methods_invocations.SpectralFuncInfo object at 0x7f7375f9b550>, cls_name = 'TestCommon', test_name = 'test_variant_consistency_jit', device_type = 'cuda', dtype = torch.float64

    def should_skip(self, cls_name, test_name, device_type, dtype):
>       for si in self.skips:
E       TypeError: 'NoneType' object is not iterable

../.local/lib/python3.9/site-packages/torch/testing/_internal/common_methods_invocations.py:186: TypeError

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50435

Reviewed By: izdeby

Differential Revision: D25886650

Pulled By: mruberry

fbshipit-source-id: 722a45247dc79be86858306cd1b51b0a63df8b37
2021-01-13 01:31:37 -08:00
057be23168 [doc] Add note about torch.flip returning new tensor and not view. (#50041)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/38271

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50041

Reviewed By: izdeby

Differential Revision: D25883870

Pulled By: mruberry

fbshipit-source-id: 33cc28a2176e98f2f29077958782291609c7999b
2021-01-13 01:01:47 -08:00
b54240d200 [PyTorch] Gate tls_local_dispatch_key_set inlining off for Android (#50450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50450

See comment, seems to break things.
ghstack-source-id: 119753229

Test Plan: CI

Reviewed By: ljk53

Differential Revision: D25892759

fbshipit-source-id: 3b34a384713c77aa28b1ef5807828a08833fd86f
2021-01-12 23:32:12 -08:00
ca5d9617ba Fix remainder type promotion (#48668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48668

Combine tests for `fmod` and `remainder`.

## BC-breaking Note:
In order to make `remainder` operator have type promotion, we have to introduce BC breaking.
### 1.7.1:
In the case where the second argument is a Python number, the result is cast to the dtype of the first argument.
```python
>>> x = torch.tensor([1, 2, 3, 4, 5], dtype=torch.int32)  # reconstructed example input
>>> torch.remainder(x, 1.2)
tensor([0, 0, 0, 0, 0], dtype=torch.int32)
```
### This PR:
In the case where the second argument is a Python number, the dtype of the result is determined by type promotion of both inputs.
```python
>>> x = torch.tensor([1, 2, 3, 4, 5], dtype=torch.int32)  # same input as above
>>> torch.remainder(x, 1.2)
tensor([1.0000, 0.8000, 0.6000, 0.4000, 0.2000])
```

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25869136

Pulled By: ejguan

fbshipit-source-id: 8e5e87eec605a15060f715952de140f25644008c
2021-01-12 22:09:30 -08:00
a0f7b18391 Fix fmod type promotion (#48278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48278

Remove various lines from tests due to no type promotion introduced from #47323

## BC-breaking Note:
In order to make `fmod` operator have type promotion, we have to introduce BC breaking.
### 1.7.1:
In the case where the second argument is a Python number, the result is cast to the dtype of the first argument.
```python
>>> x = torch.tensor([1, 2, 3, 4, 5], dtype=torch.int32)  # reconstructed example input
>>> torch.fmod(x, 1.2)
tensor([0, 0, 0, 0, 0], dtype=torch.int32)
```
### Prior PR:
Check the BC-breaking note of #47323

### This PR:
In the case where the second argument is a Python number, the dtype of the result is determined by type promotion of both inputs.
```python
>>> x = torch.tensor([1, 2, 3, 4, 5], dtype=torch.int32)  # same input as above
>>> torch.fmod(x, 1.2)
tensor([1.0000, 0.8000, 0.6000, 0.4000, 0.2000])
```

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25869137

Pulled By: ejguan

fbshipit-source-id: bce763926731e095b75daf2e934bff7c03ff0832
2021-01-12 22:04:19 -08:00
dea529a779 Add torch.cuda.can_device_access_peer (#50446)
Summary:
And the underlying torch._C._cuda_canDeviceAccessPeer, which is a wrapper around cudaDeviceCanAccessPeer.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50446

Reviewed By: mrshenli

Differential Revision: D25890405

Pulled By: malfet

fbshipit-source-id: ef09405f115bbe73ba301d608d56cd8f8453201b
2021-01-12 20:30:45 -08:00
4e248eb3f6 Change watchdog timeout logging from INFO to ERROR. (#50455)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50455

Certain systems only print logging messages for ERROR/WARN and the
error message that the watchdog is timing out a particular operation is pretty
important.

As a result, changing its level to ERROR instead of INFO.
ghstack-source-id: 119761029

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D25894795

fbshipit-source-id: 259b16c13f6cdf9cb1956602d15784b92aa53f17
2021-01-12 20:15:39 -08:00
4e76616719 [StaticRuntime][ATen] Add out variant for narrow_copy (#49502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49502

It broke the OSS CI the last time I landed it, mostly cuda tests and python bindings.

Similar to permute_out, add the out variant of `aten::narrow` (slice in c2), which does an actual copy. `aten::narrow` creates a view; however, a copy is incurred when we call `input.contiguous` in the ops that follow `aten::narrow`: `concat_add_mul_replacenan_clip`, `casted_batch_one_hot_lengths`, and `batch_box_cox`.
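
A standalone sketch of why the copy shows up (not the Static Runtime code itself):

```python
import torch

x = torch.rand(4, 10)
v = x.narrow(1, 2, 5)      # aten::narrow: a view, no data copied here
assert v.storage().data_ptr() == x.storage().data_ptr()

c = v.contiguous()         # the view is non-contiguous, so this copies
assert c.storage().data_ptr() != x.storage().data_ptr()
# An out variant of narrow_copy fuses the two steps: it copies straight
# into a preallocated output instead of view-then-contiguous.
```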

Test Plan:
Unit test:

```
buck test //caffe2/aten:math_kernel_test
buck test //caffe2/test:sparse -- test_narrow
```
Benchmark with the adindexer model:
```
bs = 1 is neutral

Before:
I1214 21:32:51.919239 3285258 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0886948. Iters per second: 11274.6
After:
I1214 21:32:52.492352 3285277 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0888019. Iters per second: 11261

bs = 20 shows more gains probably because the tensors are bigger and therefore the cost of copying is higher

Before:
I1214 21:20:19.702445 3227229 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.527563. Iters per second: 1895.51
After:
I1214 21:20:20.370173 3227307 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.508734. Iters per second: 1965.67
```

Reviewed By: ajyu

Differential Revision: D25596290

fbshipit-source-id: da2f5a78a763895f2518c6298778ccc4d569462c
2021-01-12 19:35:32 -08:00
49896c48e0 Caffe2 Concat operator benchmark (#50449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50449

Port caffe2 operator benchmark from torch.cat to caffe2 concat to measure the difference in performance.

Previous diff abandoned to rerun GitHub CI tests: D25738076

Test Plan:
Tested on devbig by running both pt and c2 benchmarks. Compiled with mode/opt

Inputs:
```
size, number of inputs, cat dimension, device
----------------------------------------------------
(1, 1, 1), N: 2, dim: 0, device: cpu
(512, 512, 2), N: 2, dim: 1, device: cpu
(128, 1024, 2), N: 2, dim: 1, device: cpu
(1024, 1024, 2), N: 2, dim: 0, device: cpu
(1025, 1023, 2), N: 2, dim: 1, device: cpu
(1024, 1024, 2), N: 2, dim: 2, device: cpu
[<function <lambda> at 0x7f922718e8c0>, 111, 65], N: 5, dim: 0, device: cpu
[96, <function <lambda> at 0x7f9226dad710>, 64], N: 5, dim: 1, device: cpu
[128, 64, <function <lambda> at 0x7f91a3625ef0>], N: 5, dim: 2, device: cpu
[<function <lambda> at 0x7f91a3625f80>, 32, 64], N: 50, dim: 0, device: cpu
[32, <function <lambda> at 0x7f91a3621050>, 64], N: 50, dim: 1, device: cpu
[33, 65, <function <lambda> at 0x7f91a36210e0>], N: 50, dim: 2, device: cpu
(64, 32, 4, 16, 32), N: 2, dim: 2, device: cpu
(16, 32, 4, 16, 32), N: 8, dim: 2, device: cpu
(9, 31, 5, 15, 33), N: 17, dim: 4, device: cpu
[<function <lambda> at 0x7f91a3621170>], N: 100, dim: 0, device: cpu
[<function <lambda> at 0x7f91a3621200>], N: 1000, dim: 0, device: cpu
[<function <lambda> at 0x7f91a3621290>], N: 2000, dim: 0, device: cpu
[<function <lambda> at 0x7f91a3621320>], N: 3000, dim: 0, device: cpu
```

```
pytorch: MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 buck-out/gen/caffe2/benchmarks/operator_benchmark/pt/cat_test.par --tag_filter=all
caffe2: MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 buck-out/gen/caffe2/benchmarks/operator_benchmark/c2/concat_test.par --tag_filter=all
```
```
Metric: Forward Execution Time (us)

pytorch             | caffe2
--------------------------------
 4.066              | 0.312
 351.507            | 584.033
 184.649            | 292.157
 9482.895           | 6845.112
 9558.988           | 6847.511
 13730.016          | 14118.505
 6324.371           | 4840.883
 4613.497           | 3702.213
 7504.718           | 7889.751
 9882.978           | 7364.350
 10087.076          | 7483.178
 16849.556          | 18092.295
 19181.075          | 13363.742
 19296.508          | 13466.863
 34157.449          | 56320.073
 176.483            | 267.106
 322.247            | 352.782
 480.064            | 460.214
 607.381            | 476.908
```

Reviewed By: hlu1

Differential Revision: D25890595

fbshipit-source-id: f53e125c0680bc2ebf722d1da5ec964bec585fdd
2021-01-12 18:27:44 -08:00
af968cd672 [Pytorch Mobile] Remove caching (in code) of interned strings (#50390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50390

Currently, there is a massive switch/case statement that is generated in the `InternedStrings::string()` method to speed up Symbol -> string conversion without taking a lock (mutex). The relative call rate of this on mobile is insignificant, so unlikely to have any material impact on runtime even if the lookups happen under a lock. Plus, parallelism is almost absent on mobile, which is where locks/mutexes cause the most problem (taking a mutex without contention is usually very fast and just adds a memory barrier iirc).

The only impact that caching interned strings has is avoiding taking a lock when interned strings are looked up. They are not looked up very often during training, and based on basic testing, they don't seem to be looked up much during inference either.

During training, the following strings were looked up at test startup:

```
prim::profile
prim::profile_ivalue
prim::profile_optional
prim::FusionGroup
prim::TypeCheck
prim::FallbackGraph
prim::ChunkSizes
prim::ConstantChunk
prim::tolist
prim::FusedConcat
prim::DifferentiableGraph
prim::MMBatchSide
prim::TensorExprGroup
```

Command used to trigger training: `buck test fbsource//xplat/papaya/client/executor/torch/store/transform/feature/test:test`

During inference, the only symbol that was looked up was `tolist`.
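
For intuition, the structure in question is roughly the following (a toy Python sketch of a lock-guarded intern table, not the actual `InternedStrings` code):

```python
import threading

class InternTable:
    def __init__(self):
        self._lock = threading.Lock()
        self._by_symbol = {}

    def register(self, symbol: int, name: str) -> None:
        with self._lock:
            self._by_symbol[symbol] = name

    def string(self, symbol: int) -> str:
        # The removed codegen'd switch/case avoided this lock for known
        # symbols; on mobile the lookup rate is too low for that to matter.
        with self._lock:
            return self._by_symbol[symbol]
```
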
ghstack-source-id: 119679831

Test Plan:
See the summary above + sandcastle tests.

### Size test: fbios

```
D25861786-V1 (https://www.internalfb.com/intern/diff/D25861786/?dest_number=119641372)

fbios: Succeeded
Change in Download Size for arm64 + 3x assets variation: -13.9 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -41.7 KiB

Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:747386759232352@base/bsb:747386759232352@diff/
```

### Size test: igios

```
D25861786-V1 (https://www.internalfb.com/intern/diff/D25861786/?dest_number=119641372)

igios: Succeeded
Change in Download Size for arm64 + 3x assets variation: -16.6 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -42.0 KiB

Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:213166470538954@base/bsb:213166470538954@diff/
```

Reviewed By: iseeyuan

Differential Revision: D25861786

fbshipit-source-id: 34a55d693edc41537300f628877a64723694f8f0
2021-01-12 17:53:18 -08:00
8c25b9701b Type annotations in test/jit (#50293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50293

Switching to type annotations for improved safety and import tracking.

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25853949

fbshipit-source-id: fb873587bb521a0a55021ee4d34d1b05ea8f000d
2021-01-12 16:47:06 -08:00
4c97ef8d77 Create subgraph rewriter (#49540)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49540

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D25869707

Pulled By: ansley

fbshipit-source-id: 93d3889f7ae2ecc5e8cdd7f4fb6b0446dbb3cb31
2021-01-12 16:32:13 -08:00
374951d102 Add type annotations to torch.nn.modules.padding (#49494)
Summary:
Closes gh-49492

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49494

Reviewed By: mruberry

Differential Revision: D25723837

Pulled By: walterddr

fbshipit-source-id: 92af0100f6d9e2bb25b259f5a7fe9d449ffb6443
2021-01-12 15:34:28 -08:00
cb37709bee [te] Create TargetMachine only once with correct options to fix perf (#50406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50406

We were creating different TMs in PytorchLLVMJIT and LLVMCodeGen; the
one in LLVMCodeGen had the right target-specific options to generate fast AVX2
code (with FMAs, vbroadcastss, etc.), and that's what was showing up in the
debug output, but the LLVMJIT TM was the one that actually generated runtime
code, and it was slow.
ghstack-source-id: 119700110

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/fb/tensorexpr:tensorexpr_bench
```

With this diff NNC is getting at least somewhat (5%) close to Pytorch with MKL,
for at least this one small-ish test case"

```
Run on (24 X 2394.67 MHz CPU s)
2021-01-11 15:57:27
----------------------------------------------------------------------------------------------------
Benchmark                                             Time           CPU Iterations UserCounters...
----------------------------------------------------------------------------------------------------
Gemm/Torch/128/128/128                            65302 ns      65289 ns      10734 GFLOPS=64.2423G/s
Gemm/TensorExprTile4x16VecUnroll/128/128/128      68602 ns      68599 ns      10256 GFLOPS=61.1421G/s
```

Reviewed By: bwasti

Differential Revision: D25877605

fbshipit-source-id: cd293bac94d025511f348eab5c9b8b16bf6505ec
2021-01-12 15:25:48 -08:00
7d28f1c81d [quant][refactor] Minor refactor of some typos (#50304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50304

Does not include any functional changes -- purely for fixing minor typos in the `fuser_method_mappings.py`

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25857248

Pulled By: z-a-f

fbshipit-source-id: 3f9b864b18bda8096e7cd52922dc21be64278887
2021-01-12 15:23:13 -08:00
39aac65430 [quant][bug] Fixing the mapping getter to return a copy (#50297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50297

Current implementation has a potential bug: if a user modifies the quantization mappings returned by the getters, the changes will propagate.
For example, the bug will manifest itself if the user does the following:

```
my_mapping = get_default_static_quant_module_mappings()
my_mapping[nn.Linear] = UserLinearImplementation
model_A = convert(model_A, mapping=my_mapping)

default_mapping = get_default_static_quant_module_mappings()
model_B = convert(model_B, mapping=default_mapping)
```

In that case the `model_B` will be quantized with the modified mapping.
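
A minimal sketch of the fix (the dict contents below are hypothetical placeholders for the real mapping):

```python
import copy

_DEFAULT_STATIC_QUANT_MODULE_MAPPINGS = {"nn.Linear": "nnq.Linear"}  # placeholder

def get_default_static_quant_module_mappings():
    # Return a copy so caller-side edits cannot mutate the shared default.
    return copy.deepcopy(_DEFAULT_STATIC_QUANT_MODULE_MAPPINGS)

my_mapping = get_default_static_quant_module_mappings()
my_mapping["nn.Linear"] = "UserLinearImplementation"  # affects only this copy
assert get_default_static_quant_module_mappings()["nn.Linear"] == "nnq.Linear"
```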

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D25855753

Pulled By: z-a-f

fbshipit-source-id: 0149a0c07a965024ba7d1084e89157a9c8fa1192
2021-01-12 15:19:39 -08:00
412e3f46e9 Automated submodule update: tensorpipe (#50441)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: ac98f40758

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50441

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: mrshenli

Differential Revision: D25888666

fbshipit-source-id: fd447f81462f476c62aed0e43830a710f60187e1
2021-01-12 14:17:55 -08:00
50744cd0f7 [package] better error message when unpickling a mocked obj (#50159)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50159

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D25809551

Pulled By: suo

fbshipit-source-id: 130587e650271cf158f5f5d9e688c622c9006631
2021-01-12 14:11:32 -08:00
6d947067c9 fixing autodiff to support Optional[Tensor] on inputs (#49430)
Summary:
This PR fixes two local issue for me:

1. Assert failure when passing `None` to `Optional[Tensor]` input that requires gradient in autodiff
2. Wrong vjp mapping on inputs when `requires_grad` flag changes on inputs stack.

This PR is to support autodiff on layer_norm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49430

Reviewed By: izdeby

Differential Revision: D25886211

Pulled By: eellison

fbshipit-source-id: 075af35a4a9c0b911838f25146f859897f9a07a7
2021-01-12 14:01:14 -08:00
c198e6c6fa Stop moving scalars to GPU for one computation in leaky_rrelu_backward. (#50115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50115

There is no way this is performant and we are trying to minimize the usage of scalar_to_tensor(..., device) since it is an anti-pattern, see https://github.com/pytorch/pytorch/issues/49758.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25790331

Pulled By: gchanan

fbshipit-source-id: 89d6f016dfd76197541b0fd8da4a462876dbf844
2021-01-12 13:44:30 -08:00
cf45d65f1c Clean up some type annotations in test/jit/...../test_class_type.py (#50156)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50156

Upgrades type annotations from Python2 to Python3

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25720035

fbshipit-source-id: 7e1aec34b21f3c9a3e8db9578258d99ffb87e6d4
2021-01-12 13:28:13 -08:00
725640ed84 Check CUDA kernel launches in caffe2/caffe2/utils/math (#50238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50238

Added `C10_CUDA_KERNEL_LAUNCH_CHECK();` after all kernel launches in caffe2/caffe2/utils/math

Test Plan:
```
buck build //caffe2/caffe2
```

{F356531214}

files in caffe2/caffe2/utils/math no longer show up when running
```
python3 caffe2/torch/testing/check_kernel_launches.py
```

Reviewed By: r-barnes

Differential Revision: D25773299

fbshipit-source-id: 28d67b4b9f57f1fa1e8699e43e9202bad4d42c5f
2021-01-12 13:09:15 -08:00
5cdc32bf1c [vmap] Add batching rules for comparisons ops (#50364)
Summary:
Related to https://github.com/pytorch/pytorch/issues/49562

This PR adds batching rules for the below comparison ops.
- torch.eq
- torch.gt
- torch.ge
- torch.le
- torch.lt
- torch.ne

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50364

Reviewed By: anjali411

Differential Revision: D25885359

Pulled By: zou3519

fbshipit-source-id: 58874f24f8d525d8fac9062186b1c9970618ff55
2021-01-12 13:00:56 -08:00
b2f7ff7d29 Fix MultiheadAttention docstring latex (#50430)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50430

Reviewed By: izdeby

Differential Revision: D25885695

Pulled By: zou3519

fbshipit-source-id: 7b017f9c5cdebbc7254c8193305c54003478c343
2021-01-12 12:45:42 -08:00
a389b30bfc Add Post Freezing Optimizations, turn on by default in torch.jit.freeze (#50222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50222

This PR adds a pass which runs a set of optimizations to be done after freezing. Currently this encompasses Conv-BN folding, Conv->Add/Sub/Mul/Div folding, and I'm also planning on adding dropout removal.

I would like some feedback on the API. torch.jit.freeze is technically in ~prototype~ phase so we have some leeway around making changes. I think in the majority of cases, the user is going to want to freeze their model and then run inference. I would prefer if the optimization was opt-out instead of opt-in. All internal/framework use cases of freezing use `freeze_module`, not the python API, so this shouldn't break anything.

I have separated out the optimization pass as a separate API to make things potentially modular, even though I suspect that is an unlikely case. In a future PR I would like to add a `torch::jit::freeze` which follows the same API as `torch.jit.freeze`, intended for C++ use, and runs the optimizations.
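
For reference, the user-facing flow under discussion, using the public API (whether the optimizations run by default inside `torch.jit.freeze` is exactly what this PR changes):

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.BatchNorm2d(8),
).eval()                                 # freezing requires eval mode

frozen = torch.jit.freeze(torch.jit.script(model))
out = frozen(torch.rand(1, 3, 16, 16))   # Conv-BN can now be folded away
```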

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D25856264

Pulled By: eellison

fbshipit-source-id: 56be1f12cfc459b4c4421d4dfdedff8b9ac77112
2021-01-12 11:39:13 -08:00
30aeed7c2b Peephole Optimize out conv(x).dim(), which prevents BN fusion (#50221)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50221

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D25856266

Pulled By: eellison

fbshipit-source-id: ef7054b3d4ebc59a0dd129116d29273be33fe12c
2021-01-12 11:39:09 -08:00
a69f008cb7 [JIT] Factor out peephole to own test file (#50220)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50220

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D25856263

Pulled By: eellison

fbshipit-source-id: f3d918d860e64e788e0bb9b9cb85125660f834c6
2021-01-12 11:39:06 -08:00
6971149326 [JIT] Add Frozen Conv-> Add/Sub/Mul/Div fusion (#50075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50075

Adds Conv - Add/Sub/Mul/Div fusion for frozen models. This helps cover models like torchvision maskrcnn, which use a hand-rolled batchnorm implementation: 90645ccd0e/torchvision/ops/misc.py (L45).

I haven't tested results yet, but I would expect a somewhat similar speedup to conv-bn fusion (maybe a little less).
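
The identity behind the Conv->Mul case, as a quick numeric check (a standalone sketch, not the pass itself): scaling a conv's output by a constant is the same as scaling its weight and bias.

```python
import torch

conv = torch.nn.Conv2d(3, 8, 3).eval()
c = 2.0

fused = torch.nn.Conv2d(3, 8, 3).eval()
with torch.no_grad():
    fused.weight.copy_(conv.weight * c)
    fused.bias.copy_(conv.bias * c)

x = torch.rand(1, 3, 16, 16)
assert torch.allclose(conv(x) * c, fused(x), atol=1e-6)
```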

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D25856265

Pulled By: eellison

fbshipit-source-id: 2c36fb831a841936fe4446ed440185f59110bf68
2021-01-12 11:39:02 -08:00
035229c945 [JIT] Frozen Graph Conv-BN fusion (#50074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50074

Adds Conv-BN fusion for models that have been frozen. I haven't explicitly tested perf yet but it should be equivalent to the results from Chillee's PR [here](https://github.com/pytorch/pytorch/pull/47657) and [here](https://github.com/pytorch/pytorch/pull/47657#issuecomment-725752765). Click on the PR for details, but it's a good speedup.

In a later PR in the stack I plan on turning this optimization on by default as part of `torch.jit.freeze`. I will also in a later PR add a peephole so that conv->batchnorm2d doesn't generate a conditional checking the number of dims.

Zino was working on freezing and left the team, so I'm not really sure who should be reviewing this, but I don't care too much so long as I get a review.
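
The algebra being folded, as a standalone numeric sketch (eval-mode BN with running stats; the actual pass rewrites the frozen graph rather than module objects):

```python
import torch

conv = torch.nn.Conv2d(3, 8, 3).eval()
bn = torch.nn.BatchNorm2d(8).eval()
bn.running_mean.normal_()
bn.running_var.uniform_(0.5, 1.5)

# bn(conv(x)) == conv'(x) with, per output channel:
#   W' = W * gamma / sqrt(var + eps)
#   b' = (b - mean) * gamma / sqrt(var + eps) + beta
scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
fused = torch.nn.Conv2d(3, 8, 3).eval()
with torch.no_grad():
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    fused.bias.copy_((conv.bias - bn.running_mean) * scale + bn.bias)

x = torch.rand(1, 3, 16, 16)
assert torch.allclose(bn(conv(x)), fused(x), atol=1e-5)
```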

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D25856261

Pulled By: eellison

fbshipit-source-id: da58c4ad97506a09a5c3a15e41aa92bdd7e9a197
2021-01-12 11:37:32 -08:00
b5d3826950 [PyTorch] Devirtualize TensorImpl::sizes() with macro (#50176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50176

UndefinedTensorImpl was the only type that overrode this, and IIUC we don't need to do it.
ghstack-source-id: 119609531

Test Plan: CI, internal benchmarks

Reviewed By: ezyang

Differential Revision: D25817370

fbshipit-source-id: 985a99dcea2e0daee3ca3fc315445b978f3bf680
2021-01-12 10:33:46 -08:00
158c98ae49 Add new patterns for ConcatAddMulReplaceNaNClip (#50249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50249

Add a few new patterns for `ConcatAddMulReplaceNanClip`

Reviewed By: houseroad

Differential Revision: D25843126

fbshipit-source-id: d4987c716cf085f2198234651a2214591d8aacc0
2021-01-12 10:20:01 -08:00
5834438090 Enable fast pass tensor_fill for single element complex tensors (#50383)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50383

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D25879881

Pulled By: anjali411

fbshipit-source-id: a254cff48ea9a6a38f7ee206815a04c31a9bcab0
2021-01-12 08:40:30 -08:00
6420071b43 Disable complex dispatch on min/max functions (#50347)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50064

**PROBLEM:**
In issue https://github.com/pytorch/pytorch/issues/36377, min/max functions were disabled for complex inputs (via dtype checks).
However, min/max kernels are still being compiled and dispatched for complex.

**FIX:**
The aforementioned dispatch has been disabled, and we now rely on the errors produced
by the dispatch macro to keep those ops from running on complex, instead of doing redundant dtype checks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50347

Reviewed By: zhangguanheng66

Differential Revision: D25870385

Pulled By: anjali411

fbshipit-source-id: 921541d421c509b7a945ac75f53718cd44e77df1
2021-01-12 07:55:18 -08:00
4411b5ac57 add type annotations to torch.nn.modules.normalization (#49035)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49034

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49035

Test Plan:
Imported from GitHub, without a `Test Plan:` line.
Force rebased to deal with merge conflicts

Reviewed By: zhangguanheng66

Differential Revision: D25767065

Pulled By: walterddr

fbshipit-source-id: ffb904e449f137825824e3f43f3775a55e9b011b
2021-01-12 07:40:15 -08:00
9384d31af5 Added linalg.pinv (#48399)
Summary:
This PR adds `torch.linalg.pinv`.

Changes compared to the original `torch.pinverse`:
 * New kwarg "hermitian": with `hermitian=True` eigendecomposition is used instead of singular value decomposition.
 * `rcond` argument can now be a `Tensor` of appropriate shape to apply matrix-wise clipping of singular values.
 * Added `out=` variant (allocates temporary and makes a copy for now)

Ref. https://github.com/pytorch/pytorch/issues/42666
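
Example usage of the new interface (double precision keeps the Moore-Penrose identity check tight):

```python
import torch

a = torch.randn(3, 5, dtype=torch.float64)
p = torch.linalg.pinv(a)
assert torch.allclose(a @ p @ a, a)        # A A+ A = A

h = torch.randn(4, 4, dtype=torch.float64)
h = h + h.T                                # symmetric, i.e. hermitian for real dtypes
ph = torch.linalg.pinv(h, hermitian=True)  # takes the eigendecomposition path
assert torch.allclose(h @ ph @ h, h)
```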

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48399

Reviewed By: zhangguanheng66

Differential Revision: D25869572

Pulled By: mruberry

fbshipit-source-id: 0f330a91d24ba4e4375f648a448b27594e00dead
2021-01-12 06:52:06 -08:00
314351d0ef Fix Error with torch.flip() for cuda tensors when dims=() (#50325)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49982

The method flip_check_errors, which is called from the CUDA file, had a condition that threw an exception when the dims size was <= 0. This change relaxes that condition to < 0 and adds a separate condition that returns from the method when the size equals zero; the early return is needed because past that point the method performs checks that expect a non-zero-size dims.

Also removed the comment/condition written to point to the issue.

mruberry kshitij12345 please review this once

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50325

Reviewed By: zhangguanheng66

Differential Revision: D25869559

Pulled By: mruberry

fbshipit-source-id: a831df9f602c60cadcf9f886ae001ad08b137481
2021-01-12 05:41:28 -08:00
5546a12fe3 remove redundant tests from tensor_op_tests (#50096)
Summary:
All these unary operators already have an entry in the OpInfo DB.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50096

Reviewed By: zhangguanheng66

Differential Revision: D25870048

Pulled By: mruberry

fbshipit-source-id: b64e06d5b9ab5a03a202cda8c22fdb7e4ae8adf8
2021-01-12 04:53:12 -08:00
53473985b8 test_ops: Only run complex gradcheck when complex is supported (#49018)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49018

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25868683

Pulled By: mruberry

fbshipit-source-id: d8c4d89c11939fc7d81db8190ac6b9b551e4cbf5
2021-01-12 04:48:30 -08:00
d25c673dfc Cleanup unnecessary SpectralFuncInfo logic (#48712)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48712

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25868675

Pulled By: mruberry

fbshipit-source-id: 90b32b27d9a3d79c3754c4a1c0747dbe0f140192
2021-01-12 04:48:27 -08:00
fb73cc4dc4 Migrate some torch.fft tests to use OpInfos (#48428)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48428

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25868666

Pulled By: mruberry

fbshipit-source-id: ca6d0c4e44f4c220675dc264a405d960d4b31771
2021-01-12 04:42:54 -08:00
4da9ceb743 [doc] fix doc formatting for torch.randperm and torch.repeat_interleave (#50254)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50207
Fixes https://github.com/pytorch/pytorch/issues/50208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50254

Reviewed By: zhangguanheng66

Differential Revision: D25865861

Pulled By: mruberry

fbshipit-source-id: 9ae45c443df7cce0d8bfb313f1667ff4d5f6262f
2021-01-12 04:33:59 -08:00
78e71ce627 warn user once for possible unnecessary find_unused_params (#50133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50133

`find_unused_parameters=True` is only needed when the model has unused parameters that are not known at model definition time or differ due to control flow.

Unfortunately, many DDP users pass this flag in as `True` even when they do not need it, sometimes as a precaution to mitigate possible errors that may be raised (such as the error we raise when not all outputs are used). While this is a larger issue to be fixed in DDP, it would also be useful to warn once if we did not detect unused parameters.

The downside of this is that in the case of flow control models where the first iteration doesn't have unused params but the rest do, this would be a false warning. However, I think the warning's value exceeds this downside.
ghstack-source-id: 119707101

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D25411118

fbshipit-source-id: 9f4a18ad8f45e364eae79b575cb1a9eaea45a86c
2021-01-12 02:55:06 -08:00
8c5b0247a5 Fix PyTorch NEON compilation with gcc-7 (#50389)
Summary:
Apply sebpop's patch to correctly inform the optimizing compiler about the side effects of the missing NEON restrictions.
Allow vec256_float_neon to be used even when compiled by gcc-7.
Fixes https://github.com/pytorch/pytorch/issues/47098

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50389

Reviewed By: walterddr

Differential Revision: D25872875

Pulled By: malfet

fbshipit-source-id: 1fc5dfe68fbdbbb9bfa79ce4be2666257877e85f
2021-01-11 21:51:35 -08:00
c3b4b20627 [PyTorch] List::operator[] can return const ref for Tensor & string (#50083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50083

This should supercede D21966183 (a371652bc8)
(https://github.com/pytorch/pytorch/pull/39763) and D22830381 (b44a10c179) as the way to get fast
access to the contents of a `torch::List`.
ghstack-source-id: 119675495

Reviewed By: smessmer

Differential Revision: D25776232

fbshipit-source-id: 81b4d649105ac9e08fc2c6563806f883809872f4
2021-01-11 20:27:03 -08:00
4fed585dfa [MacOS] Add unit tests for Metal ops (#50312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50312

Integrate the operator tests to the MacOS playground app, so that we can run them on Sandcastle
ghstack-source-id: 119693035

Test Plan:
- `buck test pp-macos`
- Sandcastle tests

Reviewed By: AshkanAliabadi

Differential Revision: D25778981

fbshipit-source-id: 8b5770dfddba0ca19f662894757b2dff66df87e6
2021-01-11 20:15:17 -08:00
bee6b0be58 Fix warning when running scripts/build_ios.sh (#49457)
Summary:
* Fixes `cmake implicitly converting 'string' to 'STRING' type`
* Fixes `clang: warning: argument unused during compilation: '-mfpu=neon-fp16' [-Wunused-command-line-argument]`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49457

Reviewed By: zhangguanheng66

Differential Revision: D25871014

Pulled By: malfet

fbshipit-source-id: fa0c181ae7a1b8668e47f5ac6abd27a1c735ffce
2021-01-11 19:31:32 -08:00
72c1d9df75 Minor Fix: Double ";" typo in transformerlayer.h (#50300)
Summary:
Fix double ";" typo in transformerlayer.h

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50300

Reviewed By: zhangguanheng66

Differential Revision: D25857236

Pulled By: glaringlee

fbshipit-source-id: b9b21cfb3ddbff493f6d1c616abe21c5cfb9bce0
2021-01-11 19:25:22 -08:00
09f4844c1f Pytorch Distributed RPC Reinforcement Learning Benchmark (Throughput and Latency) (#46901)
Summary:
A Pytorch Distributed RPC benchmark measuring Agent and Observer Throughput and Latency for Reinforcement Learning

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46901

Reviewed By: mrshenli

Differential Revision: D25869514

Pulled By: osandoval-fb

fbshipit-source-id: c3b36b21541d227aafd506eaa8f4e5f10da77c78
2021-01-11 19:02:36 -08:00
2193544024 [GPU] Clean up the operator tests (#50311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50311

Code clean up
ghstack-source-id: 119693032

Test Plan: Sandcastle

Reviewed By: husthyc

Differential Revision: D25823635

fbshipit-source-id: 5205ebd8a5331c0d1825face034cca10e8b3b535
2021-01-11 18:39:46 -08:00
a72c6fd6e0 [GPU] Fix the broken strides value for 2d transpose (#50310)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50310

Swapping the stride values is OK as long as the output tensor's storage stays non-contiguous. However, when we copy the result back to CPU, we expect to see a contiguous tensor.

```
>>> x = torch.rand(2,3)
>>> x.stride()
(3, 1)
>>> y = x.t()
>>> y.stride()
(1, 3)
>>> z = y.contiguous()
>>> z.stride()
(2, 1)
```
ghstack-source-id: 119692581

Test Plan: Sandcastle CI

Reviewed By: AshkanAliabadi

Differential Revision: D25823665

fbshipit-source-id: 61667c03d1d4dd8692b76444676cc393f808cec8
2021-01-11 18:05:31 -08:00
5f8e1a1da9 add type annotations to torch.nn.modules.module (#49045)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49044

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49045

Reviewed By: malfet

Differential Revision: D25767092

Pulled By: walterddr

fbshipit-source-id: a81ba96f3495943af7bb9ee3e5fc4c94c690c405
2021-01-11 17:01:47 -08:00
f39f258dfd Ensure DDP + Pipe works with find_unused_parameters. (#49908)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49908

As described in https://github.com/pytorch/pytorch/issues/49891, DDP +
Pipe doesn't work with find_unused_parameters.

This PR adds a simple fix to enable this functionality. This only currently
works for Pipe within a single host and needs to be re-worked once we support
cross host Pipe.
ghstack-source-id: 119573413

Test Plan:
1) unit tests added.
2) waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D25719922

fbshipit-source-id: 948bcc758d96f6b3c591182f1ec631830db1b15c
2021-01-11 16:52:37 -08:00
b001c4cc32 Stop using an unnecessary scalar_to_tensor(..., device) call. (#50114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50114

In this case, the function only dispatches on cpu anyway.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25790155

Pulled By: gchanan

fbshipit-source-id: 799dc9a3a38328a531ced9e85ad2b4655533e86a
2021-01-11 16:37:04 -08:00
ba83aea5ee [GPU] Calculate strides for metal tensors (#50309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50309

Previously, in order to unblock the dogfooding, we did some hacks to calculate the strides for the output tensor. Now it's time to fix that.
ghstack-source-id: 119673688

Test Plan:
1. Sandcastle CI
2. Person segmentation results

Reviewed By: AshkanAliabadi

Differential Revision: D25821766

fbshipit-source-id: 8c067f55a232b7f102a64b9035ef54c72ebab4d4
2021-01-11 16:26:17 -08:00
9a3305fdd5 Automated submodule update: tensorpipe (#50369)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: bc5ac93c56

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50369

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: mrshenli

Differential Revision: D25867976

Pulled By: lw

fbshipit-source-id: 5274aa424e3215b200dcb2c02f342270241dd77d
2021-01-11 16:21:02 -08:00
bb97503a26 [fix] Indexing.cu: Move call to C10_CUDA_KERNEL_LAUNCH_CHECK to make it reachable (#49283)
Summary:
Fixes Compiler Warning:
```
aten/src/ATen/native/cuda/Indexing.cu(233): warning: loop is not reachable

aten/src/ATen/native/cuda/Indexing.cu(233): warning: loop is not reachable

aten/src/ATen/native/cuda/Indexing.cu(233): warning: loop is not reachable
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49283

Reviewed By: zhangguanheng66

Differential Revision: D25874613

Pulled By: ngimel

fbshipit-source-id: 6e384e89533c1d80f241b7b98fda239c357d1a2c
2021-01-11 15:33:08 -08:00
d76176cc1f Raise warning during validation when arg_constraints not defined (#50302)
Summary:
After we merged https://github.com/pytorch/pytorch/pull/48743, we noticed that some existing code that subclasses `torch.Distribution` started throwing `NotImplementedError` since the constraints required for validation checks were not implemented.

```sh
File "torch/distributions/distribution.py", line 40, in __init__
  for param, constraint in self.arg_constraints.items():
File "torch/distributions/distribution.py", line 92, in arg_constraints
  raise NotImplementedError
```

This PR throws a UserWarning for such cases instead and gives a better warning message.
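
A minimal sketch of the subclass pattern that satisfies the check, declaring `arg_constraints` explicitly (the distribution itself is a made-up example):

```python
import torch
from torch.distributions import Distribution, constraints

class MyDist(Distribution):
    # Declaring arg_constraints (even an empty dict) lets argument
    # validation run instead of hitting NotImplementedError / the warning.
    arg_constraints = {"rate": constraints.positive}

    def __init__(self, rate: torch.Tensor, validate_args=None):
        self.rate = rate
        super().__init__(batch_shape=rate.shape, validate_args=validate_args)

d = MyDist(torch.tensor([1.0, 2.0]), validate_args=True)
```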

cc. Balandat

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50302

Reviewed By: Balandat, xuzhao9

Differential Revision: D25857315

Pulled By: neerajprad

fbshipit-source-id: 0ff9f81aad97a0a184735b1fe3a5d42025c8bcdf
2021-01-11 15:26:53 -08:00
e160362837 Add range assert in autograd engine queue lookup (#50372)
Summary:
Follow up to  https://github.com/pytorch/pytorch/issues/49652

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50372

Reviewed By: zhangguanheng66

Differential Revision: D25872203

Pulled By: albanD

fbshipit-source-id: 8d6f30f17fba856c5c34c08372767349a250983d
2021-01-11 15:16:35 -08:00
7efc212f1f Add link to tutorial in Timer doc (#50374)
Summary:
Because I have a hard time finding this tutorial every time I need it. So I'm sure other people have the same issue :D

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50374

Reviewed By: zhangguanheng66

Differential Revision: D25872173

Pulled By: albanD

fbshipit-source-id: f34f719606e58487baf03c73dcbd255017601a09
2021-01-11 15:06:00 -08:00
fd0927035e .circleci: Remove CUDA 9.2 binary build jobs (#50388)
Summary:
Now that we support CUDA 11 we can remove support for CUDA 9.2

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50388

Reviewed By: zhangguanheng66

Differential Revision: D25872955

Pulled By: seemethere

fbshipit-source-id: 1c10bcc8f4abbc1af1b3180b4cf4a9ea9c7104f9
2021-01-11 14:16:58 -08:00
a48640af92 [JIT] Update clang-format hashes (#50399)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50399

**Summary**
This commit updates the expected hashes of the `clang-format` binaries
downloaded from S3. These binaries themselves have been updated due to
having been updated inside fbcode.

**Test Plan**
Uploaded new binaries to S3, deleted `.clang-format-bin` and ran
`clang_format_all.py`.

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D25875184

Pulled By: SplitInfinity

fbshipit-source-id: da483735de1b5f1dab7b070f91848ec5741f00b1
2021-01-11 14:13:45 -08:00
4d3c12d37c [JIT] Print better error when class attribute IValue conversion fails (#50255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50255

**Summary**
TorchScript classes are copied attribute-by-attribute from a py::object into
a `jit::Object` in `toIValue`, which is called when copying objects from
Python into TorchScript. However, if an attribute of the class cannot be
converted, the error thrown is a standard pybind error that is hard to
act on.

This commit adds code to `toIValue` to convert each attribute to an
`IValue` inside a try-catch block, throwing a `cast_error` containing
the name of the attribute and the target type if the conversion fails.

**Test Plan**
This commit adds a unit test to `test_class_type.py`
based on the code in the issue that commit fixes.

**Fixes**
This commit fixes #46341.

Test Plan: Imported from OSS

Reviewed By: pbelevich, tugsbayasgalan

Differential Revision: D25854183

Pulled By: SplitInfinity

fbshipit-source-id: 69d6e49cce9144af4236b8639d8010a20b7030c0
2021-01-11 14:04:26 -08:00
080a097935 Add docstring for Proxy (#50145)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50145

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D25854281

Pulled By: ansley

fbshipit-source-id: d7af6fd6747728ef04e86fbcdeb87cb0508e1fd8
2021-01-11 13:47:55 -08:00
3d263d1928 Update op replacement tutorial (#50377)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50377

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D25870409

Pulled By: ansley

fbshipit-source-id: b873b89c2e62b57cd5d816f81361c8ff31be2948
2021-01-11 13:04:38 -08:00
ec51b67282 Fix elu backward operation for negative alpha (#49272)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47671

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49272

Test Plan:
```
x = torch.tensor([-2, -1, 0, 1, 2], dtype=torch.float32, requires_grad=True)
y = torch.nn.functional.elu_(x.clone(), alpha=-2)
grads = torch.ones_like(y)
y.backward(grads)
```

```
RuntimeError: In-place elu backward calculation is triggered with a negative slope which is not supported.
This is caused by calling in-place forward function with a negative slope, please call out-of-place
version instead.
```

Reviewed By: albanD

Differential Revision: D25569839

Pulled By: H-Huang

fbshipit-source-id: e3c6c0c2c810261566c10c0cc184fd81b280c650
2021-01-11 12:52:52 -08:00
559e2d8816 Implement optimization bisect (#49031)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49031

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D25691790

Pulled By: tugsbayasgalan

fbshipit-source-id: a9c4ff1142f8a234a4ef5b1045fae842c82c18bf
2021-01-11 12:25:28 -08:00
55ac7e53ae [quant][graphmode][fx] Support preserved_attributes in prepare_fx (#50306)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50306

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D25857747

fbshipit-source-id: fac132fb36ed9cf207aea40429b5bc3f7c72c35d
2021-01-11 12:10:02 -08:00
271240ae29 [JIT] Ensure offset is a multiple of 4 to fix "Philox" RNG in jitted kernels (#50169)
Summary:
Immediately-upstreamable part of https://github.com/pytorch/pytorch/pull/50148.

This PR fixes what I'm fairly sure is a subtle bug with custom `Philox` class usage in jitted kernels.  `Philox` [constructors in kernels](68a6e46379/torch/csrc/jit/codegen/cuda/codegen.cpp (L102)) take the cuda rng generator's current offset.  The Philox constructor then carries out [`offset/4`](74c055b240/torch/csrc/jit/codegen/cuda/runtime/random_numbers.cu (L13)) (a uint64_t division) to compute its internal offset in its virtual Philox bitstream of 128-bit chunks.  In other words, it assumes the incoming offset is a multiple of 4.  But (in current code) that's not guaranteed.  For example, the increments used by [these eager kernels](74c055b240/aten/src/ATen/native/cuda/Distributions.cu (L171-L216)) could easily make offset not divisible by 4.

I figured the easiest fix was to round all incoming increments up to the nearest multiple of 4 in CUDAGeneratorImpl itself.

Another option would be to round the current offset up to the next multiple of 4 at the jit point of use.  But that would be a jit-specific offset jump, so jit rng kernels wouldn't have a prayer of being bitwise accurate with eager rng kernels that used non-multiple-of-4 offsets.  Restricting the offset to multiples of 4 for everyone at least gives jit rng the chance to match eager rng.  (Of course, there are still many other ways the numerics could diverge, like if a jit kernel launches a different number of threads than an eager kernel, or assigns threads to data elements differently.)
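
The rounding itself is the standard round-up-to-a-multiple computation; a sketch of the arithmetic:

```python
def round_up_to_multiple_of_4(increment: int) -> int:
    # Same as ((increment + 3) // 4) * 4; the mask form works because 4 is a power of two.
    return (increment + 3) & ~3

assert [round_up_to_multiple_of_4(n) for n in (1, 2, 3, 4, 5, 8)] == [4, 4, 4, 4, 8, 8]
```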

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50169

Reviewed By: mruberry

Differential Revision: D25857934

Pulled By: ngimel

fbshipit-source-id: 43a75e2d0c8565651b0f12a5694c744fd86ece99
2021-01-11 11:53:48 -08:00
d390e3d8b9 [FX] Make graph target printouts more user-friendly (#50296)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50296

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D25855288

Pulled By: jamesr66a

fbshipit-source-id: dd725980fc492526861c2ec234050fbdb814caa8
2021-01-11 11:45:20 -08:00
a7e92f120c [FX} Implement wrap() by patching module globals during symtrace (#50182)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50182

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D25819730

Pulled By: jamesr66a

fbshipit-source-id: 274f4799ad589887ecf3b94f5c24ecbe1bc14b1b
2021-01-11 11:01:15 -08:00
f10e7aad06 [quant][graphmode][fx] Scope support for call_method in QuantizationTracer (#50173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50173

Previously we did not set the qconfig for call_method nodes correctly, since doing so requires knowing
the scope (the module path of the module whose forward graph contains the node) of the node. This
PR modifies the QuantizationTracer to record the scope information and build a map from call_method
Node to module path, which will be used when we construct qconfig_map.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_for_call_method

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D25818132

fbshipit-source-id: ee9c5830f324d24d7cf67e5cd2bf1f6e0e46add8
2021-01-11 10:43:58 -08:00
6eb8e83c0b [aten] embedding_bag_byte_rowwise_offsets_out (#49561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49561

Out variant for embedding_bag_byte_rowwise_offsets

Test Plan:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adindexer/merge/traced_merge_dper_fixes.pt --pt_inputs=/data/users/ansha/tmp/adindexer/merge/container_precomputation_bs1.pt --iters=30000 --warmup_iters=10000 --num_threads=1 --pred_net=/data/users/ansha/tmp/adindexer/precomputation_merge_net.pb --c2_inputs=/data/users/ansha/tmp/adindexer/merge/c2_inputs_precomputation_bs1.pb --c2_sigrid_transforms_opt=1 --c2_use_memonger=1 --c2_apply_nomnigraph_passes --c2_weights=/data/users/ansha/tmp/adindexer/merge/c2_weights_precomputation.pb --pt_enable_static_runtime --pt_cleanup_activations=true --pt_enable_out_variant=true --compare_results --do_profile
```

Check embedding_bag_byte_rowwise_offsets_out is called in perf

Before: 0.081438
After: 0.0783725

Reviewed By: supriyar, hlu1

Differential Revision: D25620718

fbshipit-source-id: 83d5d0dd2e1f60c46e6727f73d5d8b52661b6767
2021-01-11 10:21:05 -08:00
0f412aa293 Move scalar_to_tensor_default_dtype out of ScalarOps.h because it's only useful for torch.where. (#50111)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50111

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25789638

Pulled By: gchanan

fbshipit-source-id: 4254e11e08606b64e393433ef2c169889ff2ac07
2021-01-11 09:36:29 -08:00
186fe48d6e Format RPC files with clang-format (#50367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50367

This had already been done by mrshenli on Friday (#50236, D25847892 (f9f758e349)) but over the weekend Facebook's internal clang-format version got updated and this changed the format, hence we need to re-apply it. Note that this update also affected the JIT files, which are the other module enrolled in clang-format (see 8530c65e25, D25849205 (8530c65e25)).
ghstack-source-id: 119656866

Test Plan: Shouldn't include functional changes. In any case, there's CI.

Reviewed By: mrshenli

Differential Revision: D25867720

fbshipit-source-id: 3723abc6c35831d7a8ac31f74baf24c963c98b9d
2021-01-11 08:59:19 -08:00
acaf091302 Vulkan convolution touchups. (#50329)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50329

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D25869147

Pulled By: AshkanAliabadi

fbshipit-source-id: b8f393330b68912506fdaefaf62a455dc192e36c
2021-01-11 08:51:57 -08:00
e29082b2a6 Run mypy over test/test_utils.py (#50278)
Summary:
_resubmission of gh-49654, which was reverted due to a cross-merge conflict_

This caught one incorrect annotation in `cpp_extension.load`.

xref gh-16574.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50278

Reviewed By: walterddr

Differential Revision: D25865278

Pulled By: ezyang

fbshipit-source-id: 25489191628af5cf9468136db36f5a0f72d9d54d
2021-01-11 08:16:23 -08:00
eb87686511 svd_backward: more memory and computationally efficient. (#50109)
Summary:
As per title.

CC IvanYashchuk (unfortunately I cannot add you as a reviewer for some reason).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50109

Reviewed By: gchanan

Differential Revision: D25828536

Pulled By: albanD

fbshipit-source-id: 3791c3dd4f5c2a2917eac62e6527ecd1edcb400d
2021-01-11 05:28:43 -08:00
9d8bd216f9 Use Unicode friendly API in fused kernel related code (#49781)
Summary:
See https://github.com/pytorch/pytorch/issues/47422

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49781

Reviewed By: gchanan

Differential Revision: D25847993

Pulled By: ezyang

fbshipit-source-id: e683a8d5841885857ea3037ac801432a1a3eda68
2021-01-10 20:03:00 -08:00
6a3fc0c21c Treat has_torch_function and object_has_torch_function as static False when scripting (#48966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48966

This PR lets us skip the `if not torch.jit.is_scripting():` guards on `functional` and `nn.functional` by directly registering `has_torch_function` and `object_has_torch_function` to the JIT as statically False.

**Benchmarks**

The benchmark script is kind of long. The reason is that it's testing all four PRs in the stack, plus threading and subprocessing so that the benchmark can utilize multiple cores while still collecting good numbers. Both wall times and instruction counts were collected. This stack changes dozens of operators / functions, but very mechanically such that there are only a handful of codepath changes. Each row is a slightly different code path (e.g. testing in Python, testing in the arg parser, different input types, etc.)

<details>

<summary> Test script </summary>

```
import argparse
import multiprocessing
import multiprocessing.dummy
import os
import pickle
import queue
import random
import sys
import subprocess
import tempfile
import time

import torch
from torch.utils.benchmark import Timer, Compare, Measurement

NUM_CORES = multiprocessing.cpu_count()
ENVS = {
    "ref": "HEAD (current)",
    "torch_fn_overhead_stack_0": "#48963",
    "torch_fn_overhead_stack_1": "#48964",
    "torch_fn_overhead_stack_2": "#48965",
    "torch_fn_overhead_stack_3": "#48966",
}

CALLGRIND_ENVS = tuple(ENVS.keys())

MIN_RUN_TIME = 3
REPLICATES = {
    "longer": 1_000,
    "long": 300,
    "short": 50,
}

CALLGRIND_NUMBER = {
    "overnight": 500_000,
    "long": 250_000,
    "short": 10_000,
}

CALLGRIND_TIMEOUT = {
    "overnight": 800,
    "long": 400,
    "short": 100,
}

SETUP = """
    x = torch.ones((1, 1))
    y = torch.ones((1, 1))
    w_tensor = torch.ones((1, 1), requires_grad=True)
    linear = torch.nn.Linear(1, 1, bias=False)
    linear_w = linear.weight
"""

TASKS = {
    "C++: unary                 `.t()`": "w_tensor.t()",
    "C++: unary  (Parameter)    `.t()`": "linear_w.t()",
    "C++: binary (Parameter)    `mul` ": "x + linear_w",
    "tensor.py: _wrap_type_error_to_not_implemented `__floordiv__`": "x // y",
    "tensor.py: method          `__hash__`": "hash(x)",
    "Python scalar              `__rsub__`": "1 - x",
    "functional.py: (unary)     `unique`": "torch.functional.unique(x)",
    "functional.py: (args)      `atleast_1d`": "torch.functional.atleast_1d((x, y))",
    "nn/functional.py: (unary)  `relu`": "torch.nn.functional.relu(x)",
    "nn/functional.py: (args)   `linear`": "torch.nn.functional.linear(x, w_tensor)",
    "nn/functional.py: (args)   `linear (Parameter)`": "torch.nn.functional.linear(x, linear_w)",
    "Linear(..., bias=False)": "linear(x)",
}

def _worker_main(argv, fn):
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_file", type=str)
    parser.add_argument("--single_task", type=int, default=None)
    parser.add_argument("--length", type=str)
    args = parser.parse_args(argv)
    single_task = args.single_task

    conda_prefix = os.getenv("CONDA_PREFIX")
    assert torch.__file__.startswith(conda_prefix)

    env = os.path.split(conda_prefix)[1]
    assert env in ENVS

    results = []
    for i, (k, stmt) in enumerate(TASKS.items()):
        if single_task is not None and single_task != i:
            continue

        timer = Timer(
            stmt=stmt,
            setup=SETUP,
            sub_label=k,
            description=ENVS[env],
        )
        results.append(fn(timer, args.length))

    with open(args.output_file, "wb") as f:
        pickle.dump(results, f)

def worker_main(argv):
    _worker_main(
        argv,
        lambda timer, _: timer.blocked_autorange(min_run_time=MIN_RUN_TIME)
    )

def callgrind_worker_main(argv):
    _worker_main(
        argv,
        lambda timer, length: timer.collect_callgrind(number=CALLGRIND_NUMBER[length], collect_baseline=False))

def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--long", action="store_true")
    parser.add_argument("--longer", action="store_true")
    args = parser.parse_args(argv)

    if args.longer:
        length = "longer"
    elif args.long:
        length = "long"
    else:
        length = "short"
    replicates = REPLICATES[length]

    num_workers = int(NUM_CORES // 2)
    tasks = list(ENVS.keys()) * replicates
    random.shuffle(tasks)
    task_queue = queue.Queue()
    for _ in range(replicates):
        envs = list(ENVS.keys())
        random.shuffle(envs)
        for e in envs:
            task_queue.put((e, None))

    callgrind_task_queue = queue.Queue()
    for e in CALLGRIND_ENVS:
        for i, _ in enumerate(TASKS):
            callgrind_task_queue.put((e, i))

    results = []
    callgrind_results = []

    def map_fn(worker_id):
        # Adjacent cores often share cache and maxing out a machine can distort
        # timings so we space them out.
        callgrind_cores = f"{worker_id * 2}-{worker_id * 2 + 1}"
        time_cores = str(worker_id * 2)
        _, output_file = tempfile.mkstemp(suffix=".pkl")
        try:
            loop_tasks = (
                # Callgrind is long running, and then the workers can help with
                # timing after they finish collecting counts.
                (callgrind_task_queue, callgrind_results, "callgrind_worker", callgrind_cores, CALLGRIND_TIMEOUT[length]),
                (task_queue, results, "worker", time_cores, None))

            for queue_i, results_i, mode_i, cores, timeout in loop_tasks:
                while True:
                    try:
                        env, task_i = queue_i.get_nowait()
                    except queue.Empty:
                        break

                    remaining_attempts = 3
                    while True:
                        try:
                            subprocess.run(
                                " ".join([
                                    "source", "activate", env, "&&",
                                    "taskset", "--cpu-list", cores,
                                    "python", os.path.abspath(__file__),
                                    "--mode", mode_i,
                                    "--length", length,
                                    "--output_file", output_file
                                ] + ([] if task_i is None else ["--single_task", str(task_i)])),
                                shell=True,
                                check=True,
                                timeout=timeout,
                            )
                            break

                        except subprocess.TimeoutExpired:
                            # Sometimes Valgrind will hang if there are too many
                            # concurrent runs.
                            remaining_attempts -= 1
                            if not remaining_attempts:
                                print("Too many failed attempts.")
                                raise
                            print(f"Timeout after {timeout} sec. Retrying.")

                    # We don't need a lock, as the GIL is enough.
                    with open(output_file, "rb") as f:
                        results_i.extend(pickle.load(f))

        finally:
            os.remove(output_file)

    with multiprocessing.dummy.Pool(num_workers) as pool:
        st, st_estimate, eta, n_total = time.time(), None, "", len(tasks) * len(TASKS)
        map_job = pool.map_async(map_fn, range(num_workers))
        while not map_job.ready():
            n_complete = len(results)
            if n_complete and len(callgrind_results):
                if st_estimate is None:
                    st_estimate = time.time()
                else:
                    sec_per_element = (time.time() - st_estimate) / n_complete
                    n_remaining = n_total - n_complete
                    eta = f"ETA: {n_remaining * sec_per_element:.0f} sec"

            print(
                f"\r{n_complete} / {n_total}  "
                f"({len(callgrind_results)} / {len(CALLGRIND_ENVS) * len(TASKS)})   "
                f"{eta}".ljust(40), end="")
            sys.stdout.flush()
            time.sleep(2)
    total_time = int(time.time() - st)
    print(f"\nTotal time: {int(total_time // 60)} min, {total_time % 60} sec")

    desc_to_ind = {k: i for i, k in enumerate(ENVS.values())}
    results.sort(key=lambda r: desc_to_ind[r.description])

    # TODO: Compare should be richer and more modular.
    compare = Compare(results)
    compare.trim_significant_figures()
    compare.colorize(rowwise=True)

    # Manually add master vs. overall relative delta t.
    merged_results = {
        (r.description, r.sub_label): r
        for r in Measurement.merge(results)
    }

    cmp_lines = str(compare).splitlines(False)
    print(cmp_lines[0][:-1] + "-" * 15 + "]")
    print(f"{cmp_lines[1]} |{'':>10}\u0394t")
    print(cmp_lines[2] + "-" * 15)
    for l, t in zip(cmp_lines[3:3 + len(TASKS)], TASKS.keys()):
        assert l.strip().startswith(t)
        t0 = merged_results[(ENVS["ref"], t)].median
        t1 = merged_results[(ENVS["torch_fn_overhead_stack_3"], t)].median
        print(f"{l} |{'':>5}{(t1 / t0 - 1) * 100:>6.1f}%")
    print("\n".join(cmp_lines[3 + len(TASKS):]))

    counts_dict = {
        (r.task_spec.description, r.task_spec.sub_label): r.counts(denoise=True)
        for r in callgrind_results
    }

    def rel_diff(x, x0):
        return f"{(x / x0 - 1) * 100:>6.1f}%"

    task_pad = max(len(t) for t in TASKS)
    print(f"\n\nInstruction % change (relative to `{CALLGRIND_ENVS[0]}`)")
    print(" " * (task_pad + 8)  + (" " * 7).join([ENVS[env] for env in CALLGRIND_ENVS[1:]]))
    for t in TASKS:
        values = [counts_dict[(ENVS[env], t)] for env in CALLGRIND_ENVS]

        print(t.ljust(task_pad + 3) + "  ".join([
            rel_diff(v, values[0]).rjust(len(ENVS[env]) + 5)
            for v, env in zip(values[1:], CALLGRIND_ENVS[1:])]))

        print("\033[4m" + "    Instructions per invocation".ljust(task_pad + 3) + "  ".join([
            f"{v // CALLGRIND_NUMBER[length]:.0f}".rjust(len(ENVS[env]) + 5)
            for v, env in zip(values[1:], CALLGRIND_ENVS[1:])]) + "\033[0m")
        print()

    import pdb
    pdb.set_trace()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", type=str, choices=("main", "worker", "callgrind_worker"), default="main")
    args, remaining = parser.parse_known_args()

    if args.mode == "main":
        main(remaining)

    elif args.mode == "callgrind_worker":
        callgrind_worker_main(remaining)

    else:
        worker_main(remaining)

```

</details>

**Wall time**
<img width="1178" alt="Screen Shot 2020-12-12 at 12 28 13 PM" src="https://user-images.githubusercontent.com/13089297/101994419-284f6a00-3c77-11eb-8dc8-4f69a890302e.png">

<details>

<summary> Longer run (`python test.py --long`) is basically identical. </summary>

<img width="1184" alt="Screen Shot 2020-12-12 at 5 02 47 PM" src="https://user-images.githubusercontent.com/13089297/102000425-2350e180-3c9c-11eb-999e-a95b37e9ef54.png">

</details>

**Callgrind**
<img width="936" alt="Screen Shot 2020-12-12 at 12 28 54 PM" src="https://user-images.githubusercontent.com/13089297/101994421-2e454b00-3c77-11eb-9cd3-8cde550f536e.png">

Test Plan: existing unit tests.

Reviewed By: ezyang

Differential Revision: D25590731

Pulled By: robieta

fbshipit-source-id: fe05305ff22b0e34ced44b60f2e9f07907a099dd
2021-01-10 19:23:38 -08:00
d31a760be4 move has_torch_function to C++, and make a special case object_has_torch_function (#48965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48965

This PR pulls `__torch_function__` checking entirely into C++, and adds a special `object_has_torch_function` method for ops which only have one arg as this lets us skip tuple construction and unpacking. We can now also do away with the Python side fast bailout for `Tensor` (e.g. `if any(type(t) is not Tensor for t in tensors) and has_torch_function(tensors)`) because they're actually slower than checking with the Python C API.

Test Plan: Existing unit tests. Benchmarks are in #48966

Reviewed By: ezyang

Differential Revision: D25590732

Pulled By: robieta

fbshipit-source-id: 6bd74788f06cdd673f3a2db898143d18c577eb42
2021-01-10 19:23:35 -08:00
632a4401a6 clean up imports for tensor.py (#48964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48964

Stop importing overrides within methods now that the circular dependency is gone, and also organize the imports while I'm at it because they're a jumbled mess.

Test Plan: Existing unit tests. Benchmarks are in #48966

Reviewed By: ngimel

Differential Revision: D25590730

Pulled By: robieta

fbshipit-source-id: 4fa929ce8ff548500f3e55d0475f3f22c1fccc04
2021-01-10 19:23:32 -08:00
839c2f235f treat Parameter the same way as Tensor (#48963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48963

This PR makes the binding code treat `Parameter` the same way as `Tensor`, unlike all other `Tensor` subclasses. This does change the semantics of `THPVariable_CheckExact`, but it isn't used much and it seemed to make sense for the half dozen or so places that it is used.

Test Plan: Existing unit tests. Benchmarks are in #48966

Reviewed By: ezyang

Differential Revision: D25590733

Pulled By: robieta

fbshipit-source-id: 060ecaded27b26e4b756898eabb9a94966fc9840
2021-01-10 19:18:31 -08:00
fd92bcfe39 Use FileStore in TorchScript for store registry (#50248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50248

Make the FileStore path also use TorchScript when it's needed.

Test Plan: wait for sandcastle.

Reviewed By: zzzwen

Differential Revision: D25842651

fbshipit-source-id: dec941e895a33ffde42c877afcaf64b5aecbe098
2021-01-10 18:50:56 -08:00
92fcb59feb Automated submodule update: tensorpipe (#50267)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: 03e0711889

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50267

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: gchanan

Differential Revision: D25848309

Pulled By: mrshenli

fbshipit-source-id: c77adbad73c5b3b4b7d4e79953a797621dc11e5c
2021-01-10 13:36:57 -08:00
26cc630789 Allow arbitrary docstrings to be inside torchscript interface methods (#50271)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50271

Test Plan:
new python test case

Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D25853916

fbshipit-source-id: adc31e11331a97d08b5bc3f535f185da268554d1
2021-01-10 10:56:30 -08:00
4774c6800b Added linalg.inv (#48261)
Summary:
This PR adds `torch.linalg.inv` for NumPy compatibility.

`linalg_inv_out` uses in-place operations on provided `result` tensor.

I modified `apply_inverse` to accept a tensor of Int instead of a std::vector; that way we can write a function similar to `linalg_inv_out` but without the error checks and device memory synchronization.

I fixed `lda` (leading dimension parameter which is max(1, n)) in many places to handle 0x0 matrices correctly.
Zero batch dimensions are also working and tested.

Ref https://github.com/pytorch/pytorch/issues/42666
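
For illustration, a minimal usage sketch of the new function based on the description above (including the zero-batch-dimension case); not part of the diff itself:

```
import torch

A = torch.randn(3, 3)
A_inv = torch.linalg.inv(A)
assert torch.allclose(A @ A_inv, torch.eye(3), atol=1e-5)

# Zero batch dimensions work too, per the note above:
B = torch.randn(0, 3, 3)
print(torch.linalg.inv(B).shape)  # torch.Size([0, 3, 3])
```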

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48261

Reviewed By: gchanan

Differential Revision: D25849590

Pulled By: mruberry

fbshipit-source-id: cfee6f1daf7daccbe4612ec68f94db328f327651
2021-01-10 04:00:51 -08:00
375c30a717 Avg pool 0 dim acceptance. (#50008)
Summary:
Reopening https://github.com/pytorch/pytorch/pull/47426, since it previously failed the XLA tests.
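
A minimal sketch of what this presumably enables, assuming (as the title and the linked PR suggest) that average pooling now accepts inputs with a zero-sized batch dimension:

```
import torch
import torch.nn.functional as F

x = torch.empty(0, 3, 8, 8)            # zero-sized batch dimension
out = F.avg_pool2d(x, kernel_size=2)   # should no longer error out
print(out.shape)                       # torch.Size([0, 3, 4, 4])
```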

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50008

Reviewed By: mruberry

Differential Revision: D25857687

Pulled By: ngimel

fbshipit-source-id: 8bd47a17b417b20089cf003173d8c0793be58c72
2021-01-09 21:46:05 -08:00
8530c65e25 [codemod][fbcode/caffe2] Apply clang-format update fixes
Test Plan: Sandcastle and visual inspection.

Reviewed By: igorsugak

Differential Revision: D25849205

fbshipit-source-id: ef664c1ad4b3ee92d5c020a5511b4ef9837a09a0
2021-01-09 14:37:36 -08:00
d4c1684cf5 reuse constant from jit (#49916)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49916

Test Plan:
1. Build pytorch locally. `MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ USE_CUDA=0 DEBUG=1 MAX_JOBS=16 python setup.py develop`
2. Run `python save_lite.py`
```
import torch

# ~/Documents/pytorch/data/dog.jpg
model = torch.hub.load('pytorch/vision:v0.6.0', 'shufflenet_v2_x1_0', pretrained=True)
model.eval()

# sample execution (requires torchvision)
from PIL import Image
from torchvision import transforms
import pathlib
import tempfile
import torch.utils.mobile_optimizer

input_image = Image.open('~/Documents/pytorch/data/dog.jpg')
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model

# move the input and model to GPU for speed if available
if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    model.to('cuda')

with torch.no_grad():
    output = model(input_batch)
# Tensor of shape 1000, with confidence scores over Imagenet's 1000 classes
print(output[0])
# The output has unnormalized scores. To get probabilities, you can run a softmax on it.
print(torch.nn.functional.softmax(output[0], dim=0))

traced = torch.jit.trace(model, input_batch)
sum(p.numel() * p.element_size() for p in traced.parameters())
tf = pathlib.Path('~/Documents/pytorch/data/data/example_debug_map_with_tensorkey.ptl')

torch.jit.save(traced, tf.name)
print(pathlib.Path(tf.name).stat().st_size)
traced._save_for_lite_interpreter(tf.name)
print(pathlib.Path(tf.name).stat().st_size)
print(tf.name)

```

3. Run `python test_lite.py`
```
import torch
from torch.jit.mobile import _load_for_lite_interpreter
# sample execution (requires torchvision)
from PIL import Image
from torchvision import transforms

input_image = Image.open('~/Documents/pytorch/data/dog.jpg')
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model
reload_lite_model = _load_for_lite_interpreter('~/Documents/pytorch/experiment/example_debug_map_with_tensorkey.ptl')

with torch.no_grad():
    output_lite = reload_lite_model(input_batch)
# Tensor of shape 1000, with confidence scores over Imagenet's 1000 classes
print(output_lite[0])
# The output has unnormalized scores. To get probabilities, you can run a softmax on it.
print(torch.nn.functional.softmax(output_lite[0], dim=0))

```
4. Compare the result with pytorch in master and pytorch built locally with this change, and see the same output.
5. The model size was 16.1 MB and becomes 12.9 MB with this change.

Imported from OSS

Reviewed By: kimishpatel, iseeyuan

Differential Revision: D25731596

Pulled By: cccclai

fbshipit-source-id: 9731ec1e0c1d5dc76cfa374d2ad3d5bb10990cf0
2021-01-08 22:39:28 -08:00
ba1ce71cd1 Document single op replacement (#50116)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50116

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D25803457

Pulled By: ansley

fbshipit-source-id: de2f3c0bd037859117dde55ba677fb5da34ab639
2021-01-08 21:01:18 -08:00
ea087e2d92 JIT: guard DifferentiableGraph node (#49433)
Summary:
This adds guarding for DifferentiableGraph nodes so that execution does not depend on assumptions that may no longer hold for the current inputs. It also bails out for the CUDA fuser when inputs require gradients.

Fixes https://github.com/pytorch/pytorch/issues/49299

I still need to look into a handful of failing tests, but maybe it can be a discussion basis.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49433

Reviewed By: ngimel

Differential Revision: D25681374

Pulled By: Krovatkin

fbshipit-source-id: 8e7be53a335c845560436c0cceeb5e154c9cf296
2021-01-08 20:01:27 -08:00
36ddb00240 [fix] torch.cat: Don't resize out if it is already of the correct size. (#49937)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49878
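
A minimal sketch of the fixed behavior, assuming an `out` tensor that is already correctly sized:

```
import torch

a, b = torch.ones(2, 3), torch.zeros(2, 3)
out = torch.empty(4, 3)             # already the correct output size
torch.cat([a, b], dim=0, out=out)   # with this fix, `out` is reused as-is rather than resized
print(out.shape)                    # torch.Size([4, 3])
```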

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49937

Reviewed By: mruberry

Differential Revision: D25851564

Pulled By: ngimel

fbshipit-source-id: 9a78922642d5bace70d887a88fa9e92d88038120
2021-01-08 18:10:49 -08:00
c2d37cd990 Change CMake config to enable universal binary for Mac (#50243)
Summary:
This PR is a step towards enabling cross compilation from x86_64 to arm64.

The following has been added:
1. When cross compilation is detected, compile a local universal fatfile to use as protoc.
2. For the simple compile check in MiscCheck.cmake, make sure to compile the small snippet as a universal binary in order to run the check.

**Test plan:**

Kick off a minimal build on a mac intel machine with the macOS 11 SDK with this command:
```
CMAKE_OSX_ARCHITECTURES=arm64 USE_MKLDNN=OFF USE_QNNPACK=OFF USE_PYTORCH_QNNPACK=OFF BUILD_TEST=OFF USE_NNPACK=OFF python setup.py install
```
(If you run the above command before this change, or without macOS 11 SDK set up, it will fail.)

Then check the platform of the built binaries using this command:
```
lipo -info build/lib/libfmt.a
```
Output:
- Before this PR, running a regular build via `python setup.py install` (instead of using the flags listed above):
  ```
  Non-fat file: build/lib/libfmt.a is architecture: x86_64
  ```
- Using this PR:
  ```
  Non-fat file: build/lib/libfmt.a is architecture: arm64
  ```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50243

Reviewed By: malfet

Differential Revision: D25849955

Pulled By: janeyx99

fbshipit-source-id: e9853709a7279916f66aa4c4e054dfecced3adb1
2021-01-08 17:26:08 -08:00
49bb0a30e8 Support scripting classmethod called with object instances (#49967)
Summary:
Currently, classmethods are compiled the same way as methods: the first argument is self.
This adds a fake statement that assigns the first argument to the class.
This is kind of hacky, but that's all it takes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49967

Reviewed By: gchanan

Differential Revision: D25841378

Pulled By: ppwwyyxx

fbshipit-source-id: 0f3657b4c9d5d2181d658f9bade9bafc72de33d8
2021-01-08 16:54:46 -08:00
1c12cbea90 Optimize Vulkan command buffer submission rate. (#49112)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49112

Differential Revision: D25729889

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Pulled By: AshkanAliabadi

fbshipit-source-id: c4ab470fdcf3f83745971986f3a44a3dff69287f
2021-01-08 16:39:22 -08:00
aa18d17455 add type annotations to torch.nn.modules.fold (#49479)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49478

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49479

Reviewed By: mruberry

Differential Revision: D25723838

Pulled By: walterddr

fbshipit-source-id: 45c4cbd6f147b6dc4a5f5419c17578c49c201022
2021-01-08 13:52:14 -08:00
2c4b6ec457 Unused exception variables (#50181)
Summary:
These unused variables were identified by [pyflakes](https://pypi.org/project/pyflakes/). They can be safely removed to simplify the code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50181

Reviewed By: gchanan

Differential Revision: D25844270

fbshipit-source-id: 0e648ffe8c6db6daf56788a13ba89806923cbb76
2021-01-08 13:33:18 -08:00
8f31621f78 Fix MKL builds on Ubuntu (#50212)
Summary:
This fixes https://github.com/pytorch/pytorch/issues/50211

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50212

Reviewed By: janeyx99

Differential Revision: D25850876

Pulled By: walterddr

fbshipit-source-id: be138db3ae370c45f5fbf3af486cf8b32518df87
2021-01-08 13:16:30 -08:00
1bb7d8ff93 Revert D25717504: Clean up some type annotations in test/jit
Test Plan: revert-hammer

Differential Revision:
D25717504 (a4f30d48d8)

Original commit changeset: 9a83c44db02e

fbshipit-source-id: e6e3a83bed22701d8125f5a293dfcd5093c1a2cd
2021-01-08 12:14:48 -08:00
f9f758e349 Apply clang-format to rpc cpp files (#50236)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50236

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D25847892

Pulled By: mrshenli

fbshipit-source-id: b4af1221acfcaba8903c629869943abbf877e04e
2021-01-08 11:47:43 -08:00
0bb341daaa Dump state when hitting ambiguous_autogradother_kernel. (#50246)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50246

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D25843205

Pulled By: ailzhang

fbshipit-source-id: 66916ae477a4ae97e1695227fc6af78c4f328ea3
2021-01-08 11:31:54 -08:00
d78b638a31 Convert string => raw strings so char classes can be represented in Python regex (#50239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50239

Convert regex strings that contain character classes (e.g. \d, \s, \w, \b) into raw strings so the backslash sequences won't be interpreted as string escape characters.
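
A minimal example of the two spellings; the raw string makes the backslash explicit rather than relying on Python passing unrecognized escapes through:

```
import re

# Both match the same pattern, but the raw-string form is unambiguous.
assert re.findall("\\d+", "a1b22") == ["1", "22"]
assert re.findall(r"\d+", "a1b22") == ["1", "22"]
```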

References:
Python RegEx - https://www.w3schools.com/python/python_regex.asp
Python Escape Chars - https://www.w3schools.com/python/gloss_python_escape_characters.asp
Python Raw String - https://www.journaldev.com/23598/python-raw-string
Python RegEx Docs - https://docs.python.org/3/library/re.html
Python String Tester - https://www.w3schools.com/python/trypython.asp?filename=demo_string_escape
Python Regex Tester - https://regex101.com/

Test Plan: To find occurrences of regex strings with the above issue in VS Code, search using the regex \bre\.[a-z]+\(['"], and under 'files to include', use /data/users/your_username/fbsource/fbcode/caffe2.

Reviewed By: r-barnes

Differential Revision: D25813302

fbshipit-source-id: df9e23c0a84c49175eaef399ca6d091bfbeed936
2021-01-08 11:17:17 -08:00
5d45140d68 [numpy] torch.{all/any} : output dtype is always bool (#47878)
Summary:
BC-breaking note:

This PR changes the behavior of the any and all functions to always return a bool tensor. Previously these functions were only defined on bool and uint8 tensors, and when called on uint8 tensors they would also return a uint8 tensor. (When called on a bool tensor they would return a bool tensor.)

PR summary:

https://github.com/pytorch/pytorch/pull/44790#issuecomment-725596687

Fixes 2 and 3

Also Fixes https://github.com/pytorch/pytorch/issues/48352

Changes
* Output dtype is always `bool` (consistent with numpy). **BC-breaking** (previously the output matched the input dtype)
* Uses vectorized version for all dtypes on CPU
* Enables test for complex
* Update doc for `torch.all` and `torch.any`

TODO
* [x] Update docs
* [x] Benchmark
* [x] Raise issue on XLA
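
A minimal sketch of the BC-breaking change described above:

```
import torch

t = torch.tensor([0, 1, 2], dtype=torch.uint8)
print(torch.any(t))  # tensor(True)  -- output dtype is now always bool
print(torch.all(t))  # tensor(False) -- previously a uint8 input produced a uint8 output
```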

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47878

Reviewed By: albanD

Differential Revision: D25714324

Pulled By: mruberry

fbshipit-source-id: a87345f725297524242d69402dfe53060521ea5d
2021-01-08 11:05:39 -08:00
a4f30d48d8 Clean up some type annotations in test/jit (#50158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50158

Upgrades type annotations from Python2 to Python3

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25717504

fbshipit-source-id: 9a83c44db02ec79f353862255732873f6d7f885e
2021-01-08 10:56:55 -08:00
81778e2811 [onnx] Do not deref nullptr in scalar type analysis (#50237)
Summary:
Apply a little bit of defensive programming: `type->cast<TensorType>()` returns an optional pointer, so dereferencing it unconditionally can lead to a hard crash.

Fixes SIGSEGV reported in https://github.com/pytorch/pytorch/issues/49959

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50237

Reviewed By: walterddr

Differential Revision: D25839675

Pulled By: malfet

fbshipit-source-id: 403d6df5e2392dd6adc308b1de48057f2f9d77ab
2021-01-08 10:07:30 -08:00
b5ab0a7f78 Improve torch.linalg.qr (#50046)
Summary:
This is a follow-up to PR https://github.com/pytorch/pytorch/issues/47764 that fixes the remaining details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50046

Reviewed By: zou3519

Differential Revision: D25825557

Pulled By: mruberry

fbshipit-source-id: b8e335e02265e73484a99b0189e4cc042828e0a9
2021-01-08 09:52:31 -08:00
88bd69b488 Stop using c10::scalar_to_tensor in float_power. (#50105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50105

There should be no functional change here.

A couple of reasons here:
1) This function is generally an anti-pattern (https://github.com/pytorch/pytorch/issues/49758) and it is good to minimize its usage in the code base.
2) pow itself has a fair amount of smarts, like not broadcasting scalar/tensor combinations, and we should defer to it.
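
For context, a minimal sketch of `torch.float_power`'s user-visible behavior, which this diff does not change:

```
import torch

x = torch.tensor([2, 3])                        # integer input
print(torch.float_power(x, 2))                  # tensor([4., 9.], dtype=torch.float64)
print(torch.float_power(x, torch.tensor(0.5)))  # scalar/tensor handling is deferred to pow
```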

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25786172

Pulled By: gchanan

fbshipit-source-id: 89de03aa0b900ce011a62911224a5441f15e331a
2021-01-08 09:44:15 -08:00
55919a4758 add type annotations to torch.nn.quantized.modules.conv (#49702)
Summary:
closes gh-49700

No mypy issues were found in the first three entries deleted from `mypy.ini`:
```
[mypy-torch.nn.qat.modules.activations]
ignore_errors = True

[mypy-torch.nn.qat.modules.conv]
ignore_errors = True

[mypy-torch.nn.quantized.dynamic.modules.linear]
ignore_errors = True
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49702

Reviewed By: walterddr, zou3519

Differential Revision: D25767119

Pulled By: ezyang

fbshipit-source-id: cb83e53549a299538e1b154cf8b79e3280f7392a
2021-01-08 07:31:39 -08:00
54ce171f16 Fix persistent_workers + pin_memory (#48543)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48370 https://github.com/pytorch/pytorch/issues/47445

cc emcastillo who authored the original functionality.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48543

Reviewed By: bdhirsh

Differential Revision: D25277474

Pulled By: ejguan

fbshipit-source-id: 1967002124fb0fff57caca8982bc7df359a059a2
2021-01-08 07:04:10 -08:00
d00acebd14 Add tensor.view(dtype) (#47951)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42571

Note that this functionality is a subset of [`numpy.ndarray.view`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.view.html):
- this only supports viewing a tensor as a dtype with the same number of bytes
- this does not support viewing a tensor as a subclass of `torch.Tensor`
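
A minimal sketch, reflecting the same-itemsize constraint noted above:

```
import torch

x = torch.tensor([1.0, 2.0], dtype=torch.float32)
y = x.view(torch.int32)   # same bytes reinterpreted; both dtypes are 4 bytes wide
print(y.dtype, y.shape)   # torch.int32 torch.Size([2])

# x.view(torch.float64)   # would raise: element sizes differ (4 vs 8 bytes)
```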

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47951

Reviewed By: ngimel

Differential Revision: D25062301

Pulled By: mruberry

fbshipit-source-id: 9fefaaef77f15d5b863ccd12d836932983794475
2021-01-08 06:55:21 -08:00
5c5abd591d Implement torch.linalg.svd (#45562)
Summary:
This is related to https://github.com/pytorch/pytorch/issues/42666 .
I am opening this PR to have the opportunity to discuss things.
First, we need to consider the differences between `torch.svd` and `numpy.linalg.svd`:

1. `torch.svd` takes `some=True`, while `numpy.linalg.svd` takes `full_matrices=True`, which is effectively the opposite (and with the opposite default, too!)

2. `torch.svd` returns `(U, S, V)`, while `numpy.linalg.svd` returns `(U, S, VT)` (i.e., V transposed).

3. `torch.svd` always returns a 3-tuple; `numpy.linalg.svd` returns only `S` in case `compute_uv==False`

4. `numpy.linalg.svd` also takes an optional `hermitian=False` argument.

I think that the plan is to eventually deprecate `torch.svd` in favor of `torch.linalg.svd`, so this PR does the following:

1. Rename/adapt the old `svd` C++ functions into `linalg_svd`: in particular, now `linalg_svd` takes `full_matrices` and returns `VT`

2. Re-implement the old C++ interface on top of the new (by negating `full_matrices` and transposing `VT`).

3. The C++ version of `linalg_svd` *always* returns a 3-tuple (we can't do anything else). So, there is a python wrapper which manually calls `torch._C._linalg.linalg_svd` to tweak the return value in case `compute_uv==False`.

Currently, `linalg_svd_backward` is broken because it has not been adapted yet after the `V ==> VT` change, but before continuing and spending more time on it I wanted to make sure that the general approach is fine.
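
A minimal sketch of the new interface, reflecting points 1 and 2 above (the `full_matrices` flag and the transposed `VT` return value):

```
import torch

A = torch.randn(5, 3)
U, S, Vh = torch.linalg.svd(A, full_matrices=False)
print(U.shape, S.shape, Vh.shape)  # torch.Size([5, 3]) torch.Size([3]) torch.Size([3, 3])
assert torch.allclose(A, U @ torch.diag(S) @ Vh, atol=1e-5)
```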

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45562

Reviewed By: H-Huang

Differential Revision: D25803557

Pulled By: mruberry

fbshipit-source-id: 4966f314a0ba2ee391bab5cda4563e16275ce91f
2021-01-08 06:46:16 -08:00
006cfebf3d Update autograd related comments (#50166)
Summary:
Remove outdated comment and update to use new paths.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50166

Reviewed By: zou3519

Differential Revision: D25824942

Pulled By: albanD

fbshipit-source-id: 7dc694891409e80e1804eddcdcc50cc21b60f822
2021-01-08 06:37:57 -08:00
9f832c8d3e [numpy] torch.exp: promote integer inputs to float (#50093)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515
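
A minimal sketch of the new promotion behavior:

```
import torch

x = torch.tensor([0, 1, 2])   # int64 input
y = torch.exp(x)              # promoted to the default floating-point dtype
print(y.dtype)                # torch.float32
```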

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50093

Reviewed By: H-Huang

Differential Revision: D25803549

Pulled By: mruberry

fbshipit-source-id: e6f245b5e728f2dca6072f8c359f03dff63aa14d
2021-01-08 06:30:18 -08:00
fc2ead0944 Autograd engine, only enqueue task when it is fully initialized (#50164)
Summary:
This solves a race condition where the worker thread might
see a partially initialized graph_task

Fixes https://github.com/pytorch/pytorch/issues/49652

I don't know how to reliably trigger the race, so I didn't add any test. But the ROCm build flakiness (it just happens to race more often on ROCm builds) should disappear after this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50164

Reviewed By: zou3519

Differential Revision: D25824954

Pulled By: albanD

fbshipit-source-id: 6a3391753cb2afd2ab415d3fb2071a837cc565bb
2021-01-08 05:30:11 -08:00
c215ffb6a2 Revert D25687465: [PyTorch] Devirtualize TensorImpl::dim() with macro
Test Plan: revert-hammer

Differential Revision:
D25687465 (4de6b279c8)

Original commit changeset: 89aabce165a5

fbshipit-source-id: fa5def17209d1691e68b1245fa0873fd03e88eaa
2021-01-07 22:07:42 -08:00
294b7867eb Address clang-tidy warnings in ProcessGroupNCCL (#50131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50131

Noticed that in the internal diff for
https://github.com/pytorch/pytorch/pull/49069 there was a clang-tidy warning to
use emplace instead of push_back. This can save us a copy, since the element is
constructed in place rather than first being constructed and then copied into the container.
ghstack-source-id: 119560979

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D25800134

fbshipit-source-id: 243e57318f5d6e43de524d4e5409893febe6164c
2021-01-07 21:29:28 -08:00
5a63c452e6 Disable cuDNN persistent RNN on sm_86 devices (#49534)
Summary:
Excludes sm_86 GPU devices from using cuDNN persistent RNN.

This is because some hard-to-detect edge cases throw exceptions with cuDNN 8.0.5 on NVIDIA A40 GPUs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49534

Reviewed By: mruberry

Differential Revision: D25632378

Pulled By: mrshenli

fbshipit-source-id: cbe78236d85d4d0c2e4ca63a3fc2c4e2de662d9e
2021-01-07 21:20:21 -08:00
b73c018598 [PyTorch] Change representation of SizesAndStrides (#47508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47508

This moves SizesAndStrides to a specialized representation
that is 5 words smaller in the common case of tensor rank 5 or less.
ghstack-source-id: 119313560

Test Plan:
SizesAndStridesTest added in previous diff passes under
ASAN + UBSAN.

Run framework overhead benchmarks. Looks more or less neutral.

Reviewed By: ezyang

Differential Revision: D24772023

fbshipit-source-id: 0a75fd6c2daabb0769e2f803e80e2d6831871316
2021-01-07 21:01:46 -08:00
882ddb2f2d [PyTorch] Introduce packed SizesAndStrides abstraction (#47507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47507

This introduces a new SizesAndStrides class as a helper for
TensorImpl, in preparation for changing its representation.
ghstack-source-id: 119313559

Test Plan:
Added new automated tests as well.

Run framework overhead benchmarks. Results seem to be neutral-ish.

Reviewed By: ezyang

Differential Revision: D24762557

fbshipit-source-id: 6cc0ede52d0a126549fb51eecef92af41c3e1a98
2021-01-07 20:56:50 -08:00
c480eebf95 Completely remove FutureMessage type (#50029)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50029

Test Plan:
buck run mode/opt -c=python.package_style=inplace //caffe2/torch/fb/training_toolkit/examples:ctr_mbl_feed_april_2020 -- local-preset --flow-entitlement pytorch_ftw_gpu --secure-group oncall_pytorch_distributed

Before:

```
...

I0107 11:03:10.434000 3831111 print_publisher.py:23  master      ] Publishing batch metrics: qps-qps|total_examples 14000.0
I0107 11:03:10.434000 3831111 print_publisher.py:23  master      ] Publishing batch metrics: qps-qps|window_qps 74.60101318359375
I0107 11:03:10.434000 3831111 print_publisher.py:23  master      ] Publishing batch metrics: qps-qps|lifetime_qps 74.60101318359375

...

I0107 11:05:12.132000 3831111 print_publisher.py:23  master      ] Publishing batch metrics: qps-qps|total_examples 20000.0
I0107 11:05:12.132000 3831111 print_publisher.py:23  master      ] Publishing batch metrics: qps-qps|window_qps 64.0
I0107 11:05:12.132000 3831111 print_publisher.py:23  master      ] Publishing batch metrics: qps-qps|lifetime_qps 64.64917755126953

...
```

After:

```
...

I0107 11:53:03.858000 53693 print_publisher.py:23  master      ] Publishing batch metrics: qps-qps|total_examples 14000.0
I0107 11:53:03.858000 53693 print_publisher.py:23  master      ] Publishing batch metrics: qps-qps|window_qps 72.56404876708984
I0107 11:53:03.858000 53693 print_publisher.py:23  master      ] Publishing batch metrics: qps-qps|lifetime_qps 72.56404876708984

...

I0107 11:54:24.612000 53693 print_publisher.py:23  master      ] Publishing batch metrics: qps-qps|total_examples 20000.0
I0107 11:54:24.612000 53693 print_publisher.py:23  master      ] Publishing batch metrics: qps-qps|window_qps 73.07617950439453
I0107 11:54:24.612000 53693 print_publisher.py:23  master      ] Publishing batch metrics: qps-qps|lifetime_qps 73.07617950439453

...
```

Reviewed By: lw

Differential Revision: D25774915

Pulled By: mrshenli

fbshipit-source-id: 1128c3c2df9d76e36beaf171557da86e82043eb9
2021-01-07 19:50:57 -08:00
171648edaa Completely Remove FutureMessage from RPC agents (#50028)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50028

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D25753887

Pulled By: mrshenli

fbshipit-source-id: 40718349c2def262a16aaa24c167c0b540cddcb1
2021-01-07 19:50:53 -08:00
098751016e Completely Remove FutureMessage from RPC cpp tests (#50027)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50027

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D25753815

Pulled By: mrshenli

fbshipit-source-id: 85b9b03fec52b4175288ac3a401285607744b451
2021-01-07 19:50:50 -08:00
1f795e1a9b Remove FutureMessage from RPC request callback logic (#50026)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50026

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D25753588

Pulled By: mrshenli

fbshipit-source-id: a6fcda7830901dd812fbf0489b001e6bd9673780
2021-01-07 19:50:47 -08:00
2831af9837 Completely remove FutureMessage from FaultyProcessGroupAgent (#50025)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50025

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D25753587

Pulled By: mrshenli

fbshipit-source-id: a5d4106a10d1b0d3e4c406751795f19af8afd120
2021-01-07 19:50:43 -08:00
0684d07425 Remove FutureMessage from sender TensorPipeAgent (#50024)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50024

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D25753386

Pulled By: mrshenli

fbshipit-source-id: fdca051b805762a2c88f965ceb3edf1c25d40a56
2021-01-07 19:50:40 -08:00
1deb895074 Remove FutureMessage from sender ProcessGroupAgent (#50023)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50023

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D25753217

Pulled By: mrshenli

fbshipit-source-id: 5a98473c17535c8f92043abe143064e7fca4413b
2021-01-07 19:50:37 -08:00
0c943931aa Completely remove FutureMessage from distributed autograd (#50020)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50020

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D25752968

Pulled By: mrshenli

fbshipit-source-id: 138d37e204b6f9a584633cfc79fd44c8c9c00f41
2021-01-07 19:50:33 -08:00
b2da0b5afe Completely remove FutureMessage from RPC TorchScript implementations (#50005)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50005

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D25750663

Pulled By: mrshenli

fbshipit-source-id: 6d97156b61d82aa19dd0567ca72fe04bd7b5d1e7
2021-01-07 19:50:30 -08:00
2d5f57cf3b Completely remove FutureMessage from RRef Implementations (#50004)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50004

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D25750602

Pulled By: mrshenli

fbshipit-source-id: 06854a77f4fb5cc4c34a1ede843301157ebf7309
2021-01-07 19:50:27 -08:00
d730c7e261 Replace FutureMessage with ivalue::Future in RpcAgent retry logic (#49995)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49995

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D25745301

Pulled By: mrshenli

fbshipit-source-id: b5e3a7e0b377496924847d8d70d61de32e2d87f4
2021-01-07 19:50:23 -08:00
008206decc Replace FutureMessage with ivalue::Future in RRefContext (#49960)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49960

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D25730530

Pulled By: mrshenli

fbshipit-source-id: 5d54572c653592d79c40aed616266c87307a1ad8
2021-01-07 19:50:19 -08:00
25ef605132 Replace FutureMessage with ivalue::Future in distributed/autograd/utils.* (#49927)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49927

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D25724241

Pulled By: mrshenli

fbshipit-source-id: d608e448f5224e41fbb0b5be6b9ac51a587f25b4
2021-01-07 19:50:16 -08:00
84e3237a53 Let RpcAgent::send() return JitFuture (#49906)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49906

This commit modifies RPC Message to inherit from `torch::CustomClassHolder`,
and wraps a Message in an IValue in `RpcAgent::send()`.

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D25719518

Pulled By: mrshenli

fbshipit-source-id: 694e40021e49e396da1620a2f81226522341550b
2021-01-07 19:47:14 -08:00
4de6b279c8 [PyTorch] Devirtualize TensorImpl::dim() with macro (#49770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49770

Seems like the performance cost of making this commonly-called method virtual isn't worth having use of undefined tensors crash a bit earlier (they'll still fail to dispatch).
ghstack-source-id: 119528065

Test Plan: framework overhead benchmarks

Reviewed By: ezyang

Differential Revision: D25687465

fbshipit-source-id: 89aabce165a594be401979c04236114a6f527b59
2021-01-07 19:05:41 -08:00
1a1b665827 [PyTorch] validate that SparseTensorImpl::dim needn't be overridden (#49767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49767

I'm told that the base implementation should work fine. Let's validate that in an intermediate diff before removing it.
ghstack-source-id: 119528066

Test Plan: CI

Reviewed By: ezyang, bhosmer

Differential Revision: D25686830

fbshipit-source-id: f931394d3de6df7f6c5c68fe8ab711d90d3b12fd
2021-01-07 19:05:38 -08:00
2e7c6cc9df [PyTorch] Devirtualize TensorImpl::numel() with macro (#49766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49766

Devirtualizing this seems like a decent performance improvement on
internal benchmarks.

The *reason* this is a performance improvement is twofold:
1) virtual calls are a bit slower than regular calls
2) virtual functions in `TensorImpl` can't be inlined

Test Plan: internal benchmark

Reviewed By: hlu1

Differential Revision: D25602321

fbshipit-source-id: d61556456ccfd7f10c6ebdc3a52263b438a2aef1
2021-01-07 19:00:45 -08:00
bf4fcab681 Fix SyncBatchNorm usage without stats tracking (#50126)
Summary:
In `batch_norm_gather_stats_with_counts_cuda` use `input.scalar_type()` if `running_mean` is not defined
In `SyncBatchNorm` forward function create count tensor with `torch.float32` type if `running_mean` is None
Fix a few typos

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50126

Test Plan:
```
python -c "import torch;print(torch.batch_norm_gather_stats_with_counts( torch.randn(1, 3, 3, 3, device='cuda'), mean = torch.ones(2, 3, device='cuda'), invstd = torch.ones(2, 3, device='cuda'), running_mean = None, running_var = None  , momentum = .1, eps = 1e-5, counts = torch.ones(2, device='cuda')))"
```

Fixes https://github.com/pytorch/pytorch/issues/49730

Reviewed By: ngimel

Differential Revision: D25797930

Pulled By: malfet

fbshipit-source-id: 22a91e3969b5e9bbb7969d9cc70b45013a42fe83
2021-01-07 18:31:13 -08:00
870ab04b64 add type annotations to torch._utils (#49705)
Summary:
closes gh-49704

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49705

Reviewed By: mruberry

Differential Revision: D25725352

Pulled By: malfet

fbshipit-source-id: 05a7041c9caffde4a5c1eb8af0d13697075103af
2021-01-07 16:20:16 -08:00
ce370398cc [Gradient Compression] Remove the extra comma after "bucket" in PowerSGD hook signatures (#50197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50197

Remove the extra comma after "bucket".
ghstack-source-id: 119513484

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D25823117

fbshipit-source-id: acf048f7cb732c23cba3a81ccce1e70f6b9f4299
2021-01-07 15:56:20 -08:00
09eefec627 Clean up some type annotations in android (#49944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49944

Upgrades type annotations from Python2 to Python3

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25717539

fbshipit-source-id: c621e2712e87eaed08cda48eb0fb224f6b0570c9
2021-01-07 15:42:55 -08:00
f83d57f99e [Don't review] Clean up type annotations in caffe2/torch/nn (#50079)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50079

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25718694

fbshipit-source-id: f535fb879bcd4cb4ea715adfd90bbffa3fcc1150
2021-01-07 15:39:20 -08:00
2bceee785f Clean up simple type annotations in nn/functional.py (#50106)
Summary:
Also reformats code to pass linters.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50106

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25787566

fbshipit-source-id: 39c86b4021e279f92f8ccf30252a6cfae1063c3c
2021-01-07 15:33:40 -08:00
3b56e9d0ef [pytorch] prune based on custom importance scores (#48378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48378

This commit adds support for accepting custom importance scores to use for pruning mask computation, rather than only using the parameter.

This is useful if one wants to prune based on scores from a different technique, such as activations, gradients, or a weighted scoring of parameters.

An alternative to the above approach would be to pass a custom mask to the already available interface. However, accepting importance scores is easier, since it can leverage the mask computation logic that has already been baked in.

In addition, the commit also makes some minor lint fixes.
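
A minimal sketch of how the new argument might be used, assuming the `importance_scores` keyword is exposed through the existing pruning entry points such as `l1_unstructured`:

```
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(4, 2)
# Hypothetical scores (e.g. derived from gradients or activations);
# the pruning mask is computed from these instead of the weights themselves.
scores = torch.rand_like(layer.weight)
prune.l1_unstructured(layer, name="weight", amount=0.5, importance_scores=scores)
print(layer.weight_mask)
```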

Test Plan:
* Unit tests
* Circle CI

Differential Revision: D24997355

fbshipit-source-id: 30797897977b57d3e3bc197987da20e88febb1fa
2021-01-07 15:21:43 -08:00
23cadb5d7b [PyTorch] Specialize list_element_from for IValue to avoid extra move/copy (#50124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50124

This patch makes `list_element_from` avoid extra `IValue`
move/copies for `List<IValue>` by just forwarding the reference
argument.

We take advantage of this in `listConstruct` by using `push_back`
(which hits the `ListElementFrom` path) instead of `emplace_back`.
ghstack-source-id: 119478962

Test Plan:
Inspected generated assembly for vararg_functions.cpp in
optimized build. Rather than a call to `vector::emplace_back` and an extra
move, `vector::push_back` gets inlined.

Reviewed By: ezyang

Differential Revision: D25794277

fbshipit-source-id: 2354d8c08e0a0d6be2db3f0d0d6c90c3a455d8bd
2021-01-07 15:17:36 -08:00
7ce8f7e488 [quant] Backend string for the quantized types (#49965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49965

Without this, checking the type of a quantized tensor using `type` would throw an error.

After this PR running the `type(qx)`, where `qx` is a quantized tensor would show something like `torch.quantized.QUInt8`.

Test Plan: Not needed -- this is just a string description for the quantized tensors

Differential Revision: D25731594

Reviewed By: ezyang

Pulled By: z-a-f

fbshipit-source-id: 942fdf89a1c50895249989c7203f2e7cc00df4c6
2021-01-07 14:57:34 -08:00
0c3bae6a89 docker: add environment variable PYTORCH_VERSION (#50154)
Summary:
The aim is to be able to inspect a container image and immediately determine
which version of PyTorch it contains.

Closes https://github.com/pytorch/pytorch/issues/48324

Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>

seemethere PTAL.
As you requested in https://github.com/pytorch/pytorch/issues/48324#issuecomment-754237156, I'm submitting the patch. But I could only do limited testing as I'm not sure these Makefile/Dockerfile are used for pushing the Docker Hub images (since the Makefile tags the image with a `v` prefix for the version, as in: `pytorch:v1.7.1`, but Docker Hub images don't have this prefix).

Also on the master branch we currently have the following:
```
$ git describe --tags
v1.4.0a0-11171-g68a6e46379
```
So it's a little off, but it behaves as expected on the `release/1.7` branch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50154

Reviewed By: walterddr

Differential Revision: D25828491

Pulled By: seemethere

fbshipit-source-id: 500ec96cb5f5da1321610002d5e3678f4b0b94b5
2021-01-07 14:12:54 -08:00
e12008d110 [quant] Mapping for the _LinearWithBias (#49964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49964

`torch.nn.modules.linear._LinearWithBias` is only used in the transformers, and is completely identical to the `torch.nn.Linear`.
This PR creates a mapping so that this module would be treated the same as the Linear.

Test Plan:
```
python test/test_quantization.py TestDynamicQuantizedModule TestStaticQuantizedModule
```

Differential Revision: D25731589

Reviewed By: jerryzh168

Pulled By: z-a-f

fbshipit-source-id: 1b2697014e250e97d3010cdb542f9d130b71fbc3
2021-01-07 13:57:29 -08:00
160b4be60a [PyTorch] typeid: ensmallen scalarTypeItemSizes (#50165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50165

There are currently 17 types, so this used to stretch across 3 cache lines and now it fits in one. All the types in question seem to be way under 255 bytes in size anyway.
ghstack-source-id: 119485090

Test Plan: CI, profiled internal benchmarks

Reviewed By: smessmer

Differential Revision: D25813574

fbshipit-source-id: c342d4f12a7b035503e1483b8301f68d98f3c503
2021-01-07 13:52:02 -08:00
0495180f6e Fix deprecation warning in scalar_type_analysis (#50218)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50218

Reviewed By: janeyx99

Differential Revision: D25827971

Pulled By: malfet

fbshipit-source-id: a4e467721435d7ae0db2195694053621eee8c2ee
2021-01-07 13:33:51 -08:00
7377bfb1bd Fix compiler warnings pertaining to uniform_int() (#49914)
Summary:
**PROBLEM DESCRIPTION:**
GitHub issue 46391 suggests that compiler warnings pertaining to _floating-point value does not fit in required integral type_ might cause some confusion.

These compiler warnings arise during compilation of the templated function `uniform_int()`. The warnings are misleading because they come from the way the compiler instantiates templated functions; the if-else statements in the function rule out the possibilities that the warnings describe. So, the purpose of a fix would only be to silence the compiler warnings, not to fix an actual bug.

**FIX DESCRIPTION:**
[EDITED, after inputs from malfet]: In the function `uniform_int()`, the if-else conditions pertaining to types `double` & `float` can be removed, and then an overloaded specialized function can be added for floating-point types. The current version of the function can be specialized to not have its return type as a floating point type.

An unrelated observation is that the if-else condition pertaining to the type `double` (line 57 in the original code) was redundant, as line 61 in the original code covered it (`std::is_floating_point<T>::value` would also have been true for the type `double`).

Fixes https://github.com/pytorch/pytorch/issues/46391

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49914

Reviewed By: H-Huang

Differential Revision: D25808037

Pulled By: malfet

fbshipit-source-id: 3f94c4bca877f09720b0d6efa5e1788554aba074
2021-01-07 13:26:08 -08:00
e096449360 Adding MyPy daemon status file to gitignore (#50132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50132

When running mypy command using `dmypy run`, it creates a status file.
This PR adds the file to the ignore list.

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D25834504

Pulled By: z-a-f

fbshipit-source-id: 6c5a8edd6d8eaf61983e3ca80e798e02d78e38ce
2021-01-07 12:55:31 -08:00
ec6d29d6fa Drop unused imports from test (#49973)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49973

From
```
./python/libcst/libcst codemod remove_unused_imports.RemoveUnusedImportsWithGlean --no-format caffe2/
```

Test Plan: Standard sandcastle tests

Reviewed By: xush6528

Differential Revision: D25727350

fbshipit-source-id: 237ec4edd85788de920663719173ebec7ddbae1c
2021-01-07 12:09:38 -08:00
fbdb7822c6 minor improvement: extract major version (#49393)
Summary:
1. Unify major version extraction. If there's an error, it will throw an exception in PR CI. So far, not all CUDA version tests run in PR CI.
2. Better readability.

passed cuda11 tests.
https://circleci.com/gh/pytorch/pytorch/9651144?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
https://circleci.com/gh/pytorch/pytorch/9651145?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49393

Reviewed By: zou3519

Differential Revision: D25828318

Pulled By: seemethere

fbshipit-source-id: 5c6861f0ddafe9a77a9fe397e4e0f69ecce4b27f
2021-01-07 11:47:04 -08:00
8706187523 Fix #42271 (#50141)
Summary:
This pull request fixes #42271 by manually specifying the template data type of `tensor<template>.item()` in `aten/src/THC/generic/THCTensorMasked.cu`.

Changes in submodules are not expected since I have pulled the latest submodules from the PyTorch master branch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50141

Reviewed By: zou3519

Differential Revision: D25826104

Pulled By: ezyang

fbshipit-source-id: 80527a14786b36e4e520fdecc932e257d2520f89
2021-01-07 11:38:45 -08:00
45c0d64b33 Skip test_functional_autograd_benchmark during code coverage (#50183)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49656

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50183

Reviewed By: walterddr

Differential Revision: D25819825

Pulled By: malfet

fbshipit-source-id: 0a3e64d6b6aedb6e729e7d14167955fd2d89862c
2021-01-07 11:17:21 -08:00
ace1680b68 [static runtime] Remove register concept by giving ownership to the nodes (#50050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50050

Every node will now own its outputs.
I don't expect any big perf improvements from this diff; the only eliminated code is from `deallocate_registers`.
Largely, this is to enable more optimizations going forward.

Test Plan:
buck test mode/dev //caffe2/benchmarks/static_runtime:static_runtime_cpptest
buck test //caffe2/test:static_runtime

Reviewed By: hlu1

Differential Revision: D25571181

fbshipit-source-id: 91fcfbd5cd968af963ba89c45656997650ca6d18
2021-01-07 10:19:58 -08:00
321b98830e [script] Validator for unsupported ops on accelerator
Summary:
ATT

Next step:
1. integrate with dper flow.
2. Support in bento after diff is pushed to prod.

Test Plan:
buck run mode/opt-clang sigrid/predictor/scripts:check_accelerator_unsupported_ops -- --model_entity_id=232891739

I0106 17:08:36.425796 1238141 pybind_state.cc:531] Unsupported ops: Fused8BitRowwiseQuantizedToFloat

Reviewed By: khabinov

Differential Revision: D25818253

fbshipit-source-id: 8d8556b0400c1747f154b0517352f1685f1aa8b1
2021-01-07 02:04:56 -08:00
968ad47b41 Fix error messages thrown when the padding size is not valid (#50135)
Summary:
Hi, I changed the error messages so that they correspond to the actual implementation.
According to the implementation, half of the kernel size is valid as a padding size.

This is minor, but here is an example where the padding size is exactly equal to half of the kernel size:

Input: 5 x 5
Kernel: 4 x 4
Stride: 4
Padding: 2
==> Output: 2 x 2

You don't get an error in the above case, as shown below:
```python
import torch
import torch.nn as nn

# no error
input = torch.randn(1, 1, 5, 5)
pool = nn.MaxPool2d(4, 4, padding=2)
print(pool(input).shape)
# >>> torch.Size([1, 1, 2, 2])
```

You get the error when you set the padding size larger than half of the kernel size:
```python
# it raises error
input = torch.randn(1, 1, 5, 5)
pool = nn.MaxPool2d(4, 4, padding=3)
print(pool(input).shape)
```

The error message is:
```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-43-2b09d1c5d79a> in <module>()
      1 input = torch.randn(1, 1, 5, 5)
      2 pool = nn.MaxPool2d(4, 4, padding=3)
----> 3 print(pool(input).shape)

3 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in _max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode, return_indices)
    584         stride = torch.jit.annotate(List[int], [])
    585     return torch.max_pool2d(
--> 586         input, kernel_size, stride, padding, dilation, ceil_mode)
    587
    588 max_pool2d = boolean_dispatch(

RuntimeError: pad should be smaller than half of kernel size, but got padW = 3, padH = 3, kW = 4, kH = 4
```

Thanks in advance.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50135

Reviewed By: hl475

Differential Revision: D25815337

Pulled By: H-Huang

fbshipit-source-id: 98142296fa6e6849d2e1407d2c1d4e3c2f83076d
2021-01-06 22:21:48 -08:00
11cdb910b4 [fx] Add matrix multiplication fusion pass (#50151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50151

**Summary**
This commit adds a graph transformation pass that merges several matrix
multiplications that use the same RHS operand into one large matrix
multiplication. The LHS operands from all of the smaller matrix multiplications
are concatenated together and used as an input in the large matrix multiply,
and the result is split in order to obtain the same products as the original
set of matrix multiplications.
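
As an illustration of the rewrite (a hand-written sketch of the idea only, not the FX pass itself; `fused_mm` is a hypothetical helper):

```python
import torch

def fused_mm(lhs_list, rhs):
    # Concatenate the LHS operands along the row dimension...
    cat = torch.cat(lhs_list, dim=0)
    # ...perform one large matmul instead of len(lhs_list) small ones...
    big = torch.mm(cat, rhs)
    # ...and split the result back into the per-input products.
    return torch.split(big, [l.shape[0] for l in lhs_list], dim=0)

a, b = torch.randn(2, 4), torch.randn(3, 4)
rhs = torch.randn(4, 5)
assert all(torch.allclose(f, torch.mm(x, rhs))
           for f, x in zip(fused_mm([a, b], rhs), [a, b]))
```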

**Test Plan**
This commit adds a simple unit test with two matrix multiplications that share
the same RHS operand.

`python test/test_fx_experimental.py -k merge_matmul -v`

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25809409

Pulled By: SplitInfinity

fbshipit-source-id: fb55c044a54dea9f07b71aa60d44b7a8f3966ed0
2021-01-06 21:49:37 -08:00
838e73de20 enable alltoall_single torchscript support (#48345)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48345

Test Plan: wait for sandcastle

Differential Revision: D25074475

fbshipit-source-id: 04261f8453567154b0464f8348320e936ca06384
2021-01-06 18:37:00 -08:00
4e2ab2cd73 Move generator state APIs to ATen (#49589)
Summary:
## Rationale

While most of the `torch.Generator` properties and methods are implemented as a thin wrapper of the corresponding `at::Generator` methods, `torch.Generator.get_state()` and `torch.Generator.set_state()` are implemented in legacy Torch code and are not dispatched through the `c10::GeneratorImpl` interface. This is not structured well and makes implementing generators for new backends (e.g. `XLAGeneratorImpl` for the XLA backend) inconvenient. As such, this pull request seeks to move these generator state APIs to c10 and ATen.
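
At the Python level, the behavior these APIs back is the familiar state round-trip (an illustrative sketch, not part of this PR's diff):

```python
import torch

g = torch.Generator()
snapshot = g.get_state()         # ByteTensor snapshot of the RNG state
a = torch.randn(3, generator=g)
g.set_state(snapshot)            # restore the saved state
b = torch.randn(3, generator=g)
assert torch.equal(a, b)         # identical draws after restoring
```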

## What is being refactored?
* Interfaces
  - Added `c10::GeneratorImpl::set_state` and `c10::GeneratorImpl::state` for getting and setting the internal state of a random number generator.
  - `at::Generator::set_state` and `at::Generator::state` wraps the above-mentioned APIs, as it's basically a PIMPL.
  - Added helper function `at::detail::check_rng_state` for checking the validity of new RNG state tensor.
* CPU Generator
  - Renamed and moved `THTensor_(setRNGState)` and `THTensor_(getRNGState)` to `CPUGeneratorImpl::set_state` and `CPUGenerator::state`.
  - Renamed and moved `THGeneratorState` and `THGeneratorStateNew` to `CPUGeneratorStateLegacy` and `CPUGeneratorState`.
* CUDA Generator
  - Renamed and moved `THCRandom_setRNGState` and `THCRandom_getRNGState` to `CUDAGeneratorImpl::set_state` and `CUDAGeneratorImpl::state`.
* PyTorch Bindings
  - `THPGenerator_setState` and `THPGenerator_getState` now simply forward to `at::Generator::set_state` and `at::Generator::state`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49589

Reviewed By: H-Huang

Differential Revision: D25785774

Pulled By: pbelevich

fbshipit-source-id: 8ed79209c4ffb1a0ae8b19952ac8871ac9e0255f
2021-01-06 18:26:56 -08:00
b6b76a1055 Mod lists to neutral+descriptive terms in caffe2/caffe2/opt (#49801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49801

Per "https://fb.workplace.com/groups/e/permalink/3320810064641820/" we can no longer use the terms "whitelist" and "blacklist", and editing any file containing them results in a critical error signal. Let's embrace the change.
This diff changes "blacklist" to "blocklist" in a number of non-interface contexts (interfaces would require more extensive testing and might interfere with reading stored data, so those are deferred until later).

Test Plan: Sandcastle

Reviewed By: xush6528

Differential Revision: D25686949

fbshipit-source-id: e07de4d228674ae61559719cbe4717f8044778d2
2021-01-06 18:13:42 -08:00
ef1fa547ba [PyTorch] Use expectRef() when calling listConstruct (#50062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50062

Avoids creating an extra shared_ptr.
ghstack-source-id: 119325645

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25766631

fbshipit-source-id: f2ab8349dfea325054820fa2c1055180c740574e
2021-01-06 18:13:38 -08:00
fa160d18e7 [PyTorch][jit] Add Type::{castRaw,expectRef} (#50061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50061

These are more efficient than creating an extra `shared_ptr`
when you just want to access the casted value.
ghstack-source-id: 119325644

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25766630

fbshipit-source-id: 46f11f70333b44714cab708a4850922ab7486793
2021-01-06 18:12:05 -08:00
6838ecefb6 Clean up some type annotations in torch/jit (#49939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49939

Upgrades type annotations from Python2 to Python3

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25717573

fbshipit-source-id: 7d5c98fafaa224e0504b73dc69b1e4a6410c0494
2021-01-06 16:39:57 -08:00
e49372d460 Bugfix nightly checkout tool to work on Windows (#49274)
Summary:
I am submitting this PR on behalf of Janne Hellsten (nurpax) from NVIDIA, for the convenience of CLA. Thanks Janne a lot for the contribution!

This fixes a bug when running `./tools/nightly.py checkout -b my-nightly-branch` on Windows. Before this fix, the command failed with the following error:

```
ERROR:root:Fatal exception
Traceback (most recent call last):
  File "./tools/nightly.py", line 166, in logging_manager
    yield root_logger
  File "./tools/nightly.py", line 644, in main
    install(
  File "./tools/nightly.py", line 552, in install
    spdir = _site_packages(pytdir.name, platform)
  File "./tools/nightly.py", line 325, in _site_packages
    os.path.join(pytdir.name, "Lib", "site-packages")
NameError: name 'pytdir' is not defined
log file: d:\pytorch\nightly\log\2020-12-11_16h10m14s_6867a21e-3c0e-11eb-878e-04ed3363a33e\nightly.log
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49274

Reviewed By: H-Huang

Differential Revision: D25808156

Pulled By: malfet

fbshipit-source-id: 00778016366ab771fc3fb152710c7849210640fb
2021-01-06 16:14:51 -08:00
eb8003d8e9 [FX] Remove extraneous newlines at end of code (#50117)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50117

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D25791847

Pulled By: jamesr66a

fbshipit-source-id: 9c0b296e117e6bcf69ed9624ad0b243fa3db0f76
2021-01-06 15:47:37 -08:00
dc41d17655 .circleci: Add option to not run build workflow (#50162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50162

Adds an option to not run the build workflow when the `run_build`
parameter is set to false

Should reduce the amount of double workflows that are run by
pytorch-probot

Uses functionality introduced in https://github.com/pytorch/pytorch-probot/pull/18

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: yns88

Differential Revision: D25812971

Pulled By: seemethere

fbshipit-source-id: 4832170f6abcabe3f385f47a663d148b0cfe2a28
2021-01-06 15:42:17 -08:00
3270e661c3 [PyTorch Mobile] Skip signature check when converting to typed operator handle (#49469)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49469

In Functions.cpp, there is a call to `typed<...>` that converts to a `TypedOperatorHandle`. This isn't needed at runtime since it's already been exercised during development, and for mobile, there is no possibility of operators or kernels being registered by users (from Python code the way it is possible on server side).
ghstack-source-id: 118714246

Test Plan:
Sandcastle

### App testing results:

FBiOS fails with an error similar to this one: https://fb.workplace.com/groups/2102613013103952/permalink/3815085708523332/

Tested 2 AR effects (green screen and color shift) on IGiOS.

### BSB results:

D25581159-V1 (https://www.internalfb.com/intern/diff/D25581159/?dest_number=118689912)

**fbios: Succeeded**
Change in Download Size for arm64 + 3x assets variation: -7.2 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -27.1 KiB

Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:135971531636706@base/bsb:135971531636706@diff/

D25581159-V1 (https://www.internalfb.com/intern/diff/D25581159/?dest_number=118689912)

**fbios-pika: Succeeded**
Change in Download Size for arm64 + 3x assets variation: -11.0 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -7.4 KiB

Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:430379774665351@base/bsb:430379774665351@diff/

D25581159-V1 (https://www.internalfb.com/intern/diff/D25581159/?dest_number=118689912)

**igios: Succeeded**
Change in Download Size for arm64 + 3x assets variation: -5.3 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -17.3 KiB

Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:685843828784135@base/bsb:685843828784135@diff/

Reviewed By: iseeyuan

Differential Revision: D25581159

fbshipit-source-id: 4a62982829ec42c2d3f58f47f876f2543bc0099b
2021-01-06 14:56:07 -08:00
dde5b6e177 [PyTorch] Reapply D25547962: Make tls_local_dispatch_key_set inlineable (reapply) (#49763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49763

This was reverted because it landed in a stack together with
D25542799 (9ce1df079f), which really was broken.
ghstack-source-id: 119063016

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25685959

fbshipit-source-id: 514d8076eac67c760f119cfebc2ae3d0ddcd4e04
2021-01-06 14:41:43 -08:00
eef5eb05bf Remove backward and requires_grad from Autograd backend key (#49613)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49613

Just following a TODO in the code base...
ghstack-source-id: 119450484

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25644597

fbshipit-source-id: 26f5fa6af480929d0468b0de3ab103813e40d78b
2021-01-06 14:22:58 -08:00
6643e9fbb3 Remove use_c10_dispatcher: full lines (#49259)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49259

Since `use_c10_dispatcher: full` is now the default, we can remove all those pesky lines mentioning it. Only the `use_c10_dispatcher: hacky_wrapper_for_legacy_signatures` lines are left.
ghstack-source-id: 119450485

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25506526

fbshipit-source-id: 8053618120c0b52ff7c73cacb34bec7eb38f8fe0
2021-01-06 14:22:54 -08:00
249261ada7 Remove generated_unboxing_wrappers and setManuallyBoxedKernel (#49251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49251

Since all ops are c10-full and use templated unboxing now, we don't need to codegenerate any unboxing logic anymore.
Since this codegen was the only code using setManuallyBoxedKernel, we can also remove that functionality from KernelFunction, OperatorEntry and Dispatcher.
ghstack-source-id: 119450486

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25502865

fbshipit-source-id: 49d009df159fda4be41bd02457d4427e6e638c10
2021-01-06 14:22:50 -08:00
4a14020c0d Remove .impl_UNBOXED() and functionalities associated with it (#49220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49220

Since all ops are c10-full, we can remove .impl_UNBOXED now.
This also removes the ability of KernelFunction or CppFunction to store unboxedOnly kernels.
ghstack-source-id: 119450489

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25490225

fbshipit-source-id: 32de9d591e6a842fe18abc82541580647e9cfdad
2021-01-06 14:22:46 -08:00
e4c41b6936 Remove codegen logic to support non-c10-full ops (#49164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49164

This PR removes the logic paths in codegen that were responsible for handling non-c10-full ops.
This only goes through our basic codegen. It does not simplify C++ code yet and it does not remove the codegen for generated unboxing wrappers yet.
ghstack-source-id: 119450487

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25462977

fbshipit-source-id: 7e70d14bea96948f5056d98125f3e6ba6bd78285
2021-01-06 14:17:36 -08:00
fcb69d2eba Add android.permission.INTERNET permission to Android test_app. (#49996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49996

According to section 5.2.1 of Snapdragon Profiler User Guide
(https://developer.qualcomm.com/qfile/30580/snapdragon_profiler_user_guide_reva.pdf)
OpenGL ES, Vulkan, and OpenCL apps must include
android.permission.INTERNET in the app's AndroidManifest.xml to enable
API tracing and GPU metrics.

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D25809555

Pulled By: AshkanAliabadi

fbshipit-source-id: c4d88a7ea98d9166efbc4157df7d822d99ba0df9
2021-01-06 12:58:28 -08:00
473e78c0fa Remove redundant code for unsupported Python versions (#49486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49486

Remove code for Python 3.5 and lower.

There's more that can be removed/modernised, but sticking mainly to redundant version checks here, to keep the diff/PR smaller.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46579

Reviewed By: zou3519

Differential Revision: D24453571

Pulled By: ezyang

fbshipit-source-id: c2cfcf05d6c5f65df64d89c331692c9aec09248e
2021-01-06 12:45:46 -08:00
09eb468398 [vulkan] 2D prepacking for conv2d (#48816)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48816

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D25786280

Pulled By: SS-JIA

fbshipit-source-id: b41bf55dcff8f3dfbbf1994171e2ef62f16ff29a
2021-01-06 12:37:51 -08:00
9b519b4a3f [PyTorch Mobile] Generate Kernel dtype selection code in selected_mobile_ops.h during the build (#49279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49279

Now that the YAML files for tracing based selective build optionally have the information regarding the selected kernel function dtypes, we can start generating constexpr selection code in the include file (`selected_mobile_ops.h`) to make the inclusion of code for specific dtypes selective based on compile time decisions.

The way this is done is that if we detect that the code for a specific dtype should not be in the binary, we add an exception (throw) statement just before the method is called (see the first diff in this stack) and allow the compiler to optimize away the rest of the function's body. This has the advantage of allowing the compiler to know the lambda's return type (since it's inferred from the `return` statements in the body of the method, and if we compile out all the cases, then the compiler won't know the return type and it will result in a compilation error).

The generated `<ATen/selected_mobile_ops.h>` is being used (included) in `Dispatch.h`. In case `XPLAT_MOBILE_BUILD` is not defined, then we should include code for all kernel dtypes (non-selective build).

When merging, we need to handle the case of both older and newer (tracing based) operator lists. If we detect any operator that includes all overloads, it indicates that an old style operator list is part of the build, and we need to `include_all_kernel_dtypes` for this build.
ghstack-source-id: 119439497

Test Plan:
For Segmentation v220, here is one of the intermediate generated YAML files (selected_operators.yaml): {P154480509}
and here is the generated `selected_mobile_ops.h` file: {P159808798}

Here is the `selected_mobile_ops.h` file for lite_predictor (which includes all ops and all dtypes): {P159806443}

Continuous build for ~8 checked-in models validates that the selection code works as expected when we build based on dtype selection.

Reviewed By: iseeyuan

Differential Revision: D25388949

fbshipit-source-id: 1c182a4831a7f94f7b152f02dbd3bc01c0d22443
2021-01-06 12:17:32 -08:00
ba691e1a42 Remove incorrect links to zdevito/ATen (#50065)
Summary:
Similar to https://github.com/pytorch/pytorch/issues/49028, this PR removes a few more references to https://github.com/zdevito/ATen.

- The links for Functions.h, Tensor.h, and Type.h are simply broken, probably because they refer to `master` rather than a specific commit (cf. https://github.com/pytorch/pytorch/issues/47066)
- I'm unsure about the change to the `about` section of `aten/conda/meta.yaml`; can someone comment on whether I am understanding that field correctly?
- The reference to https://github.com/zdevito/ATen/issues/163 remains [in `tools/autograd/derivatives.yaml`](cd608fe59b/tools/autograd/derivatives.yaml (L91)), because the contents of that issue discussion don't seem to be mirrored anywhere else.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50065

Reviewed By: ezyang, walterddr

Differential Revision: D25767353

Pulled By: samestep

fbshipit-source-id: 265f46f058bc54ef6d1a77f112cdfa1f115b3247
2021-01-06 11:49:26 -08:00
6eee2a0a9f [JIT] disable masked fill (#50147)
Summary:
There is an internal user who is experiencing a bug with masked_fill. While I am almost certain this corresponds to an old pytorch version with the bug, the model that is breaking is important and time-sensitive and we are covering all bases to try to get it to work again.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50147

Reviewed By: nhsoukai

Differential Revision: D25806541

Pulled By: eellison

fbshipit-source-id: 131bd71b5db9717a8a9cb97973d0b4f0e96455d6
2021-01-06 11:36:30 -08:00
3ce539881a Back out "Revert D25757721: [pytorch][PR] Run mypy on more test files" (#50142)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50142

Original commit changeset: 58437d719285

Test Plan: OSS CI

Reviewed By: walterddr, ngimel

Differential Revision: D25803866

fbshipit-source-id: d6b83a5211e430c0451994391876103f1ad96315
2021-01-06 11:27:36 -08:00
638086950d Clean up type annotations in torch/nn/quantized/modules (#49941)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49941

Test Plan: Sandcastle

Reviewed By: jerryzh168

Differential Revision: D25718715

fbshipit-source-id: bbe450d937cf7ef634e003c09146e308180d1d58
2021-01-06 11:03:08 -08:00
7d9eb6c680 Implementation of torch::cuda::synchronize (#50072)
Summary:
Adding `torch::cuda::synchronize()` to libtorch. Note that the implementation here adds a new method to the `CUDAHooksInterface`. An alternative that was suggested to me is to add a method to the `DeviceGuard` interface.

Fixes https://github.com/pytorch/pytorch/issues/47722

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50072

Reviewed By: H-Huang

Differential Revision: D25804342

Pulled By: jbschlosser

fbshipit-source-id: 45aa61d7c6fbfd3178caf2eb5ec053d6c01b5a43
2021-01-06 10:53:39 -08:00
e606e60331 [Needs Review] Convert some files to Python3 (#49351)
Summary:
Uses the Python standard library 2to3 script to convert a number of Python 2 files to Python 3. This facilitates code maintenance such as dropping unused imports in D25500422.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49351

Test Plan: Standard sandcastle tests

Reviewed By: xush6528

Differential Revision: D25499576

fbshipit-source-id: 0c44718ac734771ce0758b1cb30676cc3d76ac10
2021-01-06 10:48:16 -08:00
efe0533a24 Clean up some type annotations in torch/testing/_internal (#50078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50078

Upgrades type annotations from Python2 to Python3

Test Plan: Sandcastle tests

Reviewed By: pritamdamania87

Differential Revision: D25717560

fbshipit-source-id: cec631f3121ef9ab87ff8b3b00f1fae6df9a2155
2021-01-06 10:41:22 -08:00
74c055b240 Fix mypy type hint for AdaptiveAvgPool2,3d, AdaptiveMaxPool2,3d (#49963)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49918

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49963

Reviewed By: mrshenli, heitorschueroff

Differential Revision: D25760110

Pulled By: ezyang

fbshipit-source-id: aeb655b784689544000ea3b948f7d6d025aee441
2021-01-06 09:47:15 -08:00
68a6e46379 Push anonymous namespace into codegen, not template (#49498)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49498

In the near future, I want to code generate some functions that are
visible externally to this compilation unit.  I cannot easily do this
if all the codegen code is wrapped in a global anonymous namespace,
so push the namespace in.

Registration has to stay in an anonymous namespace to avoid name conflicts.
This could also have been solved by making the wrapper functions have
more unique names but I didn't do this in the end.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD, smessmer

Differential Revision: D25616104

Pulled By: ezyang

fbshipit-source-id: 323c0dda05a081502aab702f359a08dfac8c41a4
2021-01-06 08:44:49 -08:00
480a756194 [PyTorch] IValue::toTensor can now return const Tensor& (#48868)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48868

Building on the previous diff, we can make `toTensor()` return a
`const Tensor&`, which should make it easier to avoid reference
counting.
ghstack-source-id: 119327372

Test Plan: internal benchmarks.

Reviewed By: bwasti

Differential Revision: D25325379

fbshipit-source-id: ca699632901691bcee432f595f75b0a4416d55dd
2021-01-06 08:40:50 -08:00
1b31e13539 [PyTorch] Store Tensor explicitly in IValue (#48824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48824

Enables following diff, which will make toTensor() return
`const Tensor&` and allow callers to avoid refcounting overhead.
ghstack-source-id: 119327370

Test Plan:
ivalue_test

Internal benchmark to ensure perf parity. Some interesting steps
during the debugging process:

- First version was about a 5% regression
- Directly implementing move construction instead of using swap
  lowered the regression to 2-3%
- Directly implementing move assign was maybe an 0.5% improvement
- Adding C10_ALWAYS_INLINE on move assign got our regression to
  negligible
- Fixing toTensor() to actually be correct regressed us again, but
  omitting the explicit dtor call as exhaustively spelled out in a
  comment fixed it.

Reviewed By: bwasti

Differential Revision: D25324617

fbshipit-source-id: 7518c1c67f6f2661f151b43310aaddf4fb6e511a
2021-01-06 08:40:47 -08:00
688992c775 [PyTorch] Additional IValue tests (#49718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49718

Improving test coverage in preparation for updating the
implementation of IValue.
ghstack-source-id: 119327373

Test Plan: ivalue_test

Reviewed By: hlu1

Differential Revision: D25674605

fbshipit-source-id: 37a82bb135f75ec52d2d8bd929c4329e8dcc4d25
2021-01-06 08:35:42 -08:00
5f2ec6293d Unused variables in neural net classes and functions (#50100)
Summary:
These unused variables were identified by [pyflakes](https://pypi.org/project/pyflakes/). They can be safely removed to simplify the code and possibly improve performance.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50100

Reviewed By: ezyang

Differential Revision: D25797764

Pulled By: smessmer

fbshipit-source-id: ced341aee692f429d2dcc3a4ef5c46c8ee99cabb
2021-01-06 08:16:57 -08:00
c517e15d79 Add support for converting sparse bool tensors to dense (#50019)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49977
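
A minimal repro of what this enables (a sketch assuming the usual sparse COO construction):

```python
import torch

i = torch.tensor([[0, 1]])
v = torch.tensor([True, False])
s = torch.sparse_coo_tensor(i, v, (3,))
print(s.to_dense())  # previously raised for bool; now tensor([True, False, False])
```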

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50019

Reviewed By: smessmer

Differential Revision: D25782045

Pulled By: ezyang

fbshipit-source-id: a8389cbecb7e79099292a423a6fd8ac28631905b
2021-01-06 07:38:14 -08:00
2ac180a5dd Fix cl.exe detection in cpu/fused_kernel.cpp (#50085)
Summary:
The command used here is essentially `where cl.exe`. By using `system()` we will not be able to find cl.exe unless we are using a VS Developer Prompt, which makes `activate()` meaningless. Changing `system()` to `run()` fixes this.

Found during https://github.com/pytorch/pytorch/issues/49781.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50085

Reviewed By: smessmer

Differential Revision: D25782054

Pulled By: ezyang

fbshipit-source-id: e8e3cac903a73f3bd78def667ebe0e93201814c8
2021-01-06 07:16:41 -08:00
45ec35827e Set USE_RCCL cmake option (dependent on USE_NCCL) [REDUX] (#34683)
Summary:
Refiled duplicate of https://github.com/pytorch/pytorch/issues/31341 which was reverted in commit 63964175b52197a75e03b73c59bd2573df66b398.

This PR enables RCCL support when building Gloo as part of PyTorch for ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/34683

Reviewed By: glaringlee

Differential Revision: D25540578

Pulled By: ezyang

fbshipit-source-id: fcb02e5745d62e1b7d2e02048160e9e7a4b4df2d
2021-01-06 07:03:02 -08:00
0ad6f06684 drop an unneeded comma from CMakeLists.txt (#50091)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50091

Reviewed By: smessmer

Differential Revision: D25782083

Pulled By: ezyang

fbshipit-source-id: f90f57c6c9fc0c1e68ab30dd3b56dfe971798df2
2021-01-06 06:53:45 -08:00
ad7d208ba5 Revert D25239967: [fx] Add matrix multiplication fusion pass
Test Plan: revert-hammer

Differential Revision:
D25239967 (9b7f3fa146)

Original commit changeset: fb99ad25b7d8

fbshipit-source-id: 370167b5ade8bf2b3a6cccdf4290ea07b8347c79
2021-01-05 23:22:26 -08:00
282552dde2 [PyTorch] Reapply D25546409: Use .sizes() instead of .size() in cat_serial_kernel_impl (#49762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49762

This was reverted because it landed in a stack together with
D25542799 (9ce1df079f), which really was broken.
ghstack-source-id: 119326870

Test Plan: CI

Reviewed By: maratsubkhankulov

Differential Revision: D25685905

fbshipit-source-id: f4ec9e114993f988d4af380677331c72dfe41c44
2021-01-05 22:59:22 -08:00
57d489e43a Fix for possible RNG offset calculation bug in cuda vectorized dropout with VEC=2 (#50110)
Summary:
The [offset calculation](e3c56ddde6/aten/src/ATen/native/cuda/Dropout.cu (L328)) (which gives an estimated ceiling on the maximum number of 32-bit values in the philox sequence that any thread in the launch will use) uses the hardcoded UNROLL value of 4, and assumes the hungriest threads can use every value (.x, .y, .z, and .w) their curand_uniform4 calls provide. However, the way fused_dropout_kernel_vec is currently written, that assumption isn't true in the VEC=2 case: on each iteration of the `grid x VEC` stride loop, each thread calls curand_uniform4 once, uses rand.x and rand.y, and discards rand.z and rand.w. This means (I _think_) curand_uniform4 may be called twice as many times per thread in the VEC=2 case as in the VEC=4 case or the fully unrolled code path, which means the offset calculation (which is a good estimate for the latter two cases) is probably wrong for the `fused_dropout_kernel_vec<..., /*VEC=*/2>` code path.

The present PR inserts some value-reuse in fused_dropout_kernel_vec to align the number of times curand_uniform4 is called for launches with the same totalElements in the VEC=2 and VEC=4 cases.  The diff should
- make the offset calculation valid for all code paths
- provide a very small perf boost by reducing the number of curand_uniform4 calls in the VEC=2 path
- ~~make results bitwise accurate for all code paths~~ nvm, tensor elements are assigned to threads differently in the unrolled, VEC 2 and VEC 4 cases, so we're screwed here no matter what.

ngimel what do you think?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50110

Reviewed By: smessmer

Differential Revision: D25790121

Pulled By: ngimel

fbshipit-source-id: f8f533ad997268c6f323cf4d225de547144247a8
2021-01-05 22:36:05 -08:00
f6f0fde841 [reland][quant][graphmode][fx] Standalone module support {input/output}_quantized_idxs (#49754) (#50058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50058

This PR adds the support for {input/output}_quantized_idxs for standalone module.

If input_quantized_idxs = [] and output_quantized_idxs = [], the standalone module will expect float
input and produce float output, and will quantize the input and dequantize the output internally.

If input_quantized_idxs = [0] and output_quantized_idxs = [0], the standalone module will expect quantized
input and produce quantized output; the input will be quantized in the parent module, and the output will be
dequantized in the parent module as well. This is similar to current quantized modules like nn.quantized.Conv2d.

For more details, please see the test case

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_standalone_module

Imported from OSS

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D25768910

fbshipit-source-id: 96c21a3456cf192c8f1400afa4e86273ee69197b
2021-01-05 20:27:46 -08:00
05358332b3 Fix mypy typing check for test_dataset (#50108)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50108

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D25789184

Pulled By: ejguan

fbshipit-source-id: 0eeeeeda62533e7137d56f313b7bf11406b32611
2021-01-05 19:57:22 -08:00
def8aa5499 Remove cpu half and dead code from multinomial (#50063)
Summary:
Based on ngimel's (Thank you!) feedback, cpu half was only accidental, so I'm removing it.

This lets us ditch the old codepath for sampling without replacement in favour of the new, better one.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50063

Reviewed By: mruberry

Differential Revision: D25772449

Pulled By: ngimel

fbshipit-source-id: 608729c32237de4ee6d1acf7e316a6e878dac7f0
2021-01-05 19:46:33 -08:00
9b7f3fa146 [fx] Add matrix multiplication fusion pass (#50120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50120

This commit adds a graph transformation pass that merges several matrix
multiplications that use the same RHS operand into one large matrix
multiplication. The LHS operands from all of the smaller matrix multiplications
are concatenated together and used as an input in the large matrix multiply,
and the result is split in order to obtain the same products as the original
set of matrix multiplications.

Test Plan:
This commit adds a simple unit test with two matrix multiplications that share
the same RHS operand.

`buck test //caffe2/test:fx_experimental`

Reviewed By: jamesr66a

Differential Revision: D25239967

fbshipit-source-id: fb99ad25b7d83ff876da6d19dc4abd112d13001e
2021-01-05 19:37:08 -08:00
d80d38cf87 Clean up type annotations in caffe2/torch/nn/modules (#49957)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49957

Test Plan: Sandcastle

Reviewed By: xush6528

Differential Revision: D25729745

fbshipit-source-id: 85810e2c18ca6856480bef81217da1359b63d8a3
2021-01-05 19:08:40 -08:00
75028f28e1 [PyTorch] Reapply D25545777: Use .sizes() instead of .size() in _cat_out_cpu (#49761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49761

This was reverted because it landed in a stack together with
D25542799 (9ce1df079f), which really was broken.
ghstack-source-id: 119361027

Test Plan: CI

Reviewed By: bwasti

Differential Revision: D25685855

fbshipit-source-id: b51f67ebe667199d15bfc6f8f131a6f1ab1b0352
2021-01-05 19:04:23 -08:00
574a15b6cc [PyTorch] Reapply D25544731: Avoid extra Tensor refcounting in _cat_out_cpu (#49760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49760

This was reverted because it landed in a stack together with
D25542799 (9ce1df079f), which really was broken.
ghstack-source-id: 119361028

Test Plan: CI

Reviewed By: bwasti

Differential Revision: D25685789

fbshipit-source-id: 41e5abb4ff30acaa6f33f9c806acd652a6dd9646
2021-01-05 18:59:20 -08:00
5f875965c6 Fix doc for vmap levels (#50099)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50099

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D25783257

Pulled By: ejguan

fbshipit-source-id: 7d2c7614f87e1c8adc8aefe3fe312b6c98ff6788
2021-01-05 18:48:32 -08:00
70734f1260 Kill AT_SKIP_BFLOAT16_IF_NOT_ROCM (#48810)
Summary:
Dependency:
https://github.com/pytorch/pytorch/pull/48809 https://github.com/pytorch/pytorch/pull/48807 https://github.com/pytorch/pytorch/pull/48806 https://github.com/pytorch/pytorch/pull/48805 https://github.com/pytorch/pytorch/pull/48801 https://github.com/pytorch/pytorch/pull/44994 https://github.com/pytorch/pytorch/pull/44848

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48810

Reviewed By: mruberry

Differential Revision: D25772955

Pulled By: ngimel

fbshipit-source-id: 353f130eb701f8b338a826d2edaea69e6e644ee9
2021-01-05 18:10:23 -08:00
26391143b6 Support out argument in torch.fft ops (#49335)
Summary:
Ref https://github.com/pytorch/pytorch/issues/42175

This adds out argument support to all functions in the `torch.fft` namespace except for `fftshift` and `ifftshift` because they rely on `at::roll` which doesn't have an out argument version.

Note that there's no general way to do the transforms directly into the output since both cufft and mkl-fft only support single batch dimensions. At a minimum, the output may need to be re-strided which I don't think is expected from `out` arguments normally. So, on cpu this just copies the result into the out tensor. On cuda, the normalization is changed to call `at::mul_out` instead of an inplace multiply.

If it's desirable, I could add a special case to transform into the output when `out.numel() == 0` since there's no expectation to preserve the strides in that case anyway. But that would lead to the slightly odd situation where `out` having the correct shape follows a different code path from `out.resize_(0)`.
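
Usage follows the standard `out=` pattern (a sketch; as described above, the result is computed and then copied into `out` on CPU):

```python
import torch

x = torch.randn(8)
out = torch.empty(8, dtype=torch.complex64)
torch.fft.fft(x, out=out)  # fills `out` instead of allocating a new tensor
```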

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49335

Reviewed By: mrshenli

Differential Revision: D25756635

Pulled By: mruberry

fbshipit-source-id: d29843f024942443c8857139a2abdde09affd7d6
2021-01-05 17:17:49 -08:00
5d93e2b818 torch.flip and torch.flip{lr, ud}: Half support for CPU and BFloat16 support for CPU & CUDA (#49895)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49889

Also adds BFloat16 support for CPU and CUDA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49895

Reviewed By: mrshenli

Differential Revision: D25746272

Pulled By: mruberry

fbshipit-source-id: 0b6a9bc13ae60c22729a0aea002ed857c36f14ff
2021-01-05 16:51:49 -08:00
d1c375f071 fix fork formatting (#49436)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49436

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D25788166

Pulled By: eellison

fbshipit-source-id: e368b473ad64a1168be01fc674625415a07ff31c
2021-01-05 16:38:34 -08:00
7fe25af59d Revert D25746115: [pytorch][PR] Improve documentation and warning message for creation of a tensor with from_numpy()
Test Plan: revert-hammer

Differential Revision:
D25746115 (4a6c178f73)

Original commit changeset: 3e534a8f2bc1

fbshipit-source-id: 12c921cf2d062794ce45afcaed1fbedc28dcdd01
2021-01-05 16:21:26 -08:00
dcc83868c5 [PyTorch Mobile] Mark xnnpack operators selective
Summary: Mark the remaining operator registrations that are not yet selective. The size change is -12.2 KB for igios and -14 KB for fbios.

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D25742543

fbshipit-source-id: 3e58789d36d216a52340c00b53e2f783ea2c9414
2021-01-05 15:53:01 -08:00
5e1c8f24d4 Make stft (temporarily) warn (#50102)
Summary:
When continuing the deprecation process for stft, it was made to throw an error when `use_complex` was not explicitly set by the user. Unfortunately that change missed a model relying on the historic stft functionality. Before re-enabling the error we'll need to write an upgrader for that model.

This PR turns the error back into a warning to allow that model to continue running as before.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50102

Reviewed By: ngimel

Differential Revision: D25784325

Pulled By: mruberry

fbshipit-source-id: 825fb38af39b423ce11b376ad3c4a8b21c410b95
2021-01-05 15:39:00 -08:00
4a6c178f73 Improve documentation and warning message for creation of a tensor with from_numpy() (#49516)
Summary:
Implements the very simple changes suggested in the short discussion of the issue. Updated the documentation to inform users that creating a tensor from a memory-mapped read-only NumPy array will probably crash the program. The displayed warning message was also updated to include information about the issues with using a memory-mapped read-only NumPy array. Closes https://github.com/pytorch/pytorch/issues/46741.
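
A sketch of the scenario the updated docs warn about (`data.npy` is a hypothetical file):

```python
import numpy as np
import torch

arr = np.load("data.npy", mmap_mode="r")  # read-only memory map
t = torch.from_numpy(arr)  # warns that the array is not writable
t += 1                     # writing through the tensor may crash the program
```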

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49516

Reviewed By: mrshenli

Differential Revision: D25746115

Pulled By: mruberry

fbshipit-source-id: 3e534a8f2bc1f083a2835440d324bd6f30798ad4
2021-01-05 15:25:15 -08:00
9529ae3776 Revert D25757721: [pytorch][PR] Run mypy on more test files
Test Plan: revert-hammer

Differential Revision:
D25757721 (b7bfc723d3)

Original commit changeset: 44c396d8da9e

fbshipit-source-id: 58437d719285a4fecd8c05e487cc86fc2cebadff
2021-01-05 15:18:14 -08:00
d1a56fcd9d [docs] add docstring in torch.cuda.get_device_properties (#49792)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49737

Added a docstring to `torch.cuda.get_device_properties`.
Added a `Returns` section to `torch.cuda.get_device_name`.
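
For reference, the documented call returns a device-properties object (illustrative only):

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name, props.major, props.minor, props.total_memory)
```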

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49792

Reviewed By: mruberry

Differential Revision: D25784046

Pulled By: ngimel

fbshipit-source-id: f88da02147f92c889398957fcaf22961d3bb1062
2021-01-05 14:51:07 -08:00
abe1fa49e9 [JIT] Add __prepare_scriptable__ duck typing to allow replacing nn.modules with scriptable preparations (#45645) (#49242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49242

Fixes https://github.com/pytorch/pytorch/issues/45072

As discussed with zdevito gchanan cpuhrsch and suo, this change allows developers to create custom preparations for their modules before scripting. This is done by adding a `__prepare_scriptable__` method to a module which returns the prepared scriptable module out-of-place. It does not expand the API surface for end users.
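
A minimal sketch of the duck-typed hook (`Wrapper` and `ScriptFriendly` are hypothetical):

```python
import torch

class ScriptFriendly(torch.nn.Module):
    def forward(self, x):
        return x + 1

class Wrapper(torch.nn.Module):
    def forward(self, x):
        return x + 1

    def __prepare_scriptable__(self):
        # Return an out-of-place, script-friendly stand-in for this module.
        return ScriptFriendly()

scripted = torch.jit.script(Wrapper())  # scripts ScriptFriendly, not Wrapper
```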

Prior art by jamesr66a: https://github.com/pytorch/pytorch/pull/42244

Test Plan: Imported from OSS

Reviewed By: dongreenberg

Differential Revision: D25500303

fbshipit-source-id: d3ec9005de27d8882fc29d02f0d08acd2a5c6b2c
2021-01-05 14:18:15 -08:00
e71a13e8a3 [pytorch][codegen] migrate gen_variable_type to new data model (#49735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49735

This is the final wave of autograd codegen data model migration.

After this PR:
- autograd codegen no longer depends on Declarations.yaml;
- autograd codegen sources are fully type annotated and pass mypy-strict check;

To avoid potential merge conflicts with other pending PRs, some structural
changes are intentionally avoided, e.g. didn't move inner methods out, didn't
change all inner methods to avoid reading outer functions' variables, etc.

Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Confirmed clean mypy-strict run:
```
mypy --config mypy-strict.ini
```

Test Plan: Imported from OSS

Reviewed By: ezyang, bhosmer

Differential Revision: D25678879

Pulled By: ljk53

fbshipit-source-id: ba6e2eb6b9fb744208f7f79a922d933fcc3bde9f
2021-01-05 14:12:39 -08:00
a272a7eeab [PyTorch] Avoid heap allocations in inferUnsqueezeGeometry (#49497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49497

Noticed in a perf profile that this function spends most of its time in
malloc. Optimize for typical tensor sizes.
ghstack-source-id: 119318388

Test Plan:
perf profile internal benchmark; saw inferUnsqueezeGeometry
go from 0.30% exclusive 0.47% inclusive to 0.11% exclusive 0.16%
inclusive.

Differential Revision: D25596549

fbshipit-source-id: 3bbd2031645a4b9fe6f49a77d41db46826d0f632
2021-01-05 14:06:03 -08:00
093aca082e Enable distribution validation if __debug__ (#48743)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47123
Follows https://github.com/pyro-ppl/pyro/pull/2701

This turns on `Distribution` validation by default. The motivation is to favor beginners by providing helpful error messages. Advanced users focused on speed can disable validation by calling
```py
torch.distributions.Distribution.set_default_validate_args(False)
```
or by disabling individual distribution validation via `MyDistribution(..., validate_args=False)`.

In practice I have found many beginners forget or do not know about validation. Therefore I have [enabled it by default](https://github.com/pyro-ppl/pyro/pull/2701) in Pyro. I believe PyTorch could also benefit from this change. Indeed validation caught a number of bugs in `.icdf()` methods, in tests, and in PPL benchmarks, all of which have been fixed in this PR.

## Release concerns
- This may slightly slow down some models. Concerned users may disable validation.
- This may cause new `ValueErrors` in models that rely on unsupported behavior, e.g. `Categorical.log_prob()` applied to continuous-valued tensors (only {0,1}-valued tensors are supported).

We should clearly note this change in release notes.
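
For example, the `Categorical.log_prob()` case mentioned above now fails fast under default validation (a minimal sketch; the exact error text may differ):

```python
import torch

d = torch.distributions.Categorical(probs=torch.tensor([0.5, 0.5]))
# A continuous value is outside the distribution's integer support,
# so this now raises ValueError instead of returning a meaningless result.
d.log_prob(torch.tensor(0.5))
```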

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48743

Reviewed By: heitorschueroff

Differential Revision: D25304247

Pulled By: neerajprad

fbshipit-source-id: 8d50f28441321ae691f848c55f71aa80cb356b41
2021-01-05 13:59:10 -08:00
e3c56ddde6 Revert D25757691: [pytorch][PR] Run mypy over test/test_utils.py
Test Plan: revert-hammer

Differential Revision:
D25757691 (c86cfcd81d)

Original commit changeset: 145ce3ae532c

fbshipit-source-id: 3dfd68f0c42fc074cde15c6213a630b16e9d8879
2021-01-05 13:40:13 -08:00
e442ac1e3f Update MultiHeadAttention docstring (#49950)
Summary:
Fixes MultiHeadAttention docstring.

Currently, https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html#torch.nn.MultiheadAttention
is

[screenshot: current (broken) rendering of the MultiheadAttention docstring]

and with the fix will be

[screenshot: fixed rendering of the MultiheadAttention docstring]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49950

Reviewed By: mrshenli

Differential Revision: D25732573

Pulled By: zhangguanheng66

fbshipit-source-id: b362f3f617ab26b0dd25c3a0a7d4117e522e620c
2021-01-05 13:31:48 -08:00
9945fd7253 Drop unused imports from caffe2/python (#49980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49980

From
```
./python/libcst/libcst codemod remove_unused_imports.RemoveUnusedImportsWithGlean --no-format caffe2/
```

Test Plan: Standard sandcastle tests

Reviewed By: xush6528

Differential Revision: D25727359

fbshipit-source-id: c4f60005b10546423dc093d31d46deb418352286
2021-01-05 13:17:46 -08:00
eee849be8c [caffe2][a10] Move down pragma pop to properly suppress warning 4522 (#49233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49233

As the comments on line 160 say, we should suppress this overly aggressive warning with MSVC:
```
caffe2\tensorbody.h_ovrsource#header-mode-symlink-tree-only,headers\aten\core\tensorbody.h(1223): warning C4522: 'at::Tensor': multiple assignment operators specified
```

However, in order to remove the warning, the closing brace of the class must be between the`#pragma warning` push and its corresponding pop. Move the pop down to ensure that.

Test Plan: Built locally using clang for Windows without buck cache, confirmed the warning resolved

Reviewed By: bhosmer

Differential Revision: D25422447

fbshipit-source-id: c1e1c66fb8513af5f9d4e3c1dc48d0070c4a1f84
2021-01-05 13:13:22 -08:00
16e5af41da Fix store based barrier to only use 'add'. (#49930)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49930

Certain store implementations don't work well when we use get() and
add() on the same key. To avoid this issue, we only use add() in the store
based barrier. The buggy store implementations can't be properly fixed due to
legacy reasons.

Test Plan:
1) unit tests.
2) waitforbuildbot

Reviewed By: osalpekar

Differential Revision: D25725386

fbshipit-source-id: 1535e2629914de7f78847b730f8764f92cde67e7
2021-01-05 12:46:24 -08:00
12ee7b61e7 support building with conda installed libraries (#50080)
Summary:
This should fix a bunch of shared-library compilation errors when libraries are installed in conda's lib or lib64 folders.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50080

Reviewed By: seemethere

Differential Revision: D25781923

Pulled By: walterddr

fbshipit-source-id: 78a74925981d65243b98bb99a65f1f2766e87a2f
2021-01-05 12:32:51 -08:00
e868825eb6 [RPC] Relax some profiling tests (#49983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49983

We have observed very rare flakiness in some profiling tests recently. However, we were not able to reproduce these even with thousands of
runs on the CI machines where the failure was originally reported. As a result,
relaxing these tests and re-enabling them to reduce failure rates.
ghstack-source-id: 119352019

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D25739416

fbshipit-source-id: 4dbb6b30f20d3af94ba39f4a7ccf4fb055e440bc
2021-01-05 11:47:32 -08:00
c115957df0 [distributed] Provide parameter to pass GPU ID in barrier function (#49069)
Summary:
On a multi-GPU node, the rank and its corresponding GPU mapping can differ.
Provide an optional parameter to specify the GPU device number for the
allreduce operation in the barrier function.

Add test cases to validate barrier device_ids.
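
Intended usage looks like the following sketch (assumes an initialized process group, one process per GPU, and `local_rank` as this process's GPU index):

```python
import torch.distributed as dist

local_rank = 0  # assumed: provided by the launcher in real use
# Pin the barrier's allreduce to this process's device.
dist.barrier(device_ids=[local_rank])
```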

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Fixes https://github.com/pytorch/pytorch/issues/48110

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49069

Reviewed By: mrshenli

Differential Revision: D25658528

Pulled By: rohan-varma

fbshipit-source-id: 418198b6224c8c1fd95993b80c072a8ff8f02eec
2021-01-05 11:27:54 -08:00
3cd2f1f3a7 Add an option to disable aten::cat in TE (re-revert) (#50101)
Summary:
This reverts commit ace78ddb6a2bdbf03f08c69767eba57306dd69ed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50101

Reviewed By: eellison

Differential Revision: D25784785

Pulled By: Krovatkin

fbshipit-source-id: cbb3d377e03303f6c8c71f4c59c6d90ab40d55f7
2021-01-05 11:08:11 -08:00
bbae6774c1 [JIT] Remove buffer metadata serialization forward-compat gate (#49990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49990

**Summary**
This commit removes the forward-compatibility gate for buffer metadata
serialization. It was introduced to allow versions of fbcode
binaries statically linked against older versions of PyTorch (without
buffer metadata in JIT) to deserialize archives produced by new versions
of PyTorch. Enough time has probably passed that these old binaries
don't exist anymore, so it should be safe to remove the gate.

**Test Plan**
Internal tests.

Test Plan: Imported from OSS

Reviewed By: xw285cornell

Differential Revision: D25743199

Pulled By: SplitInfinity

fbshipit-source-id: 58d82ab4362270b309956826e36c8bf9d620f081
2021-01-05 11:03:28 -08:00
04e86be1a2 eager quant: fix error with removing forward hooks (#49813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49813

https://github.com/pytorch/pytorch/issues/49739 reports a crash
where removing forward hooks results in a

```
RuntimeError: OrderedDict mutated during iteration
```

Unfortunately I cannot repro this inside the PyTorch module, but the issue
author has a good point, and we should not mutate the dict while iterating
over it.
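
The general fix pattern is to snapshot the keys before mutating (a sketch; `module` and `should_remove` are hypothetical):

```python
# Collect the hook keys to drop first...
to_remove = [k for k, hook in module._forward_hooks.items() if should_remove(hook)]
# ...then mutate the OrderedDict outside of the iteration.
for k in to_remove:
    del module._forward_hooks[k]
```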

Test Plan:
```
// test plan from https://github.com/pytorch/pytorch/pull/46871 which
// originally added this
python test/test_quantization.py TestEagerModeQATOps
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25698725

fbshipit-source-id: 13069d0d5017a84038c8f7be439a3ed537938ac6
2021-01-05 11:00:20 -08:00
113b7623d6 quant: throw a nice error message for allclose with quantized inputs (#49802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49802

Currently `torch.allclose` is not supported with quantized inputs.
Throw a nice error message instead of a cryptic one.

Test Plan:
```
torch.allclose(x_fp32, y_fp32)

torch.allclose(x_int8, y_int8)
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D25693538

fbshipit-source-id: 8958628433adfca3ae6ce215f3e3ec3c5e29994c
2021-01-05 10:55:34 -08:00
44c17b28c6 quant: nice error message on convtranspose with per-channel weight (#49899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49899

Per-channel weight observers in conv transpose are not supported yet. Adding an
error message that fails instantly instead of making the user wait until after
calibration/training finishes.

Test Plan:
```
python test/test_quantization.py TestPostTrainingStatic.test_convtranspose_per_channel_fails_early
python test/test_quantization.py TestQuantizeFx.test_convtranspose_per_channel_fails_early
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25717151

fbshipit-source-id: 093e5979030ec185e3e0d56c45d7ce7338bf94b6
2021-01-05 09:38:57 -08:00
72306378b4 quant: ensure observers do not crash for empty Tensors (#49800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49800

Ensures that having a Tensor with 0 elements does not crash observers.
Note: it's illegal to pass Tensors with 0 elements to reductions such
as min and max, so we gate this out before the logic hits min/max.

This should not be hit often in practice, but it's coming up
during debugging of some RCNN models with test inputs.
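
A minimal sketch of the guarded case:

```python
import torch
from torch.quantization import MinMaxObserver

obs = MinMaxObserver()
obs(torch.empty(0))  # zero-element input is skipped instead of crashing in min/max
```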

Test Plan:
```
python test/test_quantization.py TestObserver.test_zero_numel
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25693230

fbshipit-source-id: d737559697c98bd923356edacba895835060bb38
2021-01-05 09:35:47 -08:00
c86cfcd81d Run mypy over test/test_utils.py (#49654)
Summary:
This caught one incorrect annotation in `cpp_extension.load`.

xref gh-16574.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49654

Reviewed By: heitorschueroff

Differential Revision: D25757691

Pulled By: ezyang

fbshipit-source-id: 145ce3ae532cc585d9ca3bbd5381401bad0072e2
2021-01-05 09:32:06 -08:00
b7bfc723d3 Run mypy on more test files (#49658)
Summary:
Improves one annotation for `augment_model_with_bundled_inputs`

Also add a comment not to work on caffe2 type annotations; that's not worth the effort, and those ignores can stay as they are.

xref gh-16574

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49658

Reviewed By: heitorschueroff

Differential Revision: D25757721

Pulled By: ezyang

fbshipit-source-id: 44c396d8da9ef3f41b97f9c46a528f0431c4b463
2021-01-05 09:28:38 -08:00
e35b822d7d fixes indices computation for trilinear interpolate backwards (#50084)
Summary:
https://github.com/pytorch/pytorch/issues/48675 had some typos in the indices computations, so results for trilinear interpolation where height is not equal to width were wrong. This PR fixes it.
cc xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50084

Reviewed By: BIT-silence

Differential Revision: D25777083

Pulled By: ngimel

fbshipit-source-id: 71be545628735fe875b7ea30bf6a09df4f2fae5c
2021-01-05 09:20:59 -08:00
52933b9923 Patch death tests/fork use after D25292667 (part 3)
Summary: (Note: this ignores all push blocking failures!)

Test Plan: unit tests

Differential Revision: D25775357

fbshipit-source-id: 0ae3c59181bc123d763ed9c0d05c536998ae5ca0
2021-01-05 09:07:49 -08:00
ace78ddb6a Revert D25763758: [pytorch][PR] introduce a flag to disable aten::cat in TE
Test Plan: revert-hammer

Differential Revision:
D25763758 (9e0b4a96e4)

Original commit changeset: c4f4a8220964

fbshipit-source-id: 98775ad9058b81541a010e646b0cf4864854be3e
2021-01-05 08:45:50 -08:00
3845770349 Fixing error in Readme.md. (#50033)
Summary:
Fix an incorrect command in the readme.
Fix an incorrect URL in the readme.
Add a URL for the Dockerfile.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50033

Reviewed By: ezyang

Differential Revision: D25759567

Pulled By: mrshenli

fbshipit-source-id: 2a3bc88c8717a3890090ddd0d6657f49d14ff05a
2021-01-05 08:22:49 -08:00
8c66aec435 Fix grammar typo in readme.md (#50000)
Summary:
The readme was missing a backtick.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50000

Reviewed By: ezyang

Differential Revision: D25759608

Pulled By: mrshenli

fbshipit-source-id: 4dbe06b8978ae5b2b9b66cde163dab4bd8ee2257
2021-01-05 08:14:48 -08:00
e4d596c575 Fix return value of _vmap_internals._get_name (#49951)
Summary:
This appears to have been a copy-paste error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49951

Reviewed By: mrshenli

Differential Revision: D25757099

Pulled By: zou3519

fbshipit-source-id: e47cc3b0694645bd0025326bfe45852ef0266adf
2021-01-05 07:00:48 -08:00
6e6231f9cd unit test for fc parallelization aot (#50056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50056

buck test //caffe2/caffe2/contrib/fakelowp/test:test_chunkingnnpi -- --fallback-classic

Test Plan: https://our.intern.facebook.com/intern/testinfra/testrun/7036874446100155

Reviewed By: venkatacrc

Differential Revision: D25731079

fbshipit-source-id: 4aa4ffc641659cd90bf4670d28cb43e43ae76dcd
2021-01-05 00:27:43 -08:00
ee80b45843 [TensorExpr] Fix LLVM 10 build after LLVM API changes
Summary: Use `llvm::CodeGenFileType` for llvm-10+

Test Plan: local build

Reviewed By: asuhan

Differential Revision: D25694990

fbshipit-source-id: c35d973ef2669929715a94da5dd46e4a0457c4e8
2021-01-04 23:19:21 -08:00
c51455a7bb [FX] fix Graph python_code return type annotation (#49931)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49931

This fixes #49932. The `maybe_return_annotation` was not being passed by reference, so it was never getting modified.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D25725582

Pulled By: esqu1

fbshipit-source-id: 4136ff169a269d6b98f0b8e14d95d19e7c7cfa71
2021-01-04 19:55:33 -08:00
8fb5f16931 Complex backward for indexing, slicing, joining, and mutating ops (#49552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49552

This PR:
1. Migrates independent autograd tests for `hstack`, `dstack`, `vstack`, `movedim`, `moveaxis` from `test_autograd.py` to the new `OpInfo` based tests.
2. Migrates autograd tests for `gather`, `index_select` from the method_tests to the new `OpInfo` based tests.
3. Enables complex backward for `stack, gather, index_select, index_add_` and adds tests for complex autograd for all the above mentioned ops (a short sketch follows below).
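
For illustration (not taken from the PR's test suite), a minimal sketch of complex autograd through `gather`:

```python
import torch

# gradcheck compares the analytic complex backward against finite differences.
x = torch.randn(3, 4, dtype=torch.cdouble, requires_grad=True)
idx = torch.tensor([[0, 1], [2, 3], [1, 0]])
torch.autograd.gradcheck(lambda t: torch.gather(t, 1, idx), (x,))
```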

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25682511

Pulled By: anjali411

fbshipit-source-id: 5d8f89db4a9ec340ab99a6196987d44a23e2c6c6
2021-01-04 19:44:15 -08:00
9e0b4a96e4 introduce a flag to disable aten::cat in TE (#49579)
Summary:
introduce a flag to disable aten::cat in TE

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49579

Reviewed By: eellison

Differential Revision: D25763758

Pulled By: Krovatkin

fbshipit-source-id: c4f4a8220964813202369a3383057e77e7f10cb0
2021-01-04 19:17:29 -08:00
65122173ab [ONNX] Modified var_mean symbolic to support more combinations of dims (#48949)
Summary:
In the existing implementation of var_mean, the values of dim have to be sequential and start with zero. The formats listed below cause scenarios with an incompatible dimension for the Sub node:
-> dim[1, 2]
-> dim[0, 2]
-> dim[2, 0]

The changes in this PR allow such formats to be supported in var_mean.
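
For illustration (the actual fix is in the ONNX symbolic, exercised via torch.onnx.export), the dim combinations in question:

```python
import torch

x = torch.randn(2, 3, 4)
# Previously only sequential, zero-starting dims exported cleanly:
torch.var_mean(x, dim=[1, 2])
torch.var_mean(x, dim=[0, 2])
torch.var_mean(x, dim=[2, 0])
```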

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48949

Reviewed By: houseroad

Differential Revision: D25540272

Pulled By: SplitInfinity

fbshipit-source-id: 59813a77ff076d138655cc8c17953358f62cf137
2021-01-04 18:10:39 -08:00
d0369aabe1 Clean up some type annotations in caffe2/contrib/aten/gen_op (#49945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49945

Upgrades type annotations from Python2 to Python3

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25717502

fbshipit-source-id: 718d93e8614e9d050f4da1c6bd4ac892bab98154
2021-01-04 17:32:38 -08:00
a5339b9d7c Drop unused imports from leftovers (#49953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49953

From
```
./python/libcst/libcst codemod remove_unused_imports.RemoveUnusedImportsWithGlean --no-format caffe2/
```

Test Plan: Standard sandcastle tests

Reviewed By: xush6528

Differential Revision: D25727348

fbshipit-source-id: b3feef80b9b4b535f1bd4060dace5b1a50bd5e69
2021-01-04 16:31:48 -08:00
5acb1cc1df Drop unused imports from scripts (#49956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49956

From
```
./python/libcst/libcst codemod remove_unused_imports.RemoveUnusedImportsWithGlean --no-format caffe2/
```

Test Plan: Standard sandcastle tests

Reviewed By: xush6528

Differential Revision: D25727347

fbshipit-source-id: 74d0a08aa0cfd0f492688a2b8278a0c65fd1deba
2021-01-04 16:08:28 -08:00
efe1fc21fc Dont inlinine intermediates on cpu (#49565)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49565

Test Plan: Imported from OSS

Reviewed By: Krovatkin, ZolotukhinM

Differential Revision: D25688271

Pulled By: eellison

fbshipit-source-id: 9ea7858e2db4fb31292e04440fc72ee04623c688
2021-01-04 15:46:20 -08:00
c439a6534d [ONNX] Handle Sub-block index_put in _jit_pass_onnx_remove_inplace_ops_for_onnx (#48734)
Summary:
For the added UT and existing UTs, this code is independent and ready for review.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48734

Reviewed By: izdeby

Differential Revision: D25502677

Pulled By: bzinodev

fbshipit-source-id: 788b4eaa5e5e8b5df1fb4956fbd25928127bb199
2021-01-04 15:11:10 -08:00
240c0b318a Suppress "statement is unreachable" warning (#49495)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49495

Compiling PyTorch currently generates a large number of warnings like this:
```
caffe2/aten/src/ATen/core/builtin_function.h(105): warning: statement is unreachable
```
The offending code
```
  std::string pretty_print_schema() const override {
    TORCH_INTERNAL_ASSERT(false);
    return "";
  }
```
has an unreachable return which prevents a "no return" warning.

We resolve the situation by using NVCC's pragma system to suppress this warning within this function.

Test Plan:
The warning appears when running:
```
buck build mode/dev-nosan //caffe2/torch/fb/sparsenn:test
```
As well as a number of other build commands.

Reviewed By: ngimel

Differential Revision: D25546542

fbshipit-source-id: 71cddd4fdb5fd16022a6d7b2daf0e6d55e6e90e2
2021-01-04 14:53:47 -08:00
f96ce3305c prohibit assignment to a sparse tensor (#50040)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48225 by prohibiting assignment to a sparse Tensor.
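
A minimal sketch of the newly prohibited pattern (the exact error type and message are whatever the PR raises):

```python
import torch

s = torch.sparse_coo_tensor(
    torch.tensor([[0, 1], [1, 0]]), torch.tensor([3.0, 4.0]), (2, 2)
)
try:
    s[0, 0] = 1.0  # index assignment is now rejected for sparse tensors
except Exception as e:
    print(type(e).__name__, e)
```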

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50040

Reviewed By: mrshenli

Differential Revision: D25757125

Pulled By: zou3519

fbshipit-source-id: 3db6f48932eb10bf6ca5e97a6091afcabb60e478
2021-01-04 14:38:35 -08:00
71766d89ea [BE] unified run_process_no_exception code (#49774)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49774

Reviewed By: janeyx99

Differential Revision: D25756811

Pulled By: walterddr

fbshipit-source-id: 4d2b3bd772572764ff96e5aad70323b58393e332
2021-01-04 13:43:09 -08:00
74dcb6d363 torch.xlogy: Use wrapped_scalar_tensor / gpu_with_scalars to speed up GPU kernel. (#49926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49926

While investigating https://github.com/pytorch/pytorch/issues/49758, I changed the xlogy kernel to use the recommended wrapped_scalar_tensor pattern instead of moving the scalar to the GPU as a tensor.
While this doesn't avoid a synchronization (there is no synchronization in the move, as it's done via fill), it does significantly speed up the GPU kernel (~50%, benchmark in PR comments).

From looking at the nvprof output, it looks like this code path avoids broadcasting.  Aside: this seems unnecessary, as there is nothing special from the point-of-view of broadcasting whether the Tensor
is ()-sized or marked as a wrapped_scalar.  Still, this is a useful change to make as we avoid extra kernel launches and dispatches to create and fill the tensor.
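
For illustration (not from the PR), the call shape this affects; the scalar argument stays a wrapped scalar instead of becoming a GPU tensor:

```python
import torch

x = torch.randn(1024, device="cuda")
y = torch.xlogy(2.0, x)  # scalar argument handled via wrapped_scalar_tensor
```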

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25724215

Pulled By: gchanan

fbshipit-source-id: 4adcd5d8b3297502672ffeafc77e8af80592f460
2021-01-04 12:42:08 -08:00
483670ff0f [pytorch] add threshold_backward batching for vmap (#49881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49881

title

Test Plan: pytest test/test_vmap.py -v -k "BatchedGrad"

Reviewed By: zou3519

Differential Revision: D25711289

fbshipit-source-id: f1856193249fda70da41e36e15bc26ea7966b510
2021-01-04 12:24:05 -08:00
da790eca69 Add trace batching forward/backward rule (#49979)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49979

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D25734379

Pulled By: ejguan

fbshipit-source-id: 8f9346afaf324e7ab17bafd6ecc97eed8442fd38
2021-01-04 12:04:55 -08:00
0216366f0d Make use_c10_dispatcher: full mandatory for structured kernels (#49490)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49490

No reason to let people do the legacy thing for the brand new kernel.
This simplifies the codegen.  I have to port the two structured kernels
to this new format.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25595406

Pulled By: ezyang

fbshipit-source-id: b5931873379afdd0f3b00a012e0066af05de0a69
2021-01-04 11:59:24 -08:00
6c833efd65 Move default or no default logic into native.argument (#49489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49489

Previously, it was done at a use site, but that meant other use
sites don't get the right logic.  Pushing it in makes sure everyone
gets it.

I also fixed one case of confusion where defn() was used to define a decl().
If you want to define a declaration with no defaults, say no_default().decl()
which is more direct and will give us code reviewers a clue if you should
have pushed this logic in.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25595407

Pulled By: ezyang

fbshipit-source-id: 89c664f0ed4d95699794a0d3123d54d0f7e4cba4
2021-01-04 11:59:20 -08:00
8eee8460f8 codegen: Resolve overload ambiguities created by defaulted arguments (#49348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49348

This is a redux of #45666 post refactor, based off of
d534f7d4c5
Credit goes to peterbell10 for the implementation.

Fixes #43945.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25594004

Pulled By: ezyang

fbshipit-source-id: c8eb876bb3348308d6dc8ba7bf091a2a3389450f
2021-01-04 11:59:16 -08:00
7202c0ec50 Tighten up error checking on manual_kernel_registration (#49341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49341

I noticed that #49097 was using manual_kernel_registration incorrectly,
so this diff tightens up the testing so that:

1. We don't generate useless wrapper functions when manual_kernel_registration
is on (it's not going to be registered, so it does nothing).

2. manual_kernel_registration shouldn't affect generation of functions in
Functions.h; if you need to stop bindings, use manual_cpp_binding

3. Combining structured and manual_kernel_registration is a hard error

4. We raise an error if you set dispatch and manual_kernel_registration at the
same time.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25594003

Pulled By: ezyang

fbshipit-source-id: 655b10e9befdfd8bc95f1631b2f48f995a31a59a
2021-01-04 11:59:12 -08:00
8e20594b38 Construct CppSignatureGroup from NativeFunction (#49245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49245

This will make it easier to implement the POC in
d534f7d4c5
see also https://github.com/pytorch/pytorch/pull/45666

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25594005

Pulled By: ezyang

fbshipit-source-id: e458d3dc3a765ec77425761b9b17f23769cecf9e
2021-01-04 11:55:28 -08:00
f0945537af .circleci: Ignore unbound variables for conda (#50053)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50053

For some reason conda likes to re-activate the conda environment when attempting this install,
which means that a deactivate is run and some variables might not exist when that happens,
namely CONDA_MKL_INTERFACE_LAYER_BACKUP from libblas, so let's just ignore unbound variables when
it comes to the conda installation commands

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D25760737

Pulled By: seemethere

fbshipit-source-id: 9e7720eb8a4f8028dbaa7bcfc304e5c1ca73ad08
2021-01-04 11:34:28 -08:00
69ca5e1397 Enforce c10-fullness for all ops (#49619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49619

This is a minimal-change PR that enforces that all operators are c10-full by making it the default.

This does not clean up any code yet, that will happen in PRs stacked on top. But this PR already ensures
that there are no non-c10-full ops left and there will be no non-c10-full ops introduced anymore.
ghstack-source-id: 119269182

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D25650198

fbshipit-source-id: efc53e884cb53193bf58a4834bf148453e689ea1
2021-01-04 11:26:53 -08:00
6e84a018be move to non-legacy magma v2 headers (#49978)
Summary:
We recently (https://github.com/pytorch/pytorch/issues/7582) dropped magma v1 support, but we were still including the legacy compatibility headers and using functions only provided by them.
This changes the includes to the new magma_v2 header and fixes the triangular solve functions to use the v2-style magma_queue-using API.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49978

Reviewed By: mrshenli

Differential Revision: D25752499

Pulled By: ngimel

fbshipit-source-id: 26d916bc5ce63978b341aefb072af228f140637d
2021-01-04 11:18:53 -08:00
fdb81c538a Improve torch.flatten docs and add tests to test_view_ops (#49501)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/39474

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49501

Reviewed By: mrshenli

Differential Revision: D25740586

Pulled By: soulitzer

fbshipit-source-id: 3d7bdbab91eb208ac9e6832bb766d9d95a00c103
2021-01-04 11:11:34 -08:00
b76822eb49 Update update_s3_htmls.yml (#49934)
Summary:
It currently runs on forks as well, and generates a lot of failure messages for the owners of those forks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49934

Reviewed By: mruberry

Differential Revision: D25739552

Pulled By: seemethere

fbshipit-source-id: 0f9cc430316c0a5e9972de3cdd06d225528c81c2
2021-01-04 10:14:14 -08:00
22bd277891 Run test_type_hints first (#49748)
Summary:
Since it is essentially a linter check and fails frequently

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49748

Reviewed By: vkuzo

Differential Revision: D25682980

Pulled By: malfet

fbshipit-source-id: 7dba28242dced0277bad56dc887d3273c1e9e575
2021-01-04 09:33:13 -08:00
211f35631f Add type annotations to _tensorboard_vis.py and hipify_python.py (#49834)
Summary:
closes gh-49833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49834

Reviewed By: mruberry

Differential Revision: D25725341

Pulled By: malfet

fbshipit-source-id: 7454c7afe07a3ff829826afe02aba05b7f649d9b
2021-01-04 09:29:51 -08:00
c7e9abb66a Making ops c10-full: list of optional tensors (#49138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49138

See for details: https://fb.quip.com/QRtJAin66lPN

We need to model optional types explicitly, mostly for schema inference. So we cannot pass a `Tensor?[]` as `ArrayRef<Tensor>`, instead we need to pass it as an optional type. This PR changes it to `torch::List<c10::optional<Tensor>>`. It also makes the ops c10-full that were blocked by this.

## Backwards Compatibility

- This should not break the Python API because the representation in Python is the same and python_arg_parser just transforms the python list into a `List<optional<Tensor>>` instead of into a `List<Tensor>`.
- This should not break serialized models because there's some logic that allows loading a serialized `List<Tensor>` as `List<optional<Tensor>>`, see https://github.com/pytorch/pytorch/pull/49138/files#diff-9315f5dd045f47114c677174dcaa2f982721233eee1aa19068a42ff3ef775315R57
- This will break backwards compatibility for the C++ API. There is no implicit conversion from `ArrayRef<Tensor>` (which was the old argument type) to `List<optional<Tensor>>`. One common call pattern is `tensor.index({indices_tensor})`, where indices_tensor is another `Tensor`, and that will continue working because the `{}` initializer_list constructor for `List<optional<Tensor>>` can take `Tensor` elements that are implicitly converted to `optional<Tensor>`. But another common call pattern was `tensor.index(indices_tensor)`, where previously the `Tensor` got implicitly converted to an `ArrayRef<Tensor>`; to implicitly convert `Tensor -> optional<Tensor> -> List<optional<Tensor>>` would be two implicit conversions, and C++ doesn't allow chaining two implicit conversions. So those call sites have to be rewritten to `tensor.index({indices_tensor})`.

ghstack-source-id: 119269131

Test Plan:
## Benchmarks (C++ instruction counts):
### Forward
#### Script
```py
from torch.utils.benchmark import Timer

counts = Timer(
    stmt="""
        auto t = {{op call to measure}};
    """,
    setup="""
        using namespace torch::indexing;
        auto x = torch::ones({4, 4, 4});
    """,
    language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```
#### Results
|  Op call                                                              |before   |after   |delta  |delta %|
|------------------------------------------------------------------------|---------|--------|-------|------|
|x[0] = 1                                                                |11566015 |11566015|0      |0.00% |
|x.index({0})                                                            |6807019  |6801019 |-6000  |-0.09%|
|x.index({0, 0})                                                         |13529019 |13557019|28000  |0.21% |
|x.index({0, 0, 0})                                                      |10677004 |10692004|15000  |0.14% |
|x.index({"..."})                                                        |5512015  |5506015 |-6000  |-0.11%|
|x.index({Slice(None, None, None)})                                      |6866016  |6936016 |70000  |1.02% |
|x.index({None})                                                         |8554015  |8548015 |-6000  |-0.07%|
|x.index({false})                                                        |22400000 |22744000|344000 |1.54% |
|x.index({true})                                                         |27624088 |27264393|-359695|-1.30%|
|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})})|123472000|123463306|-8694|-0.01%|

### Autograd
#### Script
```py
from torch.utils.benchmark import Timer

counts = Timer(
    stmt="""
        auto t = {{op call to measure}};
    """,
    setup="""
        using namespace torch::indexing;
        auto x = torch::ones({4, 4, 4}, torch::requires_grad());
    """,
    language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```
Note: the script measures the **forward** path of an op call with autograd enabled (i.e. calls into VariableType). It does not measure the backward path.

#### Results
|  Op call                                                              |before   |after   |delta  |delta %|
|------------------------------------------------------------------------|---------|--------|-------|------|
|x.index({0})                                                            |14839019|14833019|-6000| 0.00% |
|x.index({0, 0})                                                         |28342019|28370019|28000| 0.00% |
|x.index({0, 0, 0})                                                      |24434004|24449004|15000| 0.00% |
|x.index({"..."})                                                       |12773015|12767015|-6000| 0.00% |
|x.index({Slice(None, None, None)})                                      |14837016|14907016|70000| 0.47% |
|x.index({None})                                                        |15926015|15920015|-6000| 0.00% |
|x.index({false})                                                        |36958000|37477000|519000| 1.40% |
|x.index({true})                                                         |41971408|42426094|454686| 1.08% |
|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})}) |168184392|164545682|-3638710| -2.16% |

Reviewed By: bhosmer

Differential Revision: D25454632

fbshipit-source-id: 28ab0cffbbdbdff1c40b4130ca62ee72f981b76d
2021-01-04 05:04:02 -08:00
e44b2b72bd Back out "[pytorch][PR] Preserve memory format in qconv op" (#49994)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49994

Revert preserving memory format in qconv op because it is negatively affecting performance; we will revert the revert after fixing all issues

Test Plan: pytest fbcode/caffe2/test/quantization/test_quantized_op.py

Reviewed By: kimishpatel

Differential Revision: D25731279

fbshipit-source-id: 908dbb127210a93b27ada7ccdfa531177edf679a
2021-01-03 00:11:40 -08:00
8aad66a7bd [c10/**] Fix typos (#49815)
Summary:
All pretty minor. I avoided renaming `class DestructableMock` to `class DestructibleMock` and similar such symbol renames (in this PR).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49815

Reviewed By: VitalyFedyunin

Differential Revision: D25734507

Pulled By: mruberry

fbshipit-source-id: bbe8874a99d047e9d9814bf92ea8c036a5c6a3fd
2021-01-01 02:11:56 -08:00
749f8b7850 Remove flops warnings from the default profiler use case (#49896)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49896

Add missing check for with_flops option set

Test Plan:
python test/test_profiler.py
CI

Reviewed By: xuzhao9, ngimel

Differential Revision: D25716930

Pulled By: ilia-cher

fbshipit-source-id: 0da0bbb6c1a52328f665237e503406f877b41449
2020-12-30 23:49:29 -08:00
de3d8f8c35 Revert D25734450: [pytorch][PR] Improve torch.flatten docs and add tests to test_view_ops
Test Plan: revert-hammer

Differential Revision:
D25734450 (730965c246)

Original commit changeset: 993667dd07ac

fbshipit-source-id: 603af25311fc8b29bb033167f3b2704da79c3147
2020-12-30 22:04:43 -08:00
4677fc69a2 Fix inf norm grad (reland) (#48611)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/48122

Does this result in a regression? No significant regression observed.

Timer script:
```
import torch
from torch.utils.benchmark import Timer

setup="""
a = torch.rand((2, 2), requires_grad=True)
gradient = torch.ones(2)
"""

stmt="""
torch.autograd.grad(torch.norm(a, dim=(0,), keepdim=False), a, gradient)
"""

timer = Timer(stmt, setup)

print(timer.timeit(10000))
print(timer.collect_callgrind(100))
```
Note: small matrix, keepdim is False, and dims is non-empty

Before change
```
Runtime   37.37 us
1 measurement, 10000 runs , 1 thread

                           All          Noisy symbols removed
    Instructions:     15279045                   15141710
    Baseline:             4257                       3851
100 runs per measurement, 1 thread
```

After change
```
Runtime 36.08 us
1 measurement, 10000 runs , 1 thread

                           All          Noisy symbols removed
    Instructions:     15296974                   15153534
    Baseline:             4257                       3851
100 runs per measurement, 1 thread
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48611

Reviewed By: albanD, mruberry

Differential Revision: D25309997

Pulled By: soulitzer

fbshipit-source-id: 5fb950dc9259234342985c0e84ada25a7e3814d6
2020-12-30 21:13:33 -08:00
730965c246 Improve torch.flatten docs and add tests to test_view_ops (#49501)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/39474

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49501

Reviewed By: mruberry

Differential Revision: D25734450

Pulled By: soulitzer

fbshipit-source-id: 993667dd07acd81a4616465e0a3b94bde449193e
2020-12-30 20:35:46 -08:00
cd608fe59b Revert D25719980: [pytorch][PR] Accept input tensor with 0-dim batch size for MultiLabelMarginLoss
Test Plan: revert-hammer

Differential Revision:
D25719980 (6b56b71e61)

Original commit changeset: 83414bad37c0

fbshipit-source-id: 27eddd711a2b9e0adbc08bfab12100562e63ac21
2020-12-30 17:06:28 -08:00
46afd7fc9f [PyTorch] Decouple version numbers from c10 and caffe2 targets (#49905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49905

There's a size regression in model delivery in D25682312. Only the model version numbers are used; however, the dependency on the entire c10 (128 KB) is pulled in.

This diff is to decouple the version numbers to a separate header file, versions.h. Other targets referring to version numbers only can have deps of ```caffe2:version_headers```.
ghstack-source-id: 119161467

Test Plan: CI

Reviewed By: xcheng16, guangyfb

Differential Revision: D25716601

fbshipit-source-id: 07634bcf46eacfefa4aa75f2e4c9b9ee30c6929d
2020-12-30 15:34:01 -08:00
04a8412b86 [quant] Quantizable LSTM (#49671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49671

- Introduces the `torch.nn.quantizable` namespace
- Adds the `torch.nn.quantizable.LSTM` module

The point of the `quantizable` namespace is to separate the purely quantized modules from the modules that could be quantized through a normal quantization flow, but are not using the quantized kernels explicitly.
That means the quantizable modules are functionally and numerically equivalent to the FP ones and can be used instead of the FP ones without any loss.

The main difference between the `torch.nn.LSTM` and the `torch.nn.quantizable.LSTM` is that the former one does not support observation for the linear layers, because all the computation is internal to the `aten` namespace.
The `torch.nn.quantizable.LSTM`, however, uses explicit linear layers that can be observed for further quantization.
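
A hedged usage sketch (the module mirrors `torch.nn.LSTM`'s interface; the sizes below are arbitrary):

```python
import torch

lstm = torch.nn.quantizable.LSTM(input_size=8, hidden_size=16)
x = torch.randn(5, 3, 8)  # (seq_len, batch, input_size)
out, (h, c) = lstm(x)     # the internal linear layers are now observable
```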

Test Plan: Imported from OSS

Differential Revision: D25663870

Reviewed By: vkuzo

Pulled By: z-a-f

fbshipit-source-id: 70ff5463bd759b9a7922571a5712d3409dfdfa06
2020-12-30 15:21:38 -08:00
ffbb68af8a quant docs: add common errors section (#49902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49902

Adds a common errors section, and details the two errors
we see often on the discuss forums, with recommended solutions.

Test Plan: build the docs on Mac OS, the new section renders correctly.

Reviewed By: supriyar

Differential Revision: D25718195

Pulled By: vkuzo

fbshipit-source-id: c5ef2b24831d18d57bbafdb82d26d8fbf3a90781
2020-12-30 15:01:59 -08:00
a7e1f4f37a Remove incorrect usage of layout(std430) on uniform buffers, correctly now treated as error in the latest release of Vulkan SDK. (#49572)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49572

Differential Revision: D25729888

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Pulled By: AshkanAliabadi

fbshipit-source-id: 15dd4acef3dfae72f03e7e3085b1ff5936becf3d
2020-12-30 14:53:41 -08:00
6a951a6f4c Fix a KaTeX crash and many docstring issues (#49684)
Summary:
The first commit fixes the `MultiheadAttention` docstrings, which are causing a cryptic KaTeX crash.

The second commit fixes many documentation issues in `torch/_torch_docs.py`, and closes gh-43667 (missing "Keyword arguments" headers). It also fixes a weird duplicate docstring for `torch.argmin`; there are more of these, and it looks like they were written based on whether the C++ implementation has an overload. That makes little sense to a Python user though, and the content is simply duplicated.

The `Shape:` heading for https://pytorch.org/docs/master/generated/torch.nn.MultiheadAttention.html looked bad, here's what it looks like with this PR:

<img width="475" alt="image" src="https://user-images.githubusercontent.com/98330/102797488-09a44e00-43b0-11eb-8788-acdf4e936f2f.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49684

Reviewed By: ngimel

Differential Revision: D25730909

Pulled By: mruberry

fbshipit-source-id: d25bcf8caf928e7e8e918017d119de12e10a46e9
2020-12-30 14:17:39 -08:00
6b56b71e61 Accept input tensor with 0-dim batch size for MultiLabelMarginLoss (#46975)
Summary:
Fix for one of the layers listed in https://github.com/pytorch/pytorch/issues/12013 or https://github.com/pytorch/pytorch/issues/38115

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46975

Reviewed By: mruberry

Differential Revision: D25719980

Pulled By: ngimel

fbshipit-source-id: 83414bad37c0b004bc7cced04df8b9c89bdba3e6
2020-12-30 13:29:26 -08:00
42d2e31cd6 [numpy] torch.rsqrt : promote integer inputs to float (#47909)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47909

Reviewed By: ngimel

Differential Revision: D25730876

Pulled By: mruberry

fbshipit-source-id: c87a8f686e1dd64e511640e0278021c4a584ccf2
2020-12-30 10:33:14 -08:00
b54ad08978 Enable test_fusions TanhQuantize (#49970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49970

enable test_fusions:test_tanhquantize

Test Plan: https://internalfb.com/intern/testinfra/testrun/6755399469176694

Reviewed By: hyuen

Differential Revision: D25732684

fbshipit-source-id: b8479e43b5248ba5510f0c78c993d534d3ffc2b0
2020-12-30 10:00:39 -08:00
cfc3db0ca9 Remove THPWrapper (#49871)
Summary:
Removes `THPWrapper` from the PyTorch C code since it is not used anymore; because we have dropped Python 2 compatibility, its usage can be replaced by capsule objects (`PyCapsule_New`, `PyCapsule_CheckExact`, `PyCapsule_GetPointer` and `PyCapsule_GetDestructor`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49871

Reviewed By: mruberry

Differential Revision: D25715038

Pulled By: albanD

fbshipit-source-id: cc3b6f967bbe0dc42c692adf76dff4e4b667fdd5
2020-12-30 03:01:52 -08:00
12b73fdbbf Adding JIT support for cuda streams and events (#48020)
Summary:
=======

This PR addresses the following:

 * Adds JIT support for CUDA Streams
 * Adds JIT support for CUDA Events
 * Adds JIT support for CUDA Stream context manager

Testing:
======

python test/test_jit.py -v TestCUDA
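
For illustration, the eager-mode stream/event pattern this PR makes scriptable (the exact scripted surface is defined by the PR; this sketch uses only the public eager API):

```python
import torch

s = torch.cuda.Stream()
e = torch.cuda.Event()
x = torch.randn(1024, device="cuda")
with torch.cuda.stream(s):      # enqueue work on a side stream
    y = x * 2
    e.record()                  # mark completion on that stream
e.synchronize()                 # host waits for the event
```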

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48020

Reviewed By: navahgar

Differential Revision: D25725749

Pulled By: nikithamalgifb

fbshipit-source-id: b0addeb49630f8f0c430ed7badeca43bb9d2535c
2020-12-29 20:24:57 -08:00
97c17b4772 Fix auto exponent issue for torch.pow (#49809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49809

Fixes https://github.com/pytorch/xla/issues/2688 #46936

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D25724176

Pulled By: anjali411

fbshipit-source-id: 16287a1f481e9475679b99d6fb45de840da225be
2020-12-29 17:02:56 -08:00
e482c70a3d added List as an option to the unflattened_size (#49838)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49743
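
A minimal sketch of the newly accepted type:

```python
import torch
from torch import nn

m = nn.Unflatten(dim=1, unflattened_size=[2, 3])  # a list is now accepted, not just a tuple
print(m(torch.randn(4, 6)).shape)  # torch.Size([4, 2, 3])
```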

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49838

Reviewed By: mruberry

Differential Revision: D25727971

Pulled By: ngimel

fbshipit-source-id: 60142dae84ef107f0083676a2a78ce6b0472b7e1
2020-12-29 16:50:37 -08:00
01b57e1810 Revert D25718705: Clean up type annotations in caffe2/torch/nn/modules
Test Plan: revert-hammer

Differential Revision:
D25718705 (891759f860)

Original commit changeset: 6a9e3e6d17aa

fbshipit-source-id: 1a4ef0bfdec8eb8e7ce149bfbdb34a4ad8d964b6
2020-12-29 16:42:26 -08:00
14edc726d9 Clean up some type annotations in caffe2/torch/quantization (#49942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49942

Upgrades type annotations from Python2 to Python3

Test Plan: Sandcastle tests

Reviewed By: vkuzo

Differential Revision: D25717551

fbshipit-source-id: 1b63dc485ecf6641641b05f7ce095ae1d2d87346
2020-12-29 15:43:50 -08:00
4c5a4dbb8c [Tensorexpr]Copying header files in tensorexpr dir (#49933)
Summary:
Previously, header files from jit/tensorexpr were not copied; this PR enables copying them.

This will allow other OSS projects like Glow to use TE.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49933

Reviewed By: Krovatkin, mruberry

Differential Revision: D25725927

Pulled By: protonu

fbshipit-source-id: 9d5a0586e9b73111230cacf044cd7e8f5c600ce9
2020-12-29 15:18:52 -08:00
891759f860 Clean up type annotations in caffe2/torch/nn/modules (#49938)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49938

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25718705

fbshipit-source-id: 6a9e3e6d17aa458726cd32aa0a71a63c51b601d9
2020-12-29 14:04:52 -08:00
a111a9291c added fuse_op and list_construct - list_unpack pass
Summary: Added fuse_op and list_construct and list_unpack pass

Test Plan:
jit_graph_opt_test.py
jit_graph_optimizer_test.cc
sparsenn_fused_operator_test.py

Reviewed By: qizzzh

Differential Revision: D25715079

fbshipit-source-id: fa976be53135a83f262b8f2e2eaedadd177f46c4
2020-12-29 12:29:53 -08:00
8d7338e820 Enable tests using named temp files on Windows (#49640)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49640

Reviewed By: ngimel

Differential Revision: D25681548

Pulled By: malfet

fbshipit-source-id: 0e2b25817c98d749920cb2b4079033a2ee8c1456
2020-12-29 09:57:35 -08:00
d434ac35e4 Update gather documentation to allow index.shape[k] <= input.shape[k] rather than ==. (#41887)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41887
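
For illustration, an index tensor that is smaller than the input in a non-gathered dimension:

```python
import torch

x = torch.arange(12).reshape(3, 4)
idx = torch.tensor([[0, 2]])    # shape (1, 2) vs. input (3, 4); dim=1
print(torch.gather(x, 1, idx))  # tensor([[0, 2]])
```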

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D22680014

Pulled By: gchanan

fbshipit-source-id: b162fccabc22a1403c0c43c1131f0fbf4689a79d
2020-12-29 07:28:48 -08:00
c619892482 Fix errata (#49903)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49903

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25718411

Pulled By: ansley

fbshipit-source-id: 0cc365c5a53077752dc1c5a5c4a65b873baa3604
2020-12-28 20:40:41 -08:00
361f5ed91d Implement torch.linalg.qr (#47764)
Summary:
I am opening this PR early to have a place to discuss design issues.
The biggest difference between `torch.qr` and `numpy.linalg.qr` is that the former takes a boolean parameter `some=True`, while the latter takes a string parameter `mode='reduced'` which can be one of the following:

`reduced`
this is completely equivalent to `some=True`, and both are the default.

`complete`
this is completely equivalent to `some=False`.

`r`
this returns only `r` instead of a tuple `(q, r)`. We have already decided that we don't want different return types depending on the parameters, so I propose to return `(r, empty_tensor)` instead. I **think** that in this mode it will be impossible to implement the backward pass, so we should raise an appropriate error in that case.

`raw`
in this mode, it returns `(h, tau)` instead of `(q, r)`. Internally, `h` and `tau` are obtained by calling lapack's `dgeqrf` and are later used to compute the actual values of `(q, r)`. The numpy docs suggest that these might be useful to call other lapack functions, but at the moment none of them is exposed by numpy and I don't know how often it is used in the real world.
I suppose implementing the backward pass needs attention: the most straightforward solution is to use `(h, tau)` to compute `(q, r)` and then use the normal logic for `qr_backward`, but there might be faster alternatives.

`full`, `f`
alias for `reduced`, deprecated since numpy 1.8.0

`economic`, `e`
similar to `raw`, but it returns only `h` instead of `(h, tau)`. Deprecated since numpy 1.8.0

To summarize:
  * `reduced`, `complete` and `r` are straightforward to implement.

  * `raw` needs a bit of extra care, but I don't know how high priority it is: since it is rarely used, we might want to not support it right now and maybe implement it in the future?

  * I think we should just leave `full` and `economic` out, and possibly add a note to the docs explaining what you need to use instead

/cc mruberry
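
A short sketch of the two uncontroversial modes as proposed (shapes follow the reduced/complete definitions above):

```python
import torch

a = torch.randn(5, 3)
q, r = torch.linalg.qr(a)                     # mode='reduced' (default): q is (5, 3), r is (3, 3)
q2, r2 = torch.linalg.qr(a, mode="complete")  # q2 is (5, 5), r2 is (5, 3)
```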

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47764

Reviewed By: ngimel

Differential Revision: D25708870

Pulled By: mruberry

fbshipit-source-id: c25c70a23a02ec4322430d636542041e766ebe1b
2020-12-28 17:28:17 -08:00
bc4ff7ba05 fx quant: split linear test cases (#49740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49740

1. Separates the module and functional linear test cases.
2. Combines the test case which tests for linear bias observation into
the main linear test case, as requested in
https://github.com/pytorch/pytorch/pull/49628.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_linear_module
python test/test_quantization.py TestQuantizeFxOps.test_linear_functional
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25681272

fbshipit-source-id: 0ed0ebd5afb8cdb938b530f7dbfbd79798eb9318
2020-12-28 14:30:25 -08:00
ea558b2135 fx quant: hook up ConvTranspose{n}d (#49717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49717

Quantization of `ConvTranspose{n}d` is supported in Eager mode. This PR
adds the support for FX graph mode.

Note: this currently only works in `qnnpack` because per-channel weights
are not supported by quantized conv transpose. In a future PR we should throw
an error when someone tries to quantize a ConvTranspose model with per-channel
weight observers until this is fixed.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_conv_transpose_1d
python test/test_quantization.py TestQuantizeFxOps.test_conv_transpose_2d
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25674636

fbshipit-source-id: b6948156123ed55db77e6337bea10db956215ae6
2020-12-28 14:27:07 -08:00
fc559bd6dc [JIT] Constant prop getattr (#49806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49806

Fix for https://github.com/pytorch/pytorch/issues/47089

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D25696791

Pulled By: eellison

fbshipit-source-id: 914c17b8effef7f4f341775ac2b8150ee4703efd
2020-12-28 10:44:53 -08:00
268441c7d8 [NNC] masked fill (#49627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49627

There was a bug in the test that was hidden by the `If eager mode doesn't support a dtype/op/device combo` try / catch, so CUDA wasn't being tested. The fix is just to rename `aten::masked_fill` to `aten_masked_fill`.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D25696409

Pulled By: eellison

fbshipit-source-id: 83de1f5a194df54fe317b0035d4a6c1aed1d19a0
2020-12-28 10:37:02 -08:00
58fe67967c Support the in operator with str (#47057)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47057
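
A minimal sketch of what now compiles:

```python
import torch

@torch.jit.script
def contains(s: str, sub: str) -> bool:
    return sub in s  # the `in` operator on strings is now supported in TorchScript

print(contains("pytorch", "torch"))  # True
```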

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D24863370

Pulled By: ansley

fbshipit-source-id: 5d17165b06052f0a4676537c5f6757083185a591
2020-12-28 10:26:24 -08:00
e6779d4357 [*.py] Rename "Arguments:" to "Args:" (#49736)
Summary:
I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings.

```sh
(pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do
    printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" | paste -s -d+ -- | bc)"; done
Args:      1095
Arguments: 0336
```

It is easy enough to extend my parsers to support both variants, however it looks like `Arguments:` is wrong anyway, as per:

  - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md)

  - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md)

  - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst)

Therefore, only `Args:` is valid. This PR replaces them throughout the codebase.

PS: For related PRs, see tensorflow/tensorflow/pull/45420

PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736

Reviewed By: albanD

Differential Revision: D25710534

Pulled By: soumith

fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619
2020-12-28 09:34:47 -08:00
9c64b9ffba early termination of CUDA tests (#49869)
Summary:
This is follow up on https://github.com/pytorch/pytorch/issues/49799.

* Uses `torch.cuda.synchronize()` to validate CUDA asserts instead of inspecting the error message.
* Removes non-CUDA tests.

Hopefully this can reproduce why slow_tests fails but the normal test does not, since the test still runs for >1 min.
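
For illustration (not the test itself), the validation pattern: a device-side assert only surfaces at the next synchronization point:

```python
import torch

x = torch.tensor([1.0], device="cuda")
try:
    y = x[torch.tensor([5], device="cuda")]  # out-of-bounds index -> device-side assert
    torch.cuda.synchronize()                 # the assert surfaces here as a RuntimeError
except RuntimeError as e:
    print("CUDA assert caught:", e)
```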

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49869

Reviewed By: mruberry

Differential Revision: D25714385

Pulled By: walterddr

fbshipit-source-id: 04f8ccb50d8c9ee42826a216c49baf90285b247f
2020-12-28 09:18:00 -08:00
963f7629b5 [numpy] torch.digamma : promote integer inputs to float (#48302)
Summary:
**BC-breaking Note:**

This PR updates PyTorch's digamma function to be consistent with SciPy's special.digamma function. This changes the result of the digamma function on the nonpositive integers, where the gamma function is not defined. Since the gamma function is undefined at these points, the (typical) derivative of the logarithm of the gamma function is also undefined at these points, and for negative integers this PR updates digamma to return NaN. For zero, however, it returns -inf to be consistent with SciPy.
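
For illustration of the new behavior:

```python
import torch

x = torch.tensor([0.0, -1.0, -2.0, -2.5])
print(torch.digamma(x))
# tensor([  -inf,    nan,    nan, 1.1032])  # -inf at zero, NaN at negative integers
```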

Interestingly, SciPy made a similar change, which was noticed by at least one user: https://github.com/scipy/scipy/issues/9663#issue-396587679.

SciPy's returning of negative infinity at zero is intentional:
59347ae8b8/scipy/special/cephes/psi.c (L163)

This change is consistent with the C++ standard for the gamma function:
https://en.cppreference.com/w/cpp/numeric/math/tgamma

**PR Summary:**
Reference https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48302

Reviewed By: ngimel

Differential Revision: D25664087

Pulled By: mruberry

fbshipit-source-id: 1168e81e218bf9fe5b849db0e07e7b22e590cf73
2020-12-24 22:42:55 -08:00
46cf6d332f Revert D25684692: [quant][graphmode][fx] Standalone module support {input/output}_quantized_idxs
Test Plan: revert-hammer

Differential Revision:
D25684692 (89b4899ea5)

Original commit changeset: 900360e01c0e

fbshipit-source-id: 8b65fa8fbc7b364fbddb5f23cc696cd9b7db98cd
2020-12-24 15:50:52 -08:00
ec6de6a697 Clip small scales to fp16 min
Summary: When the FC output min/max range is very small, we want to enforce a cutoff on the scale parameter to better generalize to future values that could fall beyond the original range.
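
A minimal sketch of the cutoff idea (the exact threshold is the diff's internal choice; using `float16`'s smallest positive normal here is an assumption):

```python
import torch

fp16_min = torch.finfo(torch.float16).tiny  # ~6.1e-5, smallest positive normal fp16
scale = 1e-9                                # a degenerate scale from a tiny min/max range
scale = max(scale, fp16_min)                # clip so future out-of-range values still quantize sanely
```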

Test Plan:
More analysis about the output distributions can be found in N425166

An example workflow using fp16 min clipping is f240972205

Reviewed By: jspark1105

Differential Revision: D25681249

fbshipit-source-id: c4dfbd3ee823886afed06e6c2eccfc29d612f7e6
2020-12-24 03:49:34 -08:00
89b4899ea5 [quant][graphmode][fx] Standalone module support {input/output}_quantized_idxs (#49754)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49754

This PR adds the support for {input/output}_quantized_idxs for standalone module.

If input_quantized_idxs = [] and output_quantized_idxs = [], the standalone module will expect float
input and produce float output, and will quantize the input and dequantize the output internally.

If input_quantized_idxs = [0] and output_quantized_idxs = [0], the standalone module will expect quantized
input and produce quantized output; the input will be quantized in the parent module, and the output will be
dequantized in the parent module as well. This is similar to current quantized modules like nn.quantized.Conv2d.

For more details, please see the test case

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_standalone_module

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D25684692

fbshipit-source-id: 900360e01c0e35b26fe85f4a887dc1fd6f7bfb66
2020-12-23 22:36:57 -08:00
69b1373587 Revert D25692616: [pytorch][PR] [reland] Early terminate when CUDA assert were thrown
Test Plan: revert-hammer

Differential Revision:
D25692616 (e6a215592e)

Original commit changeset: 9c5352220d63

fbshipit-source-id: dade8068cad265d15ee908d98abe0de5b81a195d
2020-12-23 17:48:12 -08:00
9552cc65d4 Creation of test framework for Sparse Operators (#48488)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48488

Reviewed By: ngimel

Differential Revision: D25696487

Pulled By: mruberry

fbshipit-source-id: dc4f57c6628f62b74dd321f3f6b0fff86f25b040
2020-12-23 15:42:26 -08:00
5acc27c00a Revert D25690129: [pytorch][PR] Added linalg.inv
Test Plan: revert-hammer

Differential Revision:
D25690129 (8554b58fbd)

Original commit changeset: edb2d03721f2

fbshipit-source-id: 8679ea18e637423d35919544d2b047a62ac3abd8
2020-12-23 15:27:52 -08:00
1833009202 Fix typo in complex autograd docs (#49755)
Summary:
Update complex autograd docs to fix a typo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49755

Reviewed By: mruberry

Differential Revision: D25692649

Pulled By: soulitzer

fbshipit-source-id: 43c2113b4c8f2d1828880102189a5a9b887dc784
2020-12-23 14:42:34 -08:00
e6a215592e [reland] Early terminate when CUDA assert were thrown (#49799)
Summary:
this is a reland of https://github.com/pytorch/pytorch/issues/49527.

Fixed the slow test not running properly on py36, because capture_output was introduced in py37.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49799

Reviewed By: janeyx99

Differential Revision: D25692616

Pulled By: walterddr

fbshipit-source-id: 9c5352220d632ec8d7464e5f162ffb468a0f30df
2020-12-23 14:25:14 -08:00
3f4b98d568 [numpy] torch.erfinv: promote integer inputs to float (#49155)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49155

Reviewed By: ngimel

Differential Revision: D25664234

Pulled By: mruberry

fbshipit-source-id: 630fd1d334567d78c8130236a67dda0f5ec02560
2020-12-23 14:22:03 -08:00
4d6110939a [pt][quant] Make the CUDA fake quantize logic consistent with CPU fake quantize logic (#49808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49808

In PyTorch, it uses `dst = std::nearbyint(src * inv_scale) + zero_point` instead of the LEGACY  `dst = std::nearbyint(src * inv_scale + zero_point)`. However, the CUDA implementation doesn't match this. This diff makes the CPU and CUDA implementations consistent.
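
A numpy sketch of where the two formulas diverge (round-half-to-even resolves the tie differently depending on whether zero_point is added before or after rounding):

```python
import numpy as np

def quantize(src, inv_scale, zero_point):
    consistent = np.rint(src * inv_scale) + zero_point  # round first, then shift
    legacy = np.rint(src * inv_scale + zero_point)      # shift first, then round
    return consistent, legacy

print(quantize(2.5, 1.0, 1))  # (3.0, 4.0): rint(2.5) = 2 but rint(3.5) = 4
```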

- FBGEMM code pointer: https://github.com/pytorch/FBGEMM/blob/master/include/fbgemm/QuantUtils.h#L76-L80
- PyTorch code pointer:
https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/affine_quantizer.cpp#L306

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D25694235

fbshipit-source-id: 0a615e559132aafe18543deac1ea5028dd840cb9
2020-12-23 12:47:44 -08:00
e163172904 removes more unused THC functions (#49788)
Summary:
per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49788

Reviewed By: mruberry

Differential Revision: D25693328

Pulled By: ngimel

fbshipit-source-id: 244a096214d110e4c1a94f2847ff8457f1afb0d1
2020-12-23 12:38:20 -08:00
d99a0c3b3e Improve docs for scatter and gather functions (#49679)
Summary:
- Add warning about non-unique indices
- Note that these functions don't broadcast
- Add missing `torch.scatter` and `torch.scatter_add` doc entries
- Fix parameter descriptions
- Improve code examples to make indexing behaviour easier to understand (see also the accumulation sketch below)
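
For illustration, the accumulation behavior the new warning is about:

```python
import torch

src = torch.ones(1, 4)
idx = torch.tensor([[0, 0, 1, 1]])  # non-unique indices along dim 1
out = torch.zeros(1, 2).scatter_add(1, idx, src)
print(out)  # tensor([[2., 2.]]): scatter_add accumulates; plain scatter_ would be order-dependent
```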

Closes gh-48214
Closes gh-26191
Closes gh-37130
Closes gh-34062
xref gh-31776

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49679

Reviewed By: mruberry

Differential Revision: D25693660

Pulled By: ngimel

fbshipit-source-id: 4983e7b4efcbdf1ab9f04e58973b4f983e8e43a4
2020-12-23 12:23:15 -08:00
b3387139b4 Mod lists to neutral+descriptive terms in caffe2/docs (#49803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49803

Per "https://fb.workplace.com/groups/e/permalink/3320810064641820/" we can no longer use the terms "whitelist" and "blacklist", and editing any file containing them results in a critical error signal. Let's embrace the change.
This diff changes "blacklist" to "blocklist" in a number of non-interface contexts (interfaces would require more extensive testing and might interfere with reading stored data, so those are deferred until later).

Test Plan: Sandcastle

Reviewed By: vkuzo

Differential Revision: D25686924

fbshipit-source-id: 117de2ca43a0ea21b6e465cf5082e605e42adbf6
2020-12-23 11:37:11 -08:00
8554b58fbd Added linalg.inv (#48261)
Summary:
This PR adds `torch.linalg.inv` for NumPy compatibility.

`linalg_inv_out` uses in-place operations on provided `result` tensor.

I modified `apply_inverse` to accept a tensor of Int instead of std::vector; that way we can write a function similar to `linalg_inv_out` but without the error checks and device memory synchronization.

I fixed `lda` (leading dimension parameter which is max(1, n)) in many places to handle 0x0 matrices correctly.
Zero batch dimensions are also working and tested.
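
A short usage sketch, including the zero-batch case:

```python
import torch

a = torch.randn(3, 3, dtype=torch.float64)
a_inv = torch.linalg.inv(a)
print(torch.allclose(a @ a_inv, torch.eye(3, dtype=torch.float64)))  # True

print(torch.linalg.inv(torch.randn(0, 3, 3)).shape)  # torch.Size([0, 3, 3])
```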

Ref https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48261

Reviewed By: ngimel

Differential Revision: D25690129

Pulled By: mruberry

fbshipit-source-id: edb2d03721f22168c42ded8458513cb23dfdc712
2020-12-23 11:29:00 -08:00
370350c749 Preserve memory format in qconv op (#49533)
Summary:
* qconv used to return NHWC no matter the input format
* this change returns NCHW format if the input was NCHW

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49533

Test Plan:
pytest test/quantization/test_quantized_op.py::\
TestQuantizedConv::test_qconv2d_preserve_mem_format

Fixes https://github.com/pytorch/pytorch/issues/47295

Reviewed By: kimishpatel

Differential Revision: D25609205

Pulled By: axitkhurana

fbshipit-source-id: 83f8ca4a1496a8a4612fc3da082d727ead257ce7
2020-12-23 10:58:57 -08:00
5171bd94d7 [lint doc] how to fix flake errors if pre-commit hook wasn't there (#49345)
Summary:
This PR adds instructions on what to do if one committed into a PR branch w/o having a pre-commit hook enabled and having CI report flake8 errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49345

Reviewed By: cpuhrsch

Differential Revision: D25683167

Pulled By: soumith

fbshipit-source-id: 3c45c866e1636c116d2cacec438d62c860e6b854
2020-12-23 09:17:40 -08:00
55b431b17a [Gradient Compression] Directly let world_size = group_to_use.size() (#49715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49715

Address the comment on https://github.com/pytorch/pytorch/pull/49417#discussion_r545388351
ghstack-source-id: 119049598

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D25673997

fbshipit-source-id: 44eb2540e5a77331c34ba503285cbd0bd63c2c0a
2020-12-22 23:24:54 -08:00
88c33ff8ab [Gradient Compression] Explicitly restrict the scope of torch.cuda.synchronize to the current device (#49711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49711

`torch.cuda.synchronize` uses the current device by default. Explicitly specify this device for better readability.
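
For illustration:

```python
import torch

# Before: torch.cuda.synchronize()  -- implicitly the current device.
# After: the device argument is spelled out for readability.
torch.cuda.synchronize(torch.cuda.current_device())
```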

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 119017654

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook

Reviewed By: rohan-varma

Differential Revision: D25672267

fbshipit-source-id: 62a2266727a2ea76175f3c438daf20951091c771
2020-12-22 23:21:45 -08:00
ee271047b5 torch.utils.checkpoint.checkpoint + torch.cuda.amp (#49757)
Summary:
Adds a test to orphaned original PR (https://github.com/pytorch/pytorch/pull/40221).

Should fix https://github.com/pytorch/pytorch/issues/49738 and https://github.com/pytorch/pytorch/issues/47183
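
A minimal sketch of the combination being fixed (module and sizes are arbitrary assumptions):

```python
import torch
from torch.utils.checkpoint import checkpoint

model = torch.nn.Linear(16, 16).cuda()
x = torch.randn(4, 16, device="cuda", requires_grad=True)
with torch.cuda.amp.autocast():
    y = checkpoint(model, x)  # recomputation in backward replays under autocast
y.float().sum().backward()
```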

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49757

Reviewed By: mruberry

Differential Revision: D25689609

Pulled By: ngimel

fbshipit-source-id: 0a6adc11eb98382048ef9a9775e185dcdeff6010
2020-12-22 22:25:11 -08:00
f474ffa1a9 [quant][graphmode][fx] Change standalone module api (#49719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49719

We find there are multiple use cases for standalone modules: one use case requires the standalone module
to produce a module that takes a float Tensor as input and outputs a float Tensor; the other needs to
produce a module that takes a quantized Tensor as input and outputs a quantized Tensor.

This is similar to `quantized_input_idxs` and `quantized_output_idxs`, so we want to nest
prepare_custom_config_dict in the standalone module configuration. For maximum flexibility we also
include qconfig_dict for the standalone module, in case the user needs a special qconfig_dict for
the standalone module in the future.

Changed from
```python
prepare_custom_config_dict =
{
  "standalone_module_name": ["standalone_module"],
   "standalone_module_class": [StandaloneModule]
 }
```
to
```python
prepare_custom_config_dict =
{
  "standalone_module_name": [("standalone_module", qconfig_dict1, prepare_custom_config_dict1)],
  "standalone_module_class": [(StandaloneModule, qconfig_dict2, prepare_custom_config_dict2)]
 }
```
The entries in the config are:
1. name/module_class
2. optional qconfig_dict, when it is None, we'll use {"": qconfig} where qconfig is the one from parent qconfig_dict
3. optional prepare_custom_config_dict, when it is None, we'll use default value of prepare_custom_config_dict for prepare API (None)

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_standalone_module

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D25675704

fbshipit-source-id: 0889f519a3e55a7a677f0e2db4db9a18d87a93d4
2020-12-22 21:58:40 -08:00
af1b636b89 [Gradient Compression] Change wait() to value() in some callbacks of PowerSGD communication hook (#49709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49709

Since wait() has already been called in the return statements of the precursor callbacks, no need to wait again.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 119015237

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook

Reviewed By: rohan-varma

Differential Revision: D25672068

fbshipit-source-id: da136327db4c4c0e3b846ba8d6885629f1044374
2020-12-22 21:37:04 -08:00
68d438c9da Add PixelUnshuffle (#49334)
Summary:
Adds an implementation of `torch.nn.PixelUnshuffle` as the inverse operation of `torch.nn.PixelShuffle`. This addresses https://github.com/pytorch/pytorch/issues/2456
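
A short sketch of the inverse relationship:

```python
import torch

ps = torch.nn.PixelShuffle(2)
pu = torch.nn.PixelUnshuffle(2)
x = torch.randn(1, 8, 4, 4)       # channels divisible by 2**2
assert torch.equal(pu(ps(x)), x)  # PixelUnshuffle exactly inverts PixelShuffle
```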

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49334

Test Plan:
```
# Unit tests.
python test/test_nn.py TestNN.test_pixel_shuffle_unshuffle

# Module test.
python test/test_nn.py TestNN.test_PixelUnshuffle

# C++ API tests.
build/bin/test_api

# C++ / python parity tests.
python test/test_cpp_api_parity.py

# JIT test.
python test/test_jit.py TestJitGeneratedFunctional.test_nn_pixel_unshuffle

# Override tests.
python test/test_overrides.py

# Type hint tests.
python test/test_type_hints.py
```

Screenshots of rendered docs:
<img width="876" alt="Screen Shot 2020-12-18 at 12 19 05 PM" src="https://user-images.githubusercontent.com/75754324/102642255-6b07bb00-412b-11eb-88fa-e53e7e8ba720.png">
<img width="984" alt="Screen Shot 2020-12-18 at 12 19 26 PM" src="https://user-images.githubusercontent.com/75754324/102642276-70fd9c00-412b-11eb-8548-445082a2db02.png">
<img width="932" alt="Screen Shot 2020-12-18 at 12 19 34 PM" src="https://user-images.githubusercontent.com/75754324/102642704-19abfb80-412c-11eb-9546-95bdd1c3cf22.png">
<img width="876" alt="Screen Shot 2020-12-22 at 12 51 36 PM" src="https://user-images.githubusercontent.com/75754324/102918259-986aa680-4454-11eb-99e7-a0b4c8b3e283.png">
<img width="869" alt="Screen Shot 2020-12-22 at 12 51 44 PM" src="https://user-images.githubusercontent.com/75754324/102918274-9ef91e00-4454-11eb-94bb-91b58aff47d3.png">

Reviewed By: mruberry

Differential Revision: D25401439

Pulled By: jbschlosser

fbshipit-source-id: 209d92ce7295e51699e83616d0c62170a7ce75c8
2020-12-22 20:14:55 -08:00
461aafe389 [numpy] torch.angle: promote integer inputs to float (#49163)
Summary:
**BC-Breaking Note:**

This PR updates PyTorch's angle operator to be consistent with NumPy's. Previously angle would return zero for all floating point values (including NaN). Now angle returns `pi` for negative floating point values, zero for non-negative floating point values, and propagates NaNs.
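
For illustration of the new behavior:

```python
import torch

x = torch.tensor([-1.0, 0.0, 2.0, float("nan")])
print(torch.angle(x))
# tensor([3.1416, 0.0000, 0.0000,    nan])
```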

**PR Summary:**

Reference: https://github.com/pytorch/pytorch/issues/42515

TODO:

* [x] Add BC-breaking note (previously all real numbers returned `0`, even `nan`) -> fixed to match the correct behavior of NumPy.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49163

Reviewed By: ngimel

Differential Revision: D25681758

Pulled By: mruberry

fbshipit-source-id: 54143fe6bccbae044427ff15d8daaed3596f9685
2020-12-22 18:43:14 -08:00
46b83212d1 Remove unused six code for Python 2/3 compatibility (#48077)
Summary:
This is basically a reborn version of https://github.com/pytorch/pytorch/issues/45254 .

Ref: https://github.com/pytorch/pytorch/issues/42919

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48077

Reviewed By: ngimel

Differential Revision: D25687042

Pulled By: bugra

fbshipit-source-id: 05f20a6f3c5212f73d0b1505b493b720e6cf74e5
2020-12-22 18:07:08 -08:00
abacf27038 Revert D25623219: [pytorch][PR] early terminate when CUDA assert were thrown
Test Plan: revert-hammer

Differential Revision:
D25623219 (be091600ed)

Original commit changeset: 1b414623ecce

fbshipit-source-id: ba304c57eea29d19550ac1e864ccfcd0cec68bec
2020-12-22 17:57:19 -08:00
010b9c52f4 Skip None submodule during JIT-tracing (#49765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49765

Some PyTorch modules can have None as a submodule, which causes the following error during JIT tracing:

Repro script:
```
import torch

class TestModule(torch.nn.Module):
  def __init__(self):
    super().__init__()
    self.submod = torch.nn.Linear(3, 4)
    self.submod = None  # overwrite the registered submodule with None

  def forward(self, inputs):
    return inputs

m = TestModule()
tm = torch.jit.trace(m, torch.tensor(1.))
```
Error:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/miniconda3/envs/master_nightly/lib/python3.7/site-packages/torch/jit/_trace.py", line 742, in trace
    _module_class,
  File "/data/miniconda3/envs/master_nightly/lib/python3.7/site-packages/torch/jit/_trace.py", line 928, in trace_module
    module = make_module(mod, _module_class, _compilation_unit)
  File "/data/miniconda3/envs/master_nightly/lib/python3.7/site-packages/torch/jit/_trace.py", line 560, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "/data/miniconda3/envs/master_nightly/lib/python3.7/site-packages/torch/jit/_trace.py", line 1039, in __init__
    submodule, TracedModule, _compilation_unit=None
  File "/data/miniconda3/envs/master_nightly/lib/python3.7/site-packages/torch/jit/_trace.py", line 560, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "/data/miniconda3/envs/master_nightly/lib/python3.7/site-packages/torch/jit/_trace.py", line 988, in __init__
    assert isinstance(orig, torch.nn.Module)
AssertionError
```

This pull request changes the JIT-tracing logic to skip None submodules during tracing.

Test Plan: `buck test mode/dev //caffe2/test:jit -- test_trace_skip_none_submodule`

Reviewed By: wanchaol

Differential Revision: D25670948

fbshipit-source-id: 468f42f5ddbb8fd3de06d0bc224dc67bd7172358
2020-12-22 17:45:35 -08:00
62f9b03b7c [lint] Apply whitespace linter to all gradle files
Summary: Run whitespace and license linters on gradle build files.

Reviewed By: zertosh

Differential Revision: D25687355

fbshipit-source-id: 44330daac7582fed6c05680bffc74e855a9b1dbc
2020-12-22 17:01:51 -08:00
27f0dd36d9 add type annotations to torch.nn.parallel._functions (#49687)
Summary:
Closes gh-49686

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49687

Reviewed By: ngimel

Differential Revision: D25680210

Pulled By: zou3519

fbshipit-source-id: 221f7c9a4d3a6213eac6983030b0be51ee1c5b60
2020-12-22 16:56:16 -08:00
de07d07600 fx quant: improve types on convert (#49688)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49688

Adds more type annotations to the FX quantization convert pass, fixing issues as they
are uncovered by mypy.

Test Plan:
```
mypy torch/quantization
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25667231

fbshipit-source-id: 262713c6ccb050a05e3119c0457d0335dde82d25
2020-12-22 16:53:23 -08:00
19f972b696 fx quant: do not observe bias on F.linear (#49628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49628

Ensures that the linear bias is not observed in an `F.linear` call. This should
give a small speedup in PTQ, and will change numerics (in a good way) for
QAT if someone is using `F.linear`.

Note: the implementation is slightly more verbose compared to conv
because bias is a keyword argument in Linear.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_linear_functional_bias_not_observed
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25653532

fbshipit-source-id: c93501bf6b55cbe4a11cfdad6f79313483133a39
2020-12-22 16:53:21 -08:00
c3a7591cef fx quant: do not observe bias on F.conv (#49623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49623

(not ready for review)

Ensures that conv bias is not observed in an `F.conv{n}d` call.

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25652856

fbshipit-source-id: 884f87be1948d3e049a557d79bec3c90aec34340
2020-12-22 16:49:50 -08:00
b414123264 Update is_floating_point() docs to mention bfloat16 (#49611)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49610 . Explicitly mentions that `is_floating_point()` will return `True` if passed a `bfloat16` tensor.
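
For illustration:
```python
import torch

print(torch.tensor(1.0, dtype=torch.bfloat16).is_floating_point())  # True
print(torch.tensor(1, dtype=torch.int64).is_floating_point())       # False
```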

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49611

Reviewed By: mrshenli

Differential Revision: D25660723

Pulled By: VitalyFedyunin

fbshipit-source-id: 04fab2f6c1c5c2859c6efff1976a92a676b9efa3
2020-12-22 15:54:27 -08:00
67d0c18241 [FX] Try to make it more clear that _update_args_kwargs should not be called (#49745)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49745

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D25682177

Pulled By: jamesr66a

fbshipit-source-id: 4910577541c4d41e1be50a7aa061873f061825b6
2020-12-22 15:20:02 -08:00
2780400904 [numpy] Add torch.xlogy (#48777)
Summary:
Reference https://github.com/pytorch/pytorch/issues/38349
Fixes https://github.com/pytorch/pytorch/issues/22656

TODO:
* [x] Add docs
* [x] Add tests
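
For context, a hedged sketch of the semantics of the operator added here, assuming the NumPy/SciPy xlogy convention that the result is 0 where the first argument is 0:
```python
import torch

x = torch.tensor([0.0, 1.0, 2.0])
y = torch.tensor([0.0, 2.0, 4.0])
print(torch.xlogy(x, y))  # tensor([0.0000, 0.6931, 2.7726])
print(x * torch.log(y))   # tensor([   nan, 0.6931, 2.7726]) -- naive form hits 0 * -inf
```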

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48777

Reviewed By: ngimel

Differential Revision: D25681346

Pulled By: mruberry

fbshipit-source-id: 369e0a29ac8a2c44de95eec115bf75943fe1aa45
2020-12-22 15:05:59 -08:00
be091600ed early terminate when CUDA assert were thrown (#49527)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49019

I marked the test_testing function as slow since it took ~1 minute to finish the subprocess test suite.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49527

Reviewed By: malfet

Differential Revision: D25623219

Pulled By: walterddr

fbshipit-source-id: 1b414623ecce14aace5e0996d5e4768a40e12e06
2020-12-22 14:33:41 -08:00
9b6fb856e8 Update NNPACK (#49749)
Summary:
This update enables NNPACK cross-compilation on macOS

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49749

Reviewed By: janeyx99

Differential Revision: D25683056

Pulled By: malfet

fbshipit-source-id: c7a6b7f49d61a9a0697d67f6319f06bd252b66a5
2020-12-22 14:20:37 -08:00
6f9532dd53 only upload s3 stats on master, nightly, and release branch (#49645)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49645

Reviewed By: malfet

Differential Revision: D25665851

Pulled By: walterddr

fbshipit-source-id: 1cf50f6e3657f70776aaf3c5d3823c8a586bf22d
2020-12-22 14:15:18 -08:00
04e04abd06 remove unused THCBlas (#49725)
Summary:
Removes the unused THCBlas and calls `at::cuda::blas::gemm` directly where needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49725

Reviewed By: mruberry

Differential Revision: D25680831

Pulled By: ngimel

fbshipit-source-id: d826f3f558b156f45f2a4864daf3f6d086bda78c
2020-12-22 13:55:22 -08:00
1451d84766 Minor doc fix: change truncating to rounding in TF32 docs (#49625)
Summary:
Minor doc fix clarifying that the input data is rounded, not truncated.

CC zasdfgbnm ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49625

Reviewed By: mruberry

Differential Revision: D25668244

Pulled By: ngimel

fbshipit-source-id: ac97e41e0ca296276544f9e9f85b2cf1790d9985
2020-12-22 13:46:33 -08:00
21398fb6cb Fix get_overlap_status for tensors without storage (#49638)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49638

Reviewed By: ngimel

Differential Revision: D25681908

Pulled By: asuhan

fbshipit-source-id: 2ea8623614f2f0027f6437cf2819ba1657464f54
2020-12-22 12:38:59 -08:00
c23808d8e8 Reland: Add base forward grad logic (#49734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49734

RFC: https://github.com/pytorch/rfcs/pull/11

This PR adds the basic logic to handle forward grads as dual Tensors.
It contains the following:
- Mechanism to save dual state on a Tensor and clear it up when the dual level ends
- C++ and python user facing API (see the sketch after this list)
- Updated view system that is able to track both forward and backward views
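
A minimal sketch of the Python user-facing API mentioned above (assuming it matches today's `torch.autograd.forward_ad` module; only level 0 is supported, per the limitations below):
```python
import torch
import torch.autograd.forward_ad as fwAD

primal = torch.randn(3)
tangent = torch.randn(3)
with fwAD.dual_level():
    dual = fwAD.make_dual(primal, tangent)  # attach the forward grad state
    out = dual * 2
    _, jvp = fwAD.unpack_dual(out)  # forward gradient of out: 2 * tangent
    assert torch.allclose(jvp, 2 * tangent)
```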

The current PR has the following limitations:
- Extensive tests are in the next PR in the stack as formulas are needed to write full tests.
- Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack)
- Only level 0 is allowed for now. It was discussed and agreed that more levels are not needed for the first version of this PR.
- We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise.
- We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise.

Reading guide:
- Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). This introduces the new ViewInfo to hold view information shared for forward and backward. It also updates the differentiable view meta to use this. And it updates the as_view function to handle both forward and backward views.
- New forward grad class that handles storing gradients and tracking at each level [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development.
- Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677)
- API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), [VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243)
- c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9)
- python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d)
- python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8)
- c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3)
- Utility for formulas and updated manual functions to respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433)
- Ensure SavedVariable save forward grad properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030)

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D25678797

Pulled By: albanD

fbshipit-source-id: 3d58550c11b5f58b9b73fd30596d042b857fb9dd
2020-12-22 12:11:27 -08:00
eabe05ab72 [onnxifi] Get rid of class member (#49380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49380

Couldn't resist removing a class member that is only used in one function.

Reviewed By: yinghai

Differential Revision: D25547366

fbshipit-source-id: 74e61c6a0068566fb7956380862999163e7e94bf
2020-12-22 12:02:52 -08:00
7b4a7661d6 Make PyTorch partially cross-compilable for Apple M1 (#49701)
Summary:
Update CPUINFO to include https://github.com/pytorch/cpuinfo/pull/51
Update sleef to include https://github.com/shibatch/sleef/pull/376
Modify aten/src/ATen/native/quantized/cpu/qnnpack/CMakeLists.txt to recognize CMAKE_OSX_ARCHITECTURES

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49701

Test Plan: `cmake -DCMAKE_OSX_ARCHITECTURES=x86_64 -DPYTHON_EXECUTABLE=/usr/bin/python3  -DUSE_XNNPACK=NO -DBUILD_TEST=YES .. -G Ninja; ninja basic` finishes successfully on Apple M1

Reviewed By: janeyx99

Differential Revision: D25669219

Pulled By: malfet

fbshipit-source-id: 5ee36b64e3a7ac76448f2a300ac4993375a26de5
2020-12-22 09:33:12 -08:00
42b5601f30 [ROCm] add 4.0 to nightly builds (#49632)
Summary:
Depends on https://github.com/pytorch/builder/pull/614.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49632

Reviewed By: ngimel

Differential Revision: D25665880

Pulled By: walterddr

fbshipit-source-id: b37a55b7e3028648453b422683fa4a72e0ee04a4
2020-12-22 08:41:13 -08:00
4d9d03fe47 Complex backward for torch.sqrt (#49461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49461

resolves https://github.com/pytorch/pytorch/issues/48398

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D25589454

Pulled By: anjali411

fbshipit-source-id: 46e9f913c8ab3e18c98d6f623b2394044b6fe079
2020-12-22 07:58:42 -08:00
2df249f0ab [fix] inplace remainder/% (#49390)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49214

**BC-Breaking**
Before this PR, `%=` didn't actually do the operation in place and returned a new tensor.
After this PR, the `%=` operation is actually in place and the modified input tensor is returned.

Before PR,
```python
>>> import torch
>>> a = torch.tensor([11,12,13])
>>> id(a)
139627966219328
>>> a %= 10
>>> id(a)
139627966219264
```

After PR,
```python
>>> import torch
>>> a = torch.tensor([11,12,13])
>>> id(a)
139804702425280
>>> a %= 10
>>> id(a)
139804702425280
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49390

Reviewed By: izdeby

Differential Revision: D25560423

Pulled By: zou3519

fbshipit-source-id: 2b92bfda260582aa4ac22c4025376295e51f854e
2020-12-22 07:30:03 -08:00
dfb7520c47 NewModuleTest: Don't call both check_jacobian and gradcheck (#49566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49566

Fixes #49422.

check_jacobian and gradcheck do roughly the same thing: they both
compute an analytic jacobian and a numeric jacobian and check that
they are equivalent. Furthermore, NewModuleTest will (by default) call
both check_jacobian and gradcheck, leading to some redundant checks that
waste CI resources.

However, there is one subtle difference: `check_jacobian` can handle the
special case where a Module takes in dense inputs and dense parameters
but returns sparse gradients, but that is not something gradcheck can
handle. This is only used in the tests for nn.Embedding and
nn.EmbeddingBag.

This PR does the following:
- have NewModuleTest call gradcheck instead of check_jacobian by default
- add a new "has_sparse_gradients" flag to NewModuleTest. These are True
for the nn.Embedding and nn.EmbeddingBag sparse gradient tests. If
`has_sparse_gradients` is True, then we call check_jacobian, otherwise,
we call gradcheck.
- Kills the "jacobian_input" flag. This flag was used to tell
NewModuleTest to not attempt to compute the jacobian for the inputs to
the module. This is only desirable if the input to the module isn't
differentiable and was only set in the case of nn.Embedding /
nn.EmbeddingBag that take a LongTensor input. `gradcheck` handles these
automatically by not checking gradients for non-differentiable inputs.

Test Plan:
- Code reading
- run test_nn.py

Reviewed By: albanD

Differential Revision: D25622929

Pulled By: zou3519

fbshipit-source-id: 8d831ada98b6a95d63f087ea9bce1b574c996a22
2020-12-22 06:48:31 -08:00
c348faedc4 [Gradient Compression] Warm-start of PowerSGD (#49451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49451

Reuse the low-rank tensors P(s) and Q(s) from the previous iteration if possible.

This can give a better compression performance in terms of both accuracy and speed.

Also add a unit test for batched PowerSGD to test_c10d.py.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 119014132

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook

Reviewed By: rohan-varma

Differential Revision: D25583086

fbshipit-source-id: a757df3c4cfcc0ead4647f7de2f43198f1e063ee
2020-12-22 01:19:14 -08:00
590e7168ed [PyTorch] Remove direct reference to native symbols in sparse related non-native codes (#49721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49721

As part of the per-app selective build refactoring effort, we are decoupling ATen/native from the rest of ATen (D25413998).
All symbols of ATen/native could only be referenced through dispatcher (https://github.com/pytorch/pytorch/issues/48684).

This diff is to decouple the native reference recently introduced for sparse tensors.
ghstack-source-id: 119028080

Test Plan: CI

Reviewed By: dhruvbird, ngimel

Differential Revision: D25675711

fbshipit-source-id: 381cbb3b361ee41b002055399d4996a9ca21377c
2020-12-21 22:16:20 -08:00
d54cf2aa27 [pt][ATen] Optimize bmm (#49506)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49506

- Get rid of expensive stuff like `TensorArg`, `checkBackend`, `checkSize`, and `TensorAccessor`.
- Add `checkDim` that does not require creating a `TensorArg` which incurs a refcount bump
- Avoid unnecessary calls to `torch.select`, which goes through the dispatcher, in the cases we care about: mat1 and mat2 either not permuted or permuted with dims = [0, 2, 1]. The PyTorch version of bmm supports unusual cases, such as inputs permuted with dims = [1, 2, 0], which are uncommon in SparseNNs.

Test Plan:
Unit test:
```
buck test //caffe2/test:linalg
```

Benchmark with the adindexer model:
```
Before:
I1216 14:02:24.155516 2595800 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0847197. Iters per second: 11803.6
After:
I1216 14:02:26.583878 2595939 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.082051. Iters per second: 12187.5
```

Reviewed By: bwasti

Differential Revision: D25577574

fbshipit-source-id: 8aba69b950e7b4d9d1b14ba837931695a908c068
2020-12-21 22:08:39 -08:00
11598da229 [FX] Fix python code having spurious newlines from placeholders (#49720)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49720

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D25675825

Pulled By: jamesr66a

fbshipit-source-id: a9028acad9c8feb877fff5cd09aedabed52a3f4b
2020-12-21 21:41:24 -08:00
edce6b138d fx quant: fix types on _find_quants (#49616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49616

Add types to `_find_quants` I/O and fix resulting errors,
needed for an upcoming bug fix.

Test Plan:
```
mypy torch/quantization
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25645719

fbshipit-source-id: 4bf788b55fd4fd086c83a4438b9c2df22b9cff49
2020-12-21 21:05:57 -08:00
7c90b20f38 fx quant: add types to observed_module.py (#49607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49607

Readability

Test Plan:
```
mypy torch/quantization
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25643895

fbshipit-source-id: b4b8741b07ac4827c3bacd2084df81fbfdd0c2d5
2020-12-21 21:05:53 -08:00
9d5d193704 fx quant: types for fusion_patterns.py (#49606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49606

Adds more types, for readability.

Test Plan:
```
mypy torch/quantization
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25643894

fbshipit-source-id: 4aad52fe4e59ad74b6e0e3acd0f98fba91561a29
2020-12-21 21:05:49 -08:00
ab2194f912 unbreak mypy torch/quantization (#49549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49549

Somehow `mypy torch/quantization` got broken in the past couple of days:
https://gist.github.com/vkuzo/07af454246f0a68e6fa8929beeec7e0d
.  I didn't see any relevant PRs other than
https://github.com/pytorch/pytorch/pull/47725, which doesn't seem
related. The error doesn't seem real, as the arguments to
`_cudnn_rnn_flatten_weight` seem correct. For now,
ignoring the failure so we have a clean `mypy` run on
`torch/quantization`.

Test Plan:
```
mypy torch/quantization
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25616972

fbshipit-source-id: 46c207fe1565ec949c0b1f57d6cd0c93f627e6bd
2020-12-21 21:02:48 -08:00
a5b27d7a31 [TensorExpr] Move SimpleIREval implementation from .h to .cpp. (#49697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49697

Mostly mechanical move. This refactoring helps to hide unnecessary
details from the SimpleIREval interface and make it more similar to a
pure 'codegen'.

Test Plan: Imported from OSS

Reviewed By: nickgg

Differential Revision: D25668696

Pulled By: ZolotukhinM

fbshipit-source-id: 423247bfcdfa88403e8ec92152f00110bb9da19c
2020-12-21 20:20:15 -08:00
e1f73ced1e [TensorExpr] Change LoopNest::vectorize to accept For* instead of Stmt*. (#49696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49696

And make it static.

Test Plan: Imported from OSS

Reviewed By: navahgar, nickgg

Differential Revision: D25668695

Pulled By: ZolotukhinM

fbshipit-source-id: 8d7fb507d6f3beca70e868d9e0f4c46247311a99
2020-12-21 20:17:20 -08:00
f5178bf151 Revert D25607503: Add base forward grad logic
Test Plan: revert-hammer

Differential Revision:
D25607503 (fdf02eff3d)

Original commit changeset: f1396290de1d

fbshipit-source-id: 057206e28ff48ee288856adfe3ca577d4880789f
2020-12-21 19:56:28 -08:00
aa2782b9ec replacing THC_CLASS and THC_API with TORCH_CUDA_API (#49690)
Summary:
THC_API and THC_CLASS were leftover macros from before the consolidation of caffe2, aten, and torch. Now that they're combined, these are misleading and should just be TORCH_CUDA_API. The only file I manually edited was `THCGeneral.h.in`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49690

Reviewed By: malfet

Differential Revision: D25667982

Pulled By: janeyx99

fbshipit-source-id: 2fdf7912b2a0537b7c25e1fed21cc301fa59d57f
2020-12-21 19:21:22 -08:00
7eb392d73f Fix TCPStore type coercion (#49685)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49052

The TCPStore example with 4 arguments was working because the datetime value was being implicitly converted to a bool. Modified the pybind definition and updated documentation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49685

Test Plan:
```
import torch.distributed as dist
from datetime import timedelta

dist.TCPStore("127.0.0.1", 0, True, timedelta(seconds=30))
```

Now fails with
```
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. torch._C._distributed_c10d.TCPStore(host_name: str, port: int, world_size: int, is_master: bool, timeout: datetime.timedelta = datetime.timedelta(seconds=300))

Invoked with: '127.0.0.1', 0, True, datetime.timedelta(seconds=30)
```

Reviewed By: mrshenli, ngimel

Differential Revision: D25668021

Pulled By: H-Huang

fbshipit-source-id: ce40b8648d0a414f0255666fbc680f1a66fae090
2020-12-21 19:04:15 -08:00
1043ecf68d Use store based barrier only for certain store types. (#49694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49694

The store based barrier introduced in
https://github.com/pytorch/pytorch/pull/49419 broke for certain store types.
This is a quick fix to resolve the issues for other store types.
ghstack-source-id: 119006874

Test Plan: 1) waitforbuildbot

Reviewed By: ppwwyyxx, rohan-varma

Differential Revision: D25668404

fbshipit-source-id: 751fb8b229ad6f50ee9c50f63a70de5a91c9eda5
2020-12-21 18:41:28 -08:00
7e1356db7b Move device guard from MultiTensorApply.cuh (#46664)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46664

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D24453343

Pulled By: izdeby

fbshipit-source-id: b82a658af50ededc985195ed02dbf60e792c7a13
2020-12-21 18:08:54 -08:00
5b163e230a [jit][tracer] allow traced modules to return dicts with tuple values when strict=False (#49568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49568

We have some inference use cases where the expected output of a module is of the form `{"key": (t1, t1)}`, and we are currently jit-tracing the modules until we can reach jit script compatibility.
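
A hedged sketch of such a use case (the module and key names are made up):
```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return {"key": (x, x)}

# strict=False allows the traced module to return a dict with tuple values.
traced = torch.jit.trace(M(), torch.randn(2), strict=False)
print(traced(torch.randn(2)))
```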

Test Plan: buck test mode/dev caffe2/test:jit -- 'test_trace_returning_complex_dict'

Reviewed By: houseroad

Differential Revision: D25624152

fbshipit-source-id: 5adef0e3c9d54cd31ad5fece4ac6530d541fd673
2020-12-21 15:35:46 -08:00
46c9a0e679 Do not use negative values in GCD computation. (#49379)
Summary:
GCD should always return positive integers. When negative values are used, we hit a corner case that results in an infinite recursion during simplification.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49379

Reviewed By: ezyang

Differential Revision: D25597115

Pulled By: navahgar

fbshipit-source-id: b0e8ac07ee50a5eb775c032628d4840df7424927
2020-12-21 15:08:43 -08:00
fdf02eff3d Add base forward grad logic (#49097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49097

RFC: https://github.com/pytorch/rfcs/pull/11

This PR adds the basic logic to handle forward grads as dual Tensors.
It contains the following:
- Mechanism to save dual state on a Tensor and clear it up when the dual level ends
- C++ and python user facing API
- Updated view system that is able to track both forward and backward views

The current PR has the following limitations:
- Extensive tests are in the next PR in the stack as formulas are needed to write full tests.
- Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack)
- Only level 0 is allowed for now. It was discussed and agreed that more levels are not needed for the first version of this PR.
- We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise.
- We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise.

Reading guide:
- Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). This introduces the new ViewInfo to hold view information shared for forward and backward. It also updates the differentiable view meta to use this. And it updates the as_view function to handle both forward and backward views.
- New forward grad class that handles storing gradients and tracking at each level [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development.
- Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677)
- API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), [VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243)
- c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9)
- python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d)
- python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8)
- c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3)
- Utility for formulas and updated manual functions to respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433)
- Ensure SavedVariable save forward grad properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030)

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D25607503

Pulled By: albanD

fbshipit-source-id: f1396290de1d75760f3d380c43cdd56e86fa6099
2020-12-21 14:39:43 -08:00
befe337072 Fix test_cuda_init_race skip rules (#49693)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49432

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49693

Reviewed By: walterddr, janeyx99

Differential Revision: D25668027

Pulled By: malfet

fbshipit-source-id: 802cbd39e4ebe585709179f332b680f5f7978814
2020-12-21 14:30:00 -08:00
983bfc79ed Enable product for bool tensor (#48637)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48351

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48637

Reviewed By: mrshenli

Differential Revision: D25658596

Pulled By: mruberry

fbshipit-source-id: ff3ada74b6d281c8e4753ed38339a1c036f722ee
2020-12-21 14:11:26 -08:00
49c9994fb7 Clean up backward compatibility skip list (#49691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49691

Quite a few stale items, let's make the list short.

Test Plan: oss ci

Reviewed By: hl475

Differential Revision: D25667464

fbshipit-source-id: cff1be8b5e0068470b3f621acf6bf4fbd414233e
2020-12-21 13:40:30 -08:00
92f37ae263 change block codegen to handle new inlining in NNC (#47687)
Summary:
Minor changes to block codegen to handle the new inlining in NNC.
For Block code generation we need to delay inlining until after we have collected dimension data about the tensors.
We need to collect the tensors' dimensions before they are flattened; we don't have this information after the inlining pass, so for Block we run inlining after we have collected this data using the `CreateBufferMap` analysis.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47687

Reviewed By: ZolotukhinM

Differential Revision: D24864869

Pulled By: protonu

fbshipit-source-id: 9574c0599f7d959a1cf0eb49d4e3e541cbe9b1d3
2020-12-21 13:36:25 -08:00
476cabdfff added macros in jit logging to check whether logging is enabled; replaced similar checks in LLVM codegen with such macros (#49121)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49121

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25445971

Pulled By: huiguoo

fbshipit-source-id: 980775a94159aa0b3b66fae938962761b38703d5
2020-12-21 13:01:22 -08:00
aebb7d1836 converted current debugging statements in LLVM codegen to jit-logging statements #48771 (#49040)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49040

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25407356

Pulled By: huiguoo

fbshipit-source-id: 1c1f893ed8d0877bee27e9a673a5dce2203c2bad
2020-12-21 12:58:12 -08:00
f7a085af98 Dynamic GRU quantization support (#49448)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49448

ghstack-source-id: 118982171

Test Plan:
buck test caffe2/test:quantization --  'test_qlstmGRU \(quantization\.test_quantized_op\.TestDynamicQuantizedRNNOp\)' --print-passing-details
buck test caffe2/test:quantization --  'test_quantized_rnn \(quantization\.test_quantize\.TestPostTrainingDynamic\)' --print-passing-details
buck test caffe2/test:quantization --  'test_qrnncell \(quantization\.test_quantized_op\.TestDynamicQuantizedRNNOp\)' --run-disabled --print-passing-details

Reviewed By: vkuzo

Differential Revision: D25579815

fbshipit-source-id: 413cc8888eb8058230b94c9576d2fa54b0ed1416
2020-12-21 12:36:59 -08:00
a84b93a6f8 add close() method to tqdm mock (#46040)
Summary:
In `torchvision` we use [`torch.hub.tqdm`](2cc20d7485/torchvision/datasets/utils.py (L11)) to display the dataset download. One of our methods uses [`tqdm().close()`](2cc20d7485/torchvision/datasets/utils.py (L188)), which is [not included in the mock](283ae1998c/torch/hub.py (L22-L49)). This PR adds a `close()` method to the mock.

Cc fmassa

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46040

Reviewed By: mrshenli

Differential Revision: D25619429

Pulled By: fmassa

fbshipit-source-id: a137f2417d8a47923ccb1ec6b7d5298c1545245c
2020-12-21 12:24:30 -08:00
12942ea52b [BE] Introduce set_cwd context manager (#49657)
Summary:
Introduces a context manager used to temporarily change the working directory and restore it even if an exception is raised.
Use it in test_type_hints and during code coverage collection
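
A minimal sketch of such a context manager (the actual implementation lives in the PR; this only illustrates the idea):
```python
import os
from contextlib import contextmanager

@contextmanager
def set_cwd(path):
    old_cwd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old_cwd)  # restored even if the body raises
```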

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49657

Reviewed By: walterddr

Differential Revision: D25660543

Pulled By: malfet

fbshipit-source-id: 77f08d57e4b60b95daa4068d0dacf7c25f978526
2020-12-21 12:08:48 -08:00
44ce0b8883 Sparse-sparse matrix multiplication (CPU/CUDA) (#39526)
Summary:
This PR implements matrix multiplication support for 2-d sparse tensors using the COO sparse format.

The current implementation of `torch.sparse.mm` supports the configuration
`torch.sparse.mm(sparse_matrix1, sparse_matrix2.to_dense())`, but this can use a lot of memory when sparse_matrix2's shape is large.

This implementation extends the `torch.sparse.mm` function to support `torch.sparse.mm(sparse_matrix1, sparse_matrix2)`.
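
A hedged sketch of the newly supported sparse-sparse path:
```python
import torch

a = torch.randn(4, 6).relu().to_sparse()
b = torch.randn(6, 3).relu().to_sparse()
c = torch.sparse.mm(a, b)  # sparse result; b is never densified
print(c.to_dense())
```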

Resolves  #[20988](https://github.com/pytorch/pytorch/issues/20988) for CPU/CUDA.

- [x] sparse matmul
  - [x] CPU/CUDA C++ implementation
  - [x] unittests
  - [x] update torch.sparse.mm documentation
  - [x] autograd support

The CPU sparse-sparse matmul was implemented taking the work "Sparse Matrix Multiplication Package (SMMP)" as a reference. The GPU sparse-sparse matmul is based on cuSPARSE; there is specific code for CUSPARSE_VERSION >= 11 and for older versions of cuSPARSE. Both CPU and CUDA rely on the sparse-sparse matmul algorithm using the CSR indices format, as it is one of the fastest algorithms.

Here are the latest benchmark results (script is here) for torch.sparse.mm (CUDA), torch.sparse.mm (CPU) and scipy; values are float32 scalars:

size | density | sparse.mm(CUDA) | sparse.mm(CPU) | scipy_coo_matmul
-- | -- | -- | -- | --
(32, 10000) | 0.01 | 822.7 | 79.4 | 704.1
(32, 10000) | 0.05 | 1741.1 | 402.6 | 1155.3
(32, 10000) | 0.1 | 2956.8 | 840.8 | 1885.4
(32, 10000) | 0.25 | 6417.7 | 2832.3 | 4665.2
(512, 10000) | 0.01 | 1010.2 | 3941.3 | 26937.7
(512, 10000) | 0.05 | 2216.2 | 26903.8 | 57343.7
(512, 10000) | 0.1 | 4868.4 | 87773.7 | 117477.0
(512, 10000) | 0.25 | 16639.3 | 608105.0 | 624290.4
(1024, 10000) | 0.01 | 1224.8 | 13088.1 | 110379.2
(1024, 10000) | 0.05 | 3897.5 | 94783.9 | 236541.8
(1024, 10000) | 0.1 | 10559.1 | 405312.5 | 525483.4
(1024, 10000) | 0.25 | 57456.3 | 2424337.5 | 2729318.7

A new backward algorithm was implemented using only `sparse @ sparse` and `sparse_mask` operations. Here is some benchmarking:

```
[------------------------- sparse.mm-backward -------------------------]
                            |   sparse.backward   |  dense.backward
 -----------------------------------------------------------------------
      (32, 10000) | 0.01    |            13.5          |         2.4
      (32, 10000) | 0.05    |            52.3          |         2.4
      (512, 10000) | 0.01   |          1016.8          |       491.5
      (512, 10000) | 0.05   |          1604.3          |       492.3
      (1024, 10000) | 0.01  |          2384.1          |      1963.7
      (1024, 10000) | 0.05  |          3965.8          |      1951.9
```

I added new benchmark tests. Now I am using a real dataset used in recent studies [1, 2] with different sparsity levels.

```
[---------------------------------- matmul ---------------------------------]
                        |   0.5   |  0.7   |  0.8   |  0.9   |  0.95  |  0.98
1 threads: ------------------------------------------------------------------
  (cpu)   torch         |    5.4  |   5.4  |   5.2  |   5.3  |   5.3  |   5.4
          torch.sparse  |  122.2  |  51.9  |  27.5  |  11.4  |   4.9  |   1.8
          scipy         |  150.1  |  87.4  |  69.2  |  56.8  |  38.4  |  17.1
  (cuda)  torch         |    1.3  |   1.1  |   1.1  |   1.1  |   1.1  |   1.1
          torch.sparse  |   20.0  |   8.4  |   5.1  |   2.5  |   1.5  |   1.1

[----------------------------------- backward -----------------------------------]
                        |   0.5   |   0.7   |   0.8   |   0.9   |   0.95  |   0.98
1 threads: -----------------------------------------------------------------------
  (cpu)   torch         |   17.7  |   17.9  |   17.7  |   17.7  |   17.6  |   17.9
          torch.sparse  |  672.9  |  432.6  |  327.5  |  230.8  |  176.7  |  116.7
  (cuda)  torch         |    3.8  |    3.6  |    3.5  |    3.5  |    3.6  |    3.5
          torch.sparse  |   68.8  |   46.2  |   35.6  |   24.2  |   17.8  |   11.9

Times are in milliseconds (ms).
```

In summary, the new `sparse @ sparse` backward algorithm is the better choice, as it is more about saving memory than raw performance. Moreover, it is better than the other options tested before.

## **References**

1. Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen. **Sparse GPU Kernels for Deep Learning.**  Proceedings of the International Conference for High Performance Computing, 2020. [https://github.com/google-research/google-research/tree/master/sgk](https://github.com/google-research/google-research/tree/master/sgk)
2. Trevor Gale, Erich Elsen, Sara Hooker. **The State of Sparsity in Deep Neural Networks.** [https://github.com/google-research/google-research/tree/master/state_of_sparsity](https://github.com/google-research/google-research/tree/master/state_of_sparsity)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39526

Reviewed By: mruberry

Differential Revision: D25661239

Pulled By: ngimel

fbshipit-source-id: b515ecd66d25f347d637e159d51aa45fb43b6938
2020-12-21 11:53:55 -08:00
3779bdec56 Implementing NumPy-like function torch.broadcast_to (#48997)
Summary:
Related https://github.com/pytorch/pytorch/issues/38349

Implement NumPy-like function `torch.broadcast_to` to broadcast the input tensor to a new shape.
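
For illustration:
```python
import torch

x = torch.tensor([1, 2, 3])
print(torch.broadcast_to(x, (2, 3)))
# tensor([[1, 2, 3],
#         [1, 2, 3]])
```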

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48997

Reviewed By: anjali411, ngimel

Differential Revision: D25663937

Pulled By: mruberry

fbshipit-source-id: 0415c03f92f02684983f412666d0a44515b99373
2020-12-21 11:24:50 -08:00
db2e9c1e7f [NNC] Intermediate allocs flattened and dependency support (#49554)
Summary:
Makes two changes in NNC for intermediate buffer allocations:
1. Flattens dimensions of buffers allocated in LoopNest::prepareForCodegen() to match their flattened usages.
2. Adds support for tracking memory dependencies of Alloc/Free to the MemDependencyChecker, which will allow us to check safety of accesses to intermediate buffers (coming in a future diff).

I didn't add any new tests as the mem dependency checker tests already cover it pretty well, particularly the GEMM test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49554

Reviewed By: VitalyFedyunin

Differential Revision: D25643133

Pulled By: nickgg

fbshipit-source-id: 66be3054eb36f0a4279d0c36562e63aa2dae371c
2020-12-21 10:35:15 -08:00
a3aafea076 Fixed a typo in dataloader.py. (#49437)
Summary:
This small PR fixes a one character typo in the docstring for `DataLoader`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49437

Reviewed By: ngimel

Differential Revision: D25665971

Pulled By: mrshenli

fbshipit-source-id: b60f975f1e3bf0bb8f88e39f490f716c602f087e
2020-12-21 10:27:24 -08:00
b1a1271f68 Fix typo in add_pr_curve docstrings. (#49648)
Summary:
Very small PR to fix a typo.

### Description
Fixed 1 typo in the documentation of `torch/utils/tensorboard/writer.py` (replaced "_should in_" by "_should be in_")

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49648

Reviewed By: ngimel

Differential Revision: D25665831

Pulled By: mrshenli

fbshipit-source-id: a4e733515603bb9313c1267fdf2cfcc2bc2773c6
2020-12-21 10:21:55 -08:00
b80a36614f Fix return type Any for Ternary ops (#49165)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49165

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D25463694

Pulled By: ejguan

fbshipit-source-id: 5cf907e8de6eeb0171d61175a60fac9812b76c6c
2020-12-21 10:12:41 -08:00
8be205ae13 Added linalg.solve (#48456)
Summary:
This PR adds `torch.linalg.solve`.

`linalg_solve_out` uses in-place operations on the provided result tensor.

I modified `apply_solve` to accept a tensor of Int instead of a std::vector; that way we can write a function similar to `linalg_solve_out` but without the error checks and device memory synchronization.

In comparison to `torch.solve`, this routine accepts 1-dimensional tensors and batches of 1-dim tensors for the right-hand-side term; `torch.solve` requires it to be at least 2-dimensional.
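
A short sketch of the 1-dimensional right-hand-side case:
```python
import torch

A = torch.randn(3, 3)
b = torch.randn(3)            # 1-D rhs, accepted by linalg.solve
x = torch.linalg.solve(A, b)
assert torch.allclose(A @ x, b, atol=1e-5)
```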

Ref. https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48456

Reviewed By: izdeby

Differential Revision: D25562222

Pulled By: mruberry

fbshipit-source-id: a9355c029e2442c2e448b6309511919631f9e43b
2020-12-21 10:11:12 -08:00
5ce94991eb Fix sinc docs typo (#49667)
Summary:
Fix small typo in sinc docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49667

Reviewed By: ngimel

Differential Revision: D25665721

Pulled By: soulitzer

fbshipit-source-id: 5f78b9e34bb0084e51ae79d1afc450bcb0ae3d75
2020-12-21 09:52:09 -08:00
ef172e138c [Mask R-CNN]Add Int8 AABB Generate proposals Op (#49574)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49574

Adds support for additional Eigen Utils for custom type defs.

Reviewed By: linbinyu

Differential Revision: D25624556

fbshipit-source-id: 0ffa90aaf8cbf1d08825e95156fb40d966ca7042
2020-12-21 09:43:33 -08:00
7ed140a1a0 [WIP][DataLoader] Prototype of SamplerIterableDataset (#49363)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49363

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D25623637

Pulled By: ejguan

fbshipit-source-id: 9155d27d1fc91996b74110795cc73f1da0eedd44
2020-12-21 07:09:34 -08:00
554f79acb9 [WIP][DataLoader] Prototype of BatchIterableDataset (#49186)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49186

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D25623636

Pulled By: ejguan

fbshipit-source-id: 01a08cccb69301481c55b46358203354b9b4f5fa
2020-12-21 07:09:31 -08:00
1b6fc1fd42 [WIP][DataLoader] CollateIterableDataset prototype (#48933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48933

Prototype for CollateIterableDataset.
Move `collate_batch_fn` to BatchIterableDataset

- CollateIterableDataset
  - [x] Prototype
  - [x] Tests
- BatchIterableDataset
  - [x] Prototype
  - [x] Tests
- SamplerIterableDataset
  - [x] Prototype
  - [x] Tests

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D25623635

Pulled By: ejguan

fbshipit-source-id: 99ba077619f672551ac15367baaba985db35a9c2
2020-12-21 07:04:25 -08:00
bab732a3a3 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D25662961

fbshipit-source-id: f5811a5797fd6dc8733fdf86f35c93d12a08d53a
2020-12-21 04:14:44 -08:00
5c3788d5d7 Add support for torch.tensor_split to accept a tensor for indices argument (#49169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49169

Trying to address the feature request in https://github.com/pytorch/pytorch/issues/47479.
This diff overloads the `torch.tensor_split` method to also accept a tensor for the argument `split_size_or_sections`, which currently accepts a Python list or int. The motivation is to avoid converting a tensor to a list, so that the tensor operations can be recorded when tracing a model/module.
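
A hedged illustration of the tensor-index form, assuming it mirrors the existing list form:
```python
import torch

x = torch.arange(8)
print(torch.tensor_split(x, torch.tensor([2, 5])))
# (tensor([0, 1]), tensor([2, 3, 4]), tensor([5, 6, 7]))
```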

Implementation is following the diff that originally added the `tensor_split` method D24166164 (ef4817fe5a).

Test Plan:
```
buck test caffe2/test:torch -- tensor_split
```
https://www.internalfb.com/intern/testinfra/testconsole/testrun/5910974550563805/

```
buck test caffe2/test:others -- tensor_split
```
https://www.internalfb.com/intern/testinfra/testconsole/testrun/1688849905082678/

Reviewed By: mruberry

Differential Revision: D25440885

fbshipit-source-id: 6705dc551279e3a5eb1e5ec1ede2728eab85ffb1
2020-12-20 21:43:44 -08:00
96aed203bf [Gradient Compression] Replace the assertions in PowerSGD comm hook by stream syncrhonization (#49435)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49435

Previously, the assertion prevented illegal memory access only as a side effect: torch.any returns a boolean value, which initiates a data transfer from the device to the host and forces a synchronization.

An explicit synchronization is more to the point.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 118664204

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook

Reviewed By: rohan-varma

Differential Revision: D25573484

fbshipit-source-id: 516d0d502da2863b516c15332702335ee662f072
2020-12-20 17:24:06 -08:00
342bfd892f [Gradient Compression] Add error feedback to layerwise PowerSGD (#49418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49418

Add error feedback to the original implementation of PowerSGD.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 118670930

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook

Reviewed By: rohan-varma

Differential Revision: D25555538

fbshipit-source-id: c01145cc9acf574a4c6aa337dbbba0ba7d9350b2
2020-12-20 17:22:39 -08:00
5c25f8faf3 stft: Change require_complex warning to an error (#49022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49022

**BC-breaking note**:

Previously torch.stft took an optional `return_complex` parameter that indicated whether the output would be a floating point tensor or a complex tensor. By default `return_complex` was False to be consistent with the previous behavior of torch.stft. This PR changes this behavior so `return_complex` is a required argument.
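
A minimal sketch of the now-required argument:
```python
import torch

x = torch.randn(1024)
spec = torch.stft(x, n_fft=256, return_complex=True)  # complex output
print(spec.dtype)  # torch.complex64
```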

**PR Summary**:

* **#49022 stft: Change require_complex warning to an error**

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25658906

Pulled By: mruberry

fbshipit-source-id: 11932d1102e93f8c7bd3d2d0b2a607fd5036ec5e
2020-12-20 14:48:25 -08:00
f5ee619d2a Updated derivative rules for complex svd and pinverse (#47761)
Summary:
Updated `svd_backward` to work correctly for complex-valued inputs.
Updated `common_methods_invocations.py` to take dtype, device arguments for input construction.
Removed `test_pinverse` from `test_autograd.py`, it is replaced by entries to `common_methods_invocations.py`.
Added `svd` and `pinverse` to list of complex tests.

References for complex-valued SVD differentiation:

- https://giggleliu.github.io/2019/04/02/einsumbp.html
- https://arxiv.org/abs/1909.02659

The derived rules assume gauge invariance of loss functions, so the result would not be correct for loss functions that are not gauge invariant.
https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/

The same rule is implemented in Tensorflow and [BackwardsLinalg.jl](https://github.com/GiggleLiu/BackwardsLinalg.jl).

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47761

Reviewed By: ngimel

Differential Revision: D25658897

Pulled By: mruberry

fbshipit-source-id: ba33ecbbea3f592238c01e62c7f193daf22a9d01
2020-12-20 14:39:31 -08:00
8b61fbdac9 Resubmit: [Gradient Compression] Implement the original layerwise PowerSGD (#49639)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49639

Resubmit #49417 with a fix for distributed_test.

The previous submission broke a multi-gpu test that runs on 4 GPUs. Since this test only runs on master, the breakage couldn't be detected before submission.

The real diff is:
4ca1014bb5

This time I have verified that the previous failed test `pytorch_linux_xenial_cuda10_2_cudnn7_py3_multigpu_test` could pass after creating a PR (#49651) from a separate branch:
https://app.circleci.com/pipelines/github/pytorch/pytorch/253644/workflows/c1c02b70-0877-40e6-8b4c-61f60f6b70ed/jobs/9768079

ghstack-source-id: 118969912

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook

Reviewed By: mrshenli

Differential Revision: D25654961

fbshipit-source-id: 2a45c8ceb9bdb54ff7309a8b66ec87e913e0150e
2020-12-20 13:02:52 -08:00
c0deb231db disable kthvalue overlap (#48254)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47934

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48254

Reviewed By: bdhirsh

Differential Revision: D25276689

Pulled By: VitalyFedyunin

fbshipit-source-id: a70774e31c269b41786170e99ec1ede42596ba7b
2020-12-19 11:30:27 -08:00
1ac05cfe01 Remove DataPtr extractor from CUDAFuture (#48840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48840

The CUDAFuture class needs to inspect the values it contains in order to extract its tensors (in fact, the DataPtrs backing those). These are needed first to determine what CUDA devices back those tensors, so that an event for each such device can be recorded; and later to record these DataPtrs with the CUDA caching allocator if they are used in other streams.

This became complicated when Python was added to the mix, because to inspect a Python object we need to acquire the GIL, but we couldn't do so from code that was supposed to also work in C++-only mode. The solution was for users to provide a custom way to extract DataPtrs, so that the PythonFutureWrapper could install such a custom Python-aware one. This was the DataPtr extractor.

In https://github.com/pytorch/pytorch/pull/48502 a different suggestion was proposed. At its root, it consists of adding support for IValues of type PyObject to the visit() and getSubValues() methods. In order to deal with the GIL, we do this through a virtual method: PyObjectHolder, which is the base class, is also available in C++-only mode, and thus defines this method but leaves it unimplemented; ConcretePyObjectHolder, which is the subclass, is only included in Python mode, and thus it can implement that method, acquire the GIL, and do what it's supposed to.

In my opinion, this approach is just brilliant! Thanks to wanchaol for proposing it! It hides the complexity of dealing with Python inside getSubValues(), where it can be done properly, thus enormously simplifying the CUDAFuture and PythonFutureWrapper classes.
ghstack-source-id: 118704935

Test Plan: Unit tests

Reviewed By: wanchaol

Differential Revision: D25334355

fbshipit-source-id: 3f1d3bf6e6e8505a114c877fb9a6fcc3f68d91d3
2020-12-19 11:03:45 -08:00
e0f60c9720 Disable test on windows (#49636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49636

test_export_stacks fails with permission errors

Test Plan:
CI

Imported from OSS

Reviewed By: robieta

Differential Revision: D25654680

fbshipit-source-id: 5689289e06eebc0686030f90ed56483a072b6850
2020-12-18 22:09:52 -08:00
e2d2d9bb0c [PyTorch Mobile] Preserve bundled input related methods when calling optimize_for_mobile (#49170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49170

Added an extra step to **always** preserve the bundled inputs methods if they are present in the input module.

Also added a check that all the methods in `preserved_methods` actually exist. If not, we now throw an exception. This can hopefully stop hard-to-debug inputs from getting into downstream functions.

~~Add an optional argument `preserve_bundled_inputs_methods=False` to the `optimize_for_mobile` function. If set to be True, the function will now add three additional functions related with bundled inputs to be preserved: `get_all_bundled_inputs`, `get_num_bundled_inputs` and `run_on_bundled_input`.~~
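A short usage sketch, assuming the 1.8-era `optimize_for_mobile` signature:

```
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

scripted = torch.jit.script(M())
# bundled-input methods, if present on the module, are now preserved automatically;
# any other extra methods must still be listed in preserved_methods
optimized = optimize_for_mobile(scripted, preserved_methods=[])
```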

Test Plan:
`buck test mode/dev //caffe2/test:mobile -- 'test_preserve_bundled_inputs_methods \(test_mobile_optimizer\.TestOptimizer\)'`

or

`buck test caffe2/test:mobile` to run some other related tests as well.

Reviewed By: dhruvbird

Differential Revision: D25463719

fbshipit-source-id: 6670dfd59bcaf54b56019c1a43db04b288481b6a
2020-12-18 22:01:46 -08:00
ad9923e5d5 Revert D25511543: [Gradient Compression] Implement the original layerwise PowerSGD
Test Plan: revert-hammer

Differential Revision:
D25511543 (71f3399e19)

Original commit changeset: 19ef188bc2d4

fbshipit-source-id: a363641a059aeacc57684884998cf8fb7363d748
2020-12-18 20:30:29 -08:00
5cde23fdd4 [quant][graphmode][fx] Allow user to specify qconfig for call_method (#49621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49621

This adds support for configuring a qconfig for a call_method, e.g. x.chunk; this helps work around a problem in our internal model.

TODO: since call_method is also a string and we flatten the qconfig, might need to resolve namespace conflict between
call_method and module_name
TODO: Add scope support to set the qconfig for call_method correctly with original qconfig

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D25651828

fbshipit-source-id: 82d66b121d37c8274fd481b6a2e9f9b54c5ca73d
2020-12-18 20:21:52 -08:00
e4eaa6de5f Fix lint (#49629)
Summary:
Fix lint on master

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49629

Reviewed By: rohan-varma

Differential Revision: D25654199

Pulled By: mrshenli

fbshipit-source-id: 2ab5669ad47996c0ca0f9b6611855767d5af0506
2020-12-18 19:26:06 -08:00
7278e3bd29 Bump tensorpipe version (#49599)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49599

Reviewed By: lw

Differential Revision: D25639036

Pulled By: mrshenli

fbshipit-source-id: 595b396a01d7fa9049d88447ab9079e286637afe
2020-12-18 18:52:41 -08:00
159de1f1d6 Add benchmark for torch.distributed.pipeline.sync.Pipe (#49577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49577

Repurposing the benchmarking from
https://github.com/facebookresearch/fairscale/blob/master/benchmarks/pipe.py
and pulling in a stripped down version of the benchmark into PyTorch.

Sample output:
```
Running benchmark with args: Namespace(batch_size=8, checkpoint='never', chunks=4, host='localhost', max_batch=10, num_decoder_layers=10, num_devices=4)
Number of parameters for model: 292833040
| batch     1 | wps 3593.07 | loss 25.98 | ppl 192556591553.37
| batch     2 | wps 4405.16 | loss 19.36 | ppl 256201548.33
| batch     3 | wps 4404.98 | loss 23.56 | ppl 17111244076.37
| batch     4 | wps 4413.25 | loss 27.11 | ppl 594561327825.83
| batch     5 | wps 4408.53 | loss 25.92 | ppl 181277705101.33
| batch     6 | wps 4385.64 | loss 24.92 | ppl 66592883598.50
| batch     7 | wps 4434.11 | loss 24.75 | ppl 56113635884.68
| batch     8 | wps 4441.25 | loss 24.88 | ppl 63666024212.82
| batch     9 | wps 4425.49 | loss 25.35 | ppl 101959669008.98
| batch    10 | wps 4421.05 | loss 25.34 | ppl 101597621863.94
Peak memory usage for GPUs: cuda:0: 2.38GiB, cuda:1: 3.04GiB, cuda:2: 3.04GiB, cuda:3: 3.67GiB,
```
ghstack-source-id: 118939686

Test Plan: sentinel

Reviewed By: rohan-varma

Differential Revision: D25628721

fbshipit-source-id: 41c788eed4f852aef019aec18a84cb25ad254f3a
2020-12-18 18:33:47 -08:00
8c52fdf522 Improve documentation for pipeline parallelism. (#48638)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48638

Polishing up some of the docs for the main `Pipe` class and its
`forward` method.
ghstack-source-id: 118820804

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D25237705

fbshipit-source-id: ba3d8737b90a80024c827c0887fc56f14bf678b7
2020-12-18 18:28:26 -08:00
71f3399e19 [Gradient Compression] Implement the original layerwise PowerSGD (#49417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49417

The existing implementation applies PowerSGD to a batch of flattened tensors, which is a coarse-grained compression. This hook is now renamed "batched_powerSGD_hook".

Now implement the original algorithm from the paper, which applies PowerSGD to each per-parameter tensor. This is a layerwise, fine-grained compression. Although this original implementation is slower, it is expected to achieve higher accuracy, especially when the shapes of per-param tensors cannot be aligned.

Also add a test in distributed_test.py.
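A usage sketch, assuming the hook module layout from this PR stack (`ddp_model` is an existing DistributedDataParallel instance):

```
import torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook as powerSGD

state = powerSGD.PowerSGDState(process_group=None, matrix_approximation_rank=1)
ddp_model.register_comm_hook(state, powerSGD.powerSGD_hook)           # layerwise variant
# ddp_model.register_comm_hook(state, powerSGD.batched_powerSGD_hook) # batched variant
```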

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 118921275

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook

Reviewed By: rohan-varma

Differential Revision: D25511543

fbshipit-source-id: 19ef188bc2d4c7406443c8fa233c1f2c2f27d93c
2020-12-18 18:02:15 -08:00
6f381de006 Inline coverage report combining/reporting (#49615)
Summary:
Instead of calling the coverage frontend, import the coverage module and call combine() and html_report() directly.

Fixes https://github.com/pytorch/pytorch/issues/49596 by not using a strict mode when combining those reports

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49615

Reviewed By: seemethere

Differential Revision: D25645196

Pulled By: malfet

fbshipit-source-id: be55b5c23a3569a331cbdf3f86d8c89bc27d5fe1
2020-12-18 17:08:46 -08:00
e2e44bb10a [Issue #46210] added torch.fx.len() to provide support for len(); added a test case for torch.fx.len() (#49532)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49532

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D25608804

Pulled By: huiguoo

fbshipit-source-id: 93ac02ab57db5d200d92443062286c34782ec0ef
2020-12-18 16:43:57 -08:00
3659560fba [NNC] Disable masked fill (#49622)
Summary:
There's a bug internally, disable as quick fix before investigation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49622

Test Plan:
Imported from GitHub, without a `Test Plan:` line.
build

Reviewed By: zheng-xq, PursueHappinessDirectly

Differential Revision: D25651897

Pulled By: eellison

fbshipit-source-id: dd1454f2ef7506d7844016128aa6320d7e69aa6e
2020-12-18 16:28:00 -08:00
5ab9593098 torch.reciprocal: promote integer inputs to float (#49102)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49091

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49102

Reviewed By: VitalyFedyunin

Differential Revision: D25639541

Pulled By: soulitzer

fbshipit-source-id: 1dd360bd7b77f106d606143d8d3961610bac8cb7
2020-12-18 16:17:30 -08:00
485aee7a22 Output stacks (support for SVG visualization) (#48438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48438

Outputting stacks in a format suitable for SVG visualization
(e.g. with the https://github.com/brendangregg/FlameGraph tool)
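A short usage sketch:

```
import torch

with torch.autograd.profiler.profile(with_stack=True) as prof:
    torch.mm(torch.randn(64, 64), torch.randn(64, 64))
prof.export_stacks("/tmp/profiler_stacks.txt", "self_cpu_time_total")
# then, e.g.: flamegraph.pl --title "CPU stacks" /tmp/profiler_stacks.txt > perf.svg
```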

Test Plan:
python test/test_profiler.py -k test_export_stacks

e.g. resnet18 (note: actual SVG is interactive):

<img width="1193" alt="Screen Shot 2020-11-24 at 7 06 27 PM" src="https://user-images.githubusercontent.com/30845429/100178160-397f3500-2e88-11eb-81c4-34b19c5fcb87.png">

Reviewed By: dzhulgakov

Differential Revision: D25174270

Pulled By: ilia-cher

fbshipit-source-id: 6b60084071b209441805c468f5ff777318e42d1a
2020-12-18 16:10:41 -08:00
d0a12c5a47 Add sinc operator (#48740)
Summary:
Implements the sinc operator.
See https://numpy.org/doc/stable/reference/generated/numpy.sinc.html

![image](https://user-images.githubusercontent.com/13428986/101653855-cdffa080-3a0d-11eb-8426-ecc81c152ebd.png)
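A quick usage example:

```
import torch

x = torch.tensor([0.0, 0.5, 1.0, 2.0])
print(torch.sinc(x))  # sin(pi*x) / (pi*x), with sinc(0) defined as 1
# -> tensor([1.0000, 0.6366, 0.0000, 0.0000]) (approximately)
```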

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48740

Reviewed By: ezyang

Differential Revision: D25597565

Pulled By: soulitzer

fbshipit-source-id: 6dbcf282ee4eba34930bc9e5c85c0c5e79cf0322
2020-12-18 15:52:24 -08:00
d088359e5a [torchscript] Fix constant propagation schemas (#49605)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49605

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D25643157

Pulled By: IvanKobzarev

fbshipit-source-id: c5440622f6cf559afadca853e1eb7a9fbb8edf7f
2020-12-18 15:28:42 -08:00
9d91360b5d Cleanup APIs for pipeline parallelism. (#48630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48630

1) Make torch.distributed.pipeline package public.
2) Make several helper methods private.
ghstack-source-id: 118820803

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D25235688

fbshipit-source-id: c32833ebf090ddbd4eaf06fcb5e3f9d421623a60
2020-12-18 15:17:13 -08:00
39d89e06e0 Upload test times to S3 (#49190)
Summary:
This PR currently just modifies the `test/print_test_stats.py` script (run in the `pytorch_linux_test` job) so that now it uploads test times to the new `ossci-metrics` S3 bucket (rather than just to Scribe) if passed the `--upload-to-s3` parameter.

The next step is to add an additional step to that `pytorch_linux_test` job which checks if it's being run on a PR, and if so, finds the `master` commit to compare against (similar to what's done in the now-unused `.jenkins/pytorch/short-perf-test-{c,g}pu.sh` scripts) and adds test time info to the Dr CI comment if the PR is significantly different from the base revision.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49190

Test Plan:
An "integration test" would be to just look in [the `ossci-metrics` S3 bucket](https://s3.console.aws.amazon.com/s3/buckets/ossci-metrics) to confirm that the CI run(s) for this PR did indeed upload their test time data successfully.

To test this locally, first make sure you have all the packages you need, such as these:
```
$ conda install -c anaconda boto3
$ conda install -c conda-forge unittest-xml-reporting
```
Then run whatever tests you want; these are the ones I used for my local smoke test, for no particular reason:
```
$ python test/test_spectral_ops.py --save-xml=/tmp/reports/spectral_ops
```
Once the tests finish, run the script to upload their times to S3:
```
$ CIRCLE_SHA1="$(git rev-parse HEAD)" CIRCLE_JOB=foo test/print_test_stats.py --upload-to-s3 /tmp/reports/spectral_ops
```
Now check that they uploaded successfully:
```
$ aws s3 cp "s3://ossci-metrics/test_time/$(git rev-parse HEAD)/foo/" /tmp/reports --recursive
```
And that it's a valid `*.json.bz2` file:
```
$ bzip2 -kdc /tmp/reports/*Z.json.bz2 | jq . | head -n21
{
  "build_pr": null,
  "build_tag": null,
  "build_sha1": "e46f43621b910bc2f18dd33c08f5af18a542d5ed",
  "build_branch": null,
  "build_job": "foo",
  "build_workflow_id": null,
  "total_seconds": 0.9640000000000003,
  "suites": {
    "TestFFTCPU": {
      "total_seconds": 0.9640000000000003,
      "cases": [
        {
          "name": "test_fft_invalid_dtypes_cpu",
          "seconds": 0.022,
          "errored": false,
          "failed": false,
          "skipped": false
        },
        {
          "name": "test_istft_throws_cpu",
```

Reviewed By: walterddr, malfet

Differential Revision: D25618035

Pulled By: samestep

fbshipit-source-id: 4d8013859a38a49e5bba700c5134951ca1a9d8b7
2020-12-18 14:46:37 -08:00
b361e33a66 [package] implicitly extern stdlib before mocking (#49306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49306

This allows you to mock out everything except for specific patterns while
still correctly externing the python standard library. This makes it less
likely that you will need to override require_module.
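A hedged sketch of the intended usage (the exact pattern syntax is illustrative):

```
import torch
from torch.package import PackageExporter

my_model = torch.nn.Linear(4, 2)
with PackageExporter("out.pt") as exporter:
    exporter.mock("**")  # stdlib modules are implicitly externed first, so this stays safe
    exporter.save_pickle("model", "model.pkl", my_model)
```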

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D25526212

Pulled By: zdevito

fbshipit-source-id: 7339f4c7f12af883496f79de95e57d452bb32dc2
2020-12-18 14:16:46 -08:00
fb755ad33e [FX] Emit named tuple construction node when NamedTuple appears as an arg (#49553)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49553

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D25618577

Pulled By: jamesr66a

fbshipit-source-id: 042f742f9ca02e59bbceda97bfcf47f9bac07873
2020-12-18 14:10:17 -08:00
27f355f87e Test pipeline parallelism works with DDP. (#48470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48470

Adding a unit test to test this works as expected. Although, this
doesn't work with other checkpointing modes of the pipe and checkpoint=never
needs to be set for this to work.
ghstack-source-id: 118820806

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D25182668

fbshipit-source-id: 85e69e338bf388c132a303ad93e29ec2cc4a0ed8
2020-12-18 13:34:44 -08:00
e17f0fd676 Adding support for bitwise augassignment operators (#44621)
Summary:
========
Fixes #42915

This commit adds support for bitwise augmented-assignment shorthands in TorchScript, i.e. |=, &=, ^=, <<=, >>=, **=
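A small scriptable example:

```
import torch

@torch.jit.script
def bit_ops(x: int, y: int) -> int:
    x |= y     # bitwise-or assignment
    x &= 0xFF  # bitwise-and assignment
    x ^= 3     # bitwise-xor assignment
    x <<= 2    # left-shift assignment
    x >>= 1    # right-shift assignment
    return x

print(bit_ops(5, 2))  # 8
```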

Testing:
======
This commit also adds a test for the above fix in test_jit.py.
The test can be invoked by:
pytest -k augassign test/test_jit.py

Here is a snapshot of the testing:
<img width="1238" alt="image" src="https://user-images.githubusercontent.com/70345919/93105141-8f9f5300-f663-11ea-836b-3b52da6d2be5.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44621

Reviewed By: mrshenli

Differential Revision: D23906344

Pulled By: nikithamalgifb

fbshipit-source-id: 4c93a7430a625f698b163609ccec15e51417d564
2020-12-18 12:07:54 -08:00
daaf932a99 New profiler API (#48280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48280

Adding new API for the kineto profiler that supports enable predicate
function

Test Plan: unit test

Reviewed By: ngimel

Differential Revision: D25142220

Pulled By: ilia-cher

fbshipit-source-id: c57fa42855895075328733d7379eaf3dc1743d14
2020-12-18 11:49:02 -08:00
4a870f6518 [PyTorch Mobile] Export Operator List from Mobile CompilationUnit instead of from TorchScript Model (#49385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49385

Currently, the API to export operator lists accepts a `torch::jit::Module` object and spits out an operator list. The operator list is practically used only for mobile. This is not ideal because the set of root operators may change by the time the model is subsequently optimized and exported for mobile.

What we need to do instead is glean the list of operators from the mobile model itself (`bytecode.pkl` specifically), and expose that instead.

Also updated the logic in `converter`.

### Before this change:
1. Get operator List from Torch Script Model
2. Convert to bytecode mobile model

### After this change:
1. Convert to bytecode mobile model
2. Use this converted mobile model to get the list of operators for each method on the model

ghstack-source-id: 118796752

Test Plan:
Added a unit test in `test_lite_interpreter.cpp` to ensure that all model referenced operators show up in the exported operator list. Also make `test_lite_interpreter.cpp` runnable from `xplat/caffe2/BUCK` since this is where the production code will be built from.

Verified that the list of operators produced before and after this change for an example model (segmentation) are the same.

{P147863234}

Also verified that the operator lists for BI-Xray model is different (we have been having problems with missing operators for this one): {P154903132}

Reviewed By: iseeyuan

Differential Revision: D24690094

fbshipit-source-id: 0426a6ef90456a811010cfe337c415882ae2deff
2020-12-18 11:17:57 -08:00
71ca600af9 Renaming CAFFE2_API to TORCH_API (#49496)
Summary:
Since caffe2 and torch have been consolidated, CAFFE2_API should be merged with TORCH_API. Addresses a TODO.

Manually edited some references of the removed `CAFFE2_API`:
* `CONTRIBUTING.md`
* `caffe2/proto/CMakeLists.txt`
* `cmake/ProtoBuf.cmake`
* `c10/macros/Export.h`
* `torch/csrc/WindowsTorchApiMacro.h`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49496

Reviewed By: malfet, samestep

Differential Revision: D25600726

Pulled By: janeyx99

fbshipit-source-id: 7e068d959e397ac183c097d7e9a9afeca5ddd782
2020-12-18 10:54:50 -08:00
c9e052130a [FX] Enforce args is tuple and kwargs is dict (#49526)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49526

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D25606115

Pulled By: jamesr66a

fbshipit-source-id: f2a21d02a2cf8c08cbd618efc5a6a28d34806851
2020-12-18 10:21:19 -08:00
faf6032945 Remove deadlines for Caffe2 hypothesis_test when running on GPU. (#49591)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49591

A bunch of these tests are marked flaky, and have been since time immemorial. (Read: as far back as Buck will build.) However, closer inspection reveals that they fail if and only if run on a GPU worker. What seems to be going on is that there are more jobs than GPUs, so the contention causes waits which register as timeouts on the test.

This diff is kind of hacky, but it basically just drops deadlines if a GPU is present. Because Caffe2 is going away I'm not too terribly concerned about a beautiful solution, but we may as well keep some test coverage if it's easy.
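The idea, as a minimal sketch (illustrative, not the exact diff):

```
import torch
from hypothesis import settings

# wall-clock deadlines are unreliable under GPU contention, so disable them there
settings.register_profile("gpu", deadline=None)
if torch.cuda.is_available():
    settings.load_profile("gpu")
```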

CC Sebastian, Ilia, Min, and Hongzheng who also have tasks for what seems to be the same flakiness.

Test Plan: Turn the tests back on and see if they fall over. (The failure repros reliably on an OnDemand GPU and is fixed by this change, so it's not really just a hail Mary.)

Reviewed By: ngimel

Differential Revision: D25632981

fbshipit-source-id: 43dcce416fea916ba91f891e9e5b59b2c11cca1a
2020-12-18 10:00:24 -08:00
ccd646696b Fix Module backward hooks for all Tensor inputs/outputs (#46163)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/598

This is BC-breaking as we now explicitly don't call the hook when there are no Tensors at the top level of the output.
This feature was not working anyways as the returned grad_input/grad_output were wrong (not respecting the output structure and wrong inputs for multi-Node Module).

This is also BC-breaking as we now report the correct gradients for `nn.Module`s that contain multiple autograd `Node`s, while we used to return incorrect results before.
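A minimal usage sketch of the fixed behavior:

```
import torch
import torch.nn as nn

def hook(module, grad_input, grad_output):
    # grad_output now matches the Tensor outputs of the Module
    print([None if g is None else tuple(g.shape) for g in grad_output])

m = nn.Linear(4, 2)
m.register_backward_hook(hook)
m(torch.randn(3, 4, requires_grad=True)).sum().backward()  # prints [(3, 2)]
```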

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46163

Reviewed By: ailzhang, mruberry

Differential Revision: D24894180

Pulled By: albanD

fbshipit-source-id: e1b5d193d2818eb2f51e2a2722c7405c8bd13c2b
2020-12-18 09:04:36 -08:00
0b27d57062 fixed the first line of torch.rst to match the __init__.py file's first line (#49584)
Summary:
Changed the first line of the torch.rst file to match that of the __init__.py file

Fixes https://github.com/pytorch/pytorch/issues/49228

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49584

Reviewed By: VitalyFedyunin

Differential Revision: D25639260

Pulled By: mrshenli

fbshipit-source-id: a0bafd945ff92115eed932662feedc46d29dfaab
2020-12-18 08:55:58 -08:00
7545ff6619 Refactor VmapPhysicalView::newLogicalToPhysical (#49482)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49482

Motivation
==========
Batching rules always invoke newLogicalToPhysical at the very end to turn
a physical tensor into a logical BatchedTensor (an example is below):
```
Tensor select_backward_batching_rule(const Tensor& grad, IntArrayRef input_sizes, int64_t dim, int64_t index) {
  auto grad_physical = MultiBatchVmapTransform::logicalToPhysical(grad);
  auto grad_input = at::zeros(grad_physical.getPhysicalShape(input_sizes), grad.options());
  auto physical_dim = getGradInputPhysicalDim(dim, input_sizes, grad_physical.numBatchDims());
  grad_input.select(physical_dim, index).copy_(grad_physical.tensor());
  return grad_physical.newLogicalFromPhysical(grad_input);
}
```
However, albanD noted that this function is confusing and ambiguous
because it's unclear which physical tensor is being turned into the logical
(in this case, grad_physical is a VmapPhysicalView, but we're really transforming
grad_input and returning it).
https://github.com/pytorch/pytorch/pull/44505#discussion_r487144018

I didn't want to make too many changes to the batching rule API because
I think we'll change it even more in the future, but this PR attempts to
remove the ambiguity by applying one of the suggestions in
https://github.com/pytorch/pytorch/pull/44505#discussion_r487144018

This PR
=======

The diagnosis of the problem is that we were conflating
"VmapPhysicalView", which maps logical attributes on a Tensor (like
dimension and shape) to physical attributes, with the reverse
physical-to-logical map. This PR creates a new VmapPhysicalToLogicalMap
object that handles the latter.

Instead of calling `grad_physical.newLogicalFromPhysical(grad_input)`,
an author of batching rules should now retrieve the VmapPhysicalToLogicalMap
object and apply it to their physical input. So the above code becomes:
```
grad_physical.getPhysicalToLogicalMap().apply(grad_input)
```

I've also moved VmapPhysicalView::makeLogicalFromPhysicalListInplace
to VmapPhysicalToLogicalMap::applyInplace.

Test Plan
=========
wait for tests

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D25592645

Pulled By: zou3519

fbshipit-source-id: 9c6ede9901ec6b70e5763193064658a8f91e6d48
2020-12-18 08:48:02 -08:00
f975f99d1d add checkout PR tip step for quick checks (#49590)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49590

Reviewed By: samestep

Differential Revision: D25633341

Pulled By: walterddr

fbshipit-source-id: 6e8db1f628f562d7632390bdb7788437cb1bf63d
2020-12-18 08:41:27 -08:00
2de345d44d Add op bench for caffe2 quantile op (#49598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49598

Add op bench for caffe2 quantile op

Test Plan: `buck run mode/opt caffe2/benchmarks/operator_benchmark/c2:quantile_op_test -- --warmup_iterations=10000  --iterations=10000`

Reviewed By: radkris-git

Differential Revision: D25590085

fbshipit-source-id: 0db58ac87c595b2bf2958f6299a1bf2ccea019db
2020-12-18 08:32:59 -08:00
6568572712 Support integral types for kAbs in SimpleIREvaluator (#49357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49357

This is a follow-up fix for PR #48679, which added support for integer
inputs to aten::abs by promoting integers to float and then demoting the
result back to integers. This PR supports integer inputs to aten::abs
more efficiently in the SimpleIREvaluator by implementing integer inputs
for kAbs (renamed from kFabs) directly.
- Rename kFabs to kAbs
- Add support for integer input to kAbs in SimpleIREvaluator (note that
llvm_codegen and cuda_codegen already support integer inputs to kAbs)

Test Plan:
- `PYTORCH_TENSOREXPR_DONT_USE_LLVM=1 python test/test_jit_fuser_te.py
TestTEFuser.test_unary_ops`
- `python test/test_jit_fuser_te.py TestTEFuser.test_unary_ops`

Imported from OSS

Reviewed By: eellison

Differential Revision: D25545791

fbshipit-source-id: e52f51a352d149f66ce8341fb3beb479be08a230
2020-12-18 07:57:58 -08:00
72b00a8a52 Revert D25480770: Set USE_KINETO=1
Test Plan: revert-hammer

Differential Revision:
D25480770 (1a92802bde)

Original commit changeset: 037cd774f554

fbshipit-source-id: 6a6062195033ca91fcc0cfa1e890e47efc774ac1
2020-12-18 07:06:28 -08:00
1a92802bde Set USE_KINETO=1 (#49201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49201

This unblocks kineto profiler for 1.8 release.
This PR supercedes https://github.com/pytorch/pytorch/pull/48391
Note: this will somewhat increase the size of linux server binaries, bc
we add libkineto.a and libcupti_static.a:
-rw-r--r-- 1 jenkins jenkins 1107502 Dec 10 21:16 build/lib/libkineto.a
-rw-r--r-- 1 root root 13699658 Nov 13  2019 /usr/local/cuda/lib64/libcupti_static.a

Test Plan:
CI
https://github.com/pytorch/pytorch/pull/48391

Imported from OSS

Reviewed By: ngimel

Differential Revision: D25480770

fbshipit-source-id: 037cd774f5547d9918d6055ef5cc952a54e48e4c
2020-12-18 01:48:10 -08:00
020c443fd1 Fix CustomAutogradTest.ReentrantPriority rerun failures (#49581)
Summary:
Clear static variable at the end of the test to ensure test passes after re-runs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49581

Test Plan:
`./bin/test_api "--gtest_filter=CustomAutogradTest.ReentrantPriority" --gtest_repeat=50`
Before the change all subsequent runs of the test failed with
```
../test/cpp/api/autograd.cpp:681: Failure
Expected equality of these values:
  order.size()
    Which is: 310
  10
```

Reviewed By: mrshenli

Differential Revision: D25632374

Pulled By: malfet

fbshipit-source-id: 4814d22b5dff15e1b38a0187e51070771fd58370
2020-12-18 00:34:06 -08:00
43f6da787e Use store based barrier in init_process_group. (#49419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49419

As described in https://github.com/pytorch/pytorch/issues/48110, the
newly introduced `barrier()` in `init_process_group` messes up NCCL
communicator state, since it uses a bunch of default devices to perform an
allreduce which simulates a barrier(). As a result, subsequent NCCL operations
might not behave as expected.
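The gist of a store-based barrier, as a hedged sketch (the key name and helper are illustrative, not the actual implementation):

```
import time

def store_based_barrier(store, world_size, timeout=30.0):
    store.add("store_based_barrier_key", 1)  # each rank checks in exactly once
    start = time.time()
    # add(key, 0) reads the counter without changing it
    while store.add("store_based_barrier_key", 0) < world_size:
        if time.time() - start > timeout:
            raise RuntimeError("store-based barrier timed out")
        time.sleep(0.01)
```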
ghstack-source-id: 118861776

Test Plan:
1) unit test added.
2) waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D25566550

fbshipit-source-id: ab083b67b634d7c515f4945deb228f959b27c936
2020-12-18 00:02:54 -08:00
5fcfebd84a Disables method variant grad and grad grad checks (#49576)
Summary:
These are redundant with the functional variant checks and can be very costly, as some grad and gradgrad testing takes minutes to run per variant. Maybe in the future we'll add them back for operations with divergent method implementations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49576

Reviewed By: albanD, ngimel

Differential Revision: D25631691

Pulled By: mruberry

fbshipit-source-id: 247f750979d9dafab2454cdbfa992a2aa6da724a
2020-12-17 23:46:40 -08:00
573f4aa352 FLOPS Roofline Analysis Feature for PyTorch Profiler. (#46506)
Summary:
FLOPs Roofline Analysis Feature for PyTorch Profiler.

Currently, the PyTorch Profiler lacks the ability to measure the FLOPs of operators such as mm and conv.
FLOPs are helpful for estimating the computational complexity of the operators.
For now, we use input shapes to estimate the number of floating point operations.
In the future, we may compute this information by tracking hardware counters.
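A short usage sketch, assuming the `with_flops` flag added by this PR:

```
import torch

a, b = torch.randn(128, 128), torch.randn(128, 128)
with torch.autograd.profiler.profile(with_flops=True) as prof:
    torch.mm(a, b)
print(prof.key_averages().table(sort_by="cpu_time_total"))  # includes the MFLOPS column
```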

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46506

Test Plan:
Run `python test/test_profiler_flops.py -k test_flops`. The test will print a profiler table with "FLOPS" column, like the following:
----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------  ------------
                        Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls                                   Input Shapes        MFLOPS
----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------  ------------
                aten::matmul         0.06%      57.653us        82.97%      79.310ms      79.310ms             1                 [[40, 33, 1, 243], [243, 243]]            --
                    aten::mm        82.84%      79.186ms        82.86%      79.204ms      79.204ms             1                      [[1320, 243], [243, 243]]       984.323
                aten::conv2d         0.04%      36.345us        16.06%      15.347ms      15.347ms             1  [[40, 16, 18, 260], [33, 16, 18, 18], [33], [  44065010.318
           aten::convolution         0.02%      16.016us        16.02%      15.310ms      15.310ms             1  [[40, 16, 18, 260], [33, 16, 18, 18], [33], [            --
          aten::_convolution         0.07%      63.855us        16.00%      15.294ms      15.294ms             1  [[40, 16, 18, 260], [33, 16, 18, 18], [33], [            --
    aten::mkldnn_convolution        15.89%      15.188ms        15.93%      15.225ms      15.225ms             1  [[40, 16, 18, 260], [33, 16, 18, 18], [33], [            --
                  aten::relu         0.10%      98.223us         0.64%     612.157us     306.079us             2                             [[40, 33, 1, 243]]            --
             aten::threshold         0.49%     465.416us         0.54%     513.934us     256.967us             2                     [[40, 33, 1, 243], [], []]            --
                  aten::add_         0.29%     279.301us         0.29%     279.301us     279.301us             1                  [[40, 33, 1, 243], [243], []]            --
                 aten::empty         0.10%      99.113us         0.10%      99.113us      24.778us             4                       [[], [], [], [], [], []]            --
----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------  ------------
Self CPU time total: 95.584ms

.
----------------------------------------------------------------------
Ran 1 test in 0.176s

For now, we only provide FLOPs calculation for aten::conv2d and aten::mm operators.

Reviewed By: ezyang

Differential Revision: D25214452

Pulled By: xuzhao9

fbshipit-source-id: 0ae841bd8dbdeb032346dc3d9d38e19875aa1da3
2020-12-17 21:19:25 -08:00
5db12b6811 Add type inference for dequantization.tensors (#49517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49517

We should add concrete type info for Tensor List case as well.

Test Plan: ci

Reviewed By: qizzzh

Differential Revision: D25599223

fbshipit-source-id: 3614e9ec25fc963a8d6a0bd641735fcca6c87032
2020-12-17 21:01:35 -08:00
ed0489c11a disable concat nested namespace check (#49571)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49571

Disable nested namespace check since OSS standard is
```
set(CMAKE_CXX_STANDARD 14)
```
and it's currently causing confusion in clang-tidy internally, such as in D25214452

Test Plan: clang-tidy

Reviewed By: xuzhao9

Differential Revision: D25626392

fbshipit-source-id: 1fb472c89ebe9b83718ae27f2c1d77b8b2412b5e
2020-12-17 20:45:37 -08:00
9058040527 Add more list peephole idioms (#48268)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48268

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D25104617

Pulled By: eellison

fbshipit-source-id: b41c03d5da6e9b88acf21a859f61c5c70608c150
2020-12-17 20:25:41 -08:00
39d3578e91 [ddp launch] solve zombie problem (#49305)
Summary:
I was exhausted with needing to hunt down zombies when working with ddp launcher, so this PR solves the various zombie issues.

This PR addresses 2 distinct zombie scenarios caused by ddp launch.py:

1. When the main process is killed, the child processes aren't killed and continue running
2. When any of the child processes dies (e.g. OOM), the rest of the children and the parent remain running, but are really stuck

To solve these problems this PR switches from `wait` to `poll` and uses signal handlers.

The main problem with `wait()` was that it's not async; I was having a 2nd process OOM, and the code was stuck waiting for the first process to finish, which will never happen since the first process is now blocked waiting for the 2nd process - a sort of deadlock. My 2nd card is smaller than the first one, so it occasionally OOMs.

Using `asyncio` would probably be the cleanest solution, but as it's relatively new in python, perhaps polling is good enough.
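Conceptually, the parent loop looks something like this sketch (illustrative, not the actual launch.py code):

```
import signal
import subprocess
import sys
import time

procs = [subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
         for _ in range(2)]

def sigkill_handler(signum, frame):
    for p in procs:
        p.kill()              # propagate death to every child
    sys.exit(1)

for sig in (signal.SIGINT, signal.SIGTERM):
    signal.signal(sig, sigkill_handler)

while procs:
    for p in list(procs):
        ret = p.poll()        # non-blocking, unlike wait()
        if ret is None:
            continue
        procs.remove(p)
        if ret != 0:          # one child failed -> take everyone down
            sigkill_handler(signal.SIGTERM, None)
    time.sleep(1)
```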

I wrote this little script to reproduce 2 problematic scenarios and a normal running setup, it does 3 different things according to the `--mode` arg

- `oom` - causes the 2nd process to exit prematurely emulating OOM
- `clean-finish` - just exit normally in both processes
- `False` (lack of arg) just keep on running - emulating multiple normally running processes

```
# oom.py
import argparse
from time import sleep
import sys

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", default=False, type=int)
    parser.add_argument("--mode", default=False, type=str)
    args, _ = parser.parse_known_args()

    print(f"{args.local_rank} is starting")
    sleep(3)

    if args.mode == "oom":
        # emulate OOM in 2nd card
        if args.local_rank == 1:
            raise RuntimeError("OOM")

    if args.mode == "clean-finish":
        sleep(1)
        print(f"{args.local_rank} is cleanly finishing")
        sys.exit(0)

    while (True):
        # emulate long running process
        print(f"{args.local_rank} is running")
        sleep(1)

if __name__ == "__main__":
    main()
```

Let's begin:

###  1. Normal execution

```
python -m torch.distributed.launch --nproc_per_node=2 ./oom.py --mode=clean-finish
```

All the processes exit upon completion - I won't bother pasting the log here - just testing that my code didn't break the normal running

### 2. OOM

```
python -m torch.distributed.launch --nproc_per_node=2 ./oom.py --mode=oom
```

```
POLLING FOR 17547
POLLING FOR 17548
0
0 is starting
1
1 is starting
POLLING FOR 17547
POLLING FOR 17548
POLLING FOR 17548
POLLING FOR 17547
POLLING FOR 17547
POLLING FOR 17548
0 is running
Traceback (most recent call last):
  File "./oom.py", line 33, in <module>
    main()
  File "./oom.py", line 20, in main
    raise RuntimeError("OOM")
RuntimeError: OOM
POLLING FOR 17548
process 17548 is no more
Killing subprocess 17547
Killing subprocess 17548
Traceback (most recent call last):
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/torch/distributed/launch.py", line 341, in <module>
    main()
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/torch/distributed/launch.py", line 327, in main
    sigkill_handler(signal.SIGTERM, None) # not coming back
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/stas/anaconda3/envs/main-38/bin/python', '-u', './oom.py', '--local_rank=1', '--mode=oom']' returned non-zero exit status 1.
```

All processes exited and the trace was printed

### 3. Exit on SIGINT/SIGTERM

If I started a process and then realized I made a mistake I want to be able to kill it cleanly and if any sub-processes have already been spawned I want them to be killed too. Here the sighandler takes care of trapping the SIGTERM/SIGINT.

```
python -m torch.distributed.launch --nproc_per_node=2 ./oom.py
```

Here the processes emulate a long normal run.

So let's Ctrl-C the process as soon as it started and see:

```
POLLING FOR 18749
POLLING FOR 18750
0
0 is starting
1
1 is starting
POLLING FOR 18749
POLLING FOR 18750
POLLING FOR 18750
POLLING FOR 18749
POLLING FOR 18749
POLLING FOR 18750
0 is running
1 is running
POLLING FOR 18750
POLLING FOR 18749
0 is running
1 is running
^CTraceback (most recent call last):
Killing subprocess 18749
Traceback (most recent call last):
  File "./oom.py", line 33, in <module>
  File "./oom.py", line 33, in <module>
Killing subprocess 18750
Parent got kill signal=SIGINT, exiting
```

all processes got killed

--------------------------------

So this covered the 2 problematic cases and 1 normal case

Notes:
- we could probably switch to `sleep(3)` - `1` is probably too fast
- all the debug prints will be removed once you are happy - I left them so that it's easier for you to test that my PR does the right thing.

Thank you!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49305

Reviewed By: izdeby

Differential Revision: D25565617

Pulled By: rohan-varma

fbshipit-source-id: 1ea864113f283d4daac5eef1131c8d745aae4c99
2020-12-17 20:07:59 -08:00
1047957831 [te][reapply] Add fast log approximation based on sleef (#49575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49575

This is a fast log implementations

benchmark:

```
buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench -c 'fbcode.caffe2_gpu_type=none'
```

Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr -- *.fastLogFloat

Reviewed By: bertmaher

Differential Revision: D25627157

fbshipit-source-id: a4920f4f4005ce617d372b375e790ca966275cd9
2020-12-17 17:02:00 -08:00
c78fd76f18 Revert D25542799: [PyTorch] Merge CoinflipTLS into RecordFunctionTLS
Test Plan: revert-hammer

Differential Revision:
D25542799 (9ce1df079f)

Original commit changeset: 310f9fd15710

fbshipit-source-id: 51777914422a560e94430a786c86f5de4007a00b
2020-12-17 16:43:52 -08:00
625bc40def Revert D25544731: [PyTorch] Avoid extra Tensor refcounting in _cat_out_cpu
Test Plan: revert-hammer

Differential Revision:
D25544731 (1a0510463a)

Original commit changeset: 7b9656d0371a

fbshipit-source-id: 0f7ea74eca282cadf269bbd284d59650a431ed65
2020-12-17 16:43:49 -08:00
385f6b4807 Revert D25545777: [PyTorch] Use .sizes() instead of .size() in _cat_out_cpu
Test Plan: revert-hammer

Differential Revision:
D25545777 (c1879b573e)

Original commit changeset: b2714fac95c8

fbshipit-source-id: f534f8fc312943f1e6ba3d4029d6cf69b006aca8
2020-12-17 16:43:45 -08:00
52b3775914 Revert D25546409: [PyTorch] Use .sizes() isntead of .size() in cat_serial_kernel_impl
Test Plan: revert-hammer

Differential Revision:
D25546409 (953f9922ec)

Original commit changeset: 196034716b6e

fbshipit-source-id: 0e80f06a98c2842d2f11db7057ffcdcaea85f3bf
2020-12-17 16:43:42 -08:00
19dc5e94a6 Revert D25547962: [PyTorch] Make tls_local_dispatch_key_set inlineable (reapply)
Test Plan: revert-hammer

Differential Revision:
D25547962 (6f928a4a53)

Original commit changeset: 58424b1da230

fbshipit-source-id: 10ff9f45f6587f67e1c88886f977930b4f7e396a
2020-12-17 16:38:40 -08:00
d17dc37112 Add dict comprehension (#47774)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47774

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D25615464

Pulled By: ansley

fbshipit-source-id: 10bba6f70e812fa580cbbbf097e93de7142484cc
2020-12-17 15:25:30 -08:00
ea4ccc730e Revert D25445815: [te] Add fast log approximation based on sleef
Test Plan: revert-hammer

Differential Revision:
D25445815 (1329066b69)

Original commit changeset: 20696eacd12a

fbshipit-source-id: 38830a6abd16260d60e5dd9a5594e65736a9c782
2020-12-17 15:03:17 -08:00
6db5e85726 [FileStore] Updating Docs to Reflect FileStore changes (#49557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49557

Updating the PyTorch docs to reflect that FileStore now supported the
num_keys API. Also included a note to describe the behavior of the API.
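A short usage sketch:

```
import torch.distributed as dist

store = dist.FileStore("/tmp/filestore_example", 1)
store.set("key0", "value0")
# note: the count may include keys the store itself writes for initialization,
# so it can be greater than the number of keys set explicitly
print(store.num_keys())
```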

Test Plan: build and rendered docs.

Reviewed By: jiayisuse

Differential Revision: D25619000

fbshipit-source-id: 6c660d7ceb32d1d61024df8394aff3fcd0b752c1
2020-12-17 14:54:29 -08:00
31fcbbdf35 [FileStore] Implemented numKeys and Added Tests (#49556)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49556

Implemented the missing Store functionality (specifically numKeys) in the FileStore.

Test Plan: Added both C++ and Python tests to verify functionality.

Reviewed By: jiayisuse

Differential Revision: D25619001

fbshipit-source-id: 9146d0da9e0903622be3035880f619bbb2cc3891
2020-12-17 14:54:24 -08:00
ad4467b93c .github: Add action workflow to update S3 HTMLS (#49509)
Summary:
Successful run: https://github.com/pytorch/pytorch/runs/1572315901

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49509

Reviewed By: walterddr

Differential Revision: D25619133

Pulled By: seemethere

fbshipit-source-id: 092ab12535f3bf4fc85bbfc690d3f5b10a5f8791
2020-12-17 14:50:59 -08:00
4b85239532 [quant][eagermode][fix] Fix quantization for DeQuantStub (#49428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49428

Previously, DeQuantStub would be swapped with nn.quantized.DeQuantize regardless of qconfig.
The reason is that we skipped attaching a qconfig to DeQuantStub to avoid adding a fake-quantize module to it,
but the correct fix is to skip it when inserting observers. This PR fixes the issue.
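A minimal eager-mode sketch where the stub placement matters:

```
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(1, 1, 1)
        self.dequant = torch.quantization.DeQuantStub()  # now skipped during observer insertion

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))

m = M()
m.qconfig = torch.quantization.default_qconfig
torch.quantization.prepare(m, inplace=True)
m(torch.randn(1, 1, 4, 4))  # calibrate
torch.quantization.convert(m, inplace=True)
```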

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D25569991

fbshipit-source-id: d44a08c6e64c7a49509687dc389b57de1cbb878c
2020-12-17 14:42:40 -08:00
1329066b69 [te] Add fast log approximation based on sleef
Summary:
This is a fast log implementations

benchmark:
```
buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench -c 'fbcode.caffe2_gpu_type=none'
```

Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr -- *.fastLogFloat

Reviewed By: bertmaher

Differential Revision: D25445815

fbshipit-source-id: 20696eacd12a55e797f606f4a6dbbd94c9652888
2020-12-17 14:28:34 -08:00
2b61e4d84c Revert D25152559: T66557700 Support default argument values of a method
Test Plan: revert-hammer

Differential Revision:
D25152559 (6bde0ca6d3)

Original commit changeset: bbf52f1fbdbf

fbshipit-source-id: 592fdb3078b1ac86cd394adc6c1bfd6b10d829e1
2020-12-17 14:05:49 -08:00
0d411c4216 Test distributed collectives profiling with Gloo on GPU (#49072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49072

As per the title, we should enable these tests for Gloo when run on GPU and the profiler is enabled with `use_cuda=True`. Enabling ProcessGroupNCCL profiling test to work with `use_cuda=True` is being tracked in https://github.com/pytorch/pytorch/issues/48987.
ghstack-source-id: 118789003

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D25388986

fbshipit-source-id: 664d922ac2e10c77299daebdc6d3c92bb70eb56e
2020-12-17 13:43:06 -08:00
20b90f3909 Set is_non_overlapping_and_dense_ flag in OpaqueTensorImpl constructor (#49470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49470

https://github.com/pytorch/pytorch/pull/48625 changes the default contiguous settings for `TensorImpl` causing the Vulkan backend to crash. Therefore, add argument that can set `is_non_overlapping_and_dense_` back to false for `OpaqueTensorImpl` constructor.

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D25592826

Pulled By: SS-JIA

fbshipit-source-id: e5d9de9a733875cb00c0546a3bc3271e5c6e23a3
2020-12-17 13:36:34 -08:00
eb131cf484 Revert D25105217: [pytorch][PR] Fix bad error message when int overflow
Test Plan: revert-hammer

Differential Revision:
D25105217 (c675727adf)

Original commit changeset: a5aa7c026694

fbshipit-source-id: ddb4c93f9317e1747def8842a8072c84776cd487
2020-12-17 11:59:39 -08:00
a727bf2851 Refactor RPC matchBuiltInOp to get rid of exception swallowing (#49009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49009

As per the title, we should generally not have exception swallowing, and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
https://github.com/pytorch/pytorch/pull/41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.
ghstack-source-id: 118794661

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D25392905

fbshipit-source-id: 6f93251635740bcf902824548b2bc6f9249be5f0
2020-12-17 11:37:21 -08:00
b8d98f05e7 [reland][quant][docs] Add fx graph mode quantization to quantization docs (#49211) (#49515)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49515

Test Plan:
Imported from OSS

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D25601061

fbshipit-source-id: 74e917d57895e9b4131a01fdcea8df3e94322bec
2020-12-17 10:30:10 -08:00
815d38395a PyLong_{As/From}{Long/UnsignedLong} lint checks (#49280)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45581

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49280

Reviewed By: mruberry

Differential Revision: D25592330

Pulled By: ezyang

fbshipit-source-id: 5c16d6aed88ad1feaa7f129b4cd44c0561be2de2
2020-12-17 09:32:08 -08:00
c20b916cbd [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D25609974

fbshipit-source-id: 4db8f8100336a2f0f2af8bc7b960d3711a5d1d7d
2020-12-17 05:32:07 -08:00
f5a26a554b [C2] Revive unsafe CoalesceOp (#49402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49402

In the case of NCCLAllReduce operations there can be non-trivial overhead for
launching cooperative kernels (especially with async execution of
different parts of the model). This diff revives this operator to make it
possible to fuse multiple operations into a single kernel.

Test Plan:
Unit-test.
Used in a later diff.

Reviewed By: xianjiec

Differential Revision: D25531206

fbshipit-source-id: 64b1c161233a726f9e2868f1059316e42a8ea1fc
2020-12-17 04:31:29 -08:00
26974e6b28 Remove set_quantizer_ from native_functions.yaml (#49463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49463

set_quantizer_ takes a ConstQuantizerPtr argument, which is neither supported by JIT nor by c10.
Also, it doesn't get dispatched (CPU and CUDA have the same implementation) and it is excluded from python bindings generation.
So there is no real reason why this needs to be in native_functions.yaml

Removing it unblocks the migration to c10-fullness since this is an op that would have been hard to migrate. See https://fb.quip.com/QRtJAin66lPN
ghstack-source-id: 118710663

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25587763

fbshipit-source-id: 8fab921f4c256c128d48d82dac731f04ec9bad92
2020-12-17 03:28:00 -08:00
f5b68e74d7 Revert D25574962: [pytorch][PR] Updated derivative rules for complex svd and pinverse
Test Plan: revert-hammer

Differential Revision:
D25574962 (9955355853)

Original commit changeset: 832b61303e88

fbshipit-source-id: d73f77f3e51b0f535dad6d21c5bebf8d41a6bfbd
2020-12-17 00:59:43 -08:00
c18af03a41 [pt] fuse ClipRangesGatherSigridHash (#49181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49181

Fuse ClipRangesGatherSigridHash

Test Plan:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adindexer/merge/traced_merge_dper_fixes.pt --pt_inputs=/data/users/ansha/tmp/adindexer/merge/container_precomputation_bs1.pt --iters=30000 --warmup_iters=10000  --num_threads=1 --pred_net=/data/users/ansha/tmp/adindexer/precomputation_merge_net.pb --c2_inputs=/data/users/ansha/tmp/adindexer/merge/c2_inputs_precomputation_bs1.pb --c2_sigrid_transforms_opt=1 --c2_use_memonger=1 --c2_weights=/data/users/ansha/tmp/adindexer/merge/c2_weights_precomputation.pb --pt_enable_static_runtime --pt_cleanup_activations=true --pt_enable_out_variant=true --do_profile --compare_results
```

Verify op fused:
Node #3: 0.00104917 ms/iter, %173 : Tensor, %174 : Tensor = fb::clip_ranges_gather_sigrid_hash_offsets(%75, %76, %39, %40, %41, %38, %26)

Before: 0.0919786
After: 0.0911792

Reviewed By: hlu1

Differential Revision: D25468225

fbshipit-source-id: 36bd91c140eaa57cb42cdaad46d878b94f162a9d
2020-12-17 00:42:46 -08:00
26e076d19e Adding fix for invalid annotation types for dictionary (#49425)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49362

**Summary:**
This PR fixes the issue where invalid annotation types are used for a dictionary.
An "Unsupported" assertion message is now generated for all invalid annotations.

**Test Case**:
python test/test_jit.py TestJit.test_dict_invalid_annotations

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49425

Reviewed By: navahgar

Differential Revision: D25601578

Pulled By: nikithamalgifb

fbshipit-source-id: 91633e3d0891bdcb5402f044a74d02fe352ecd6f
2020-12-17 00:28:29 -08:00
65876d3f51 Change aten::native_layer_norm signature to match torch.layer_norm definition (#48971)
Summary:
This PR changes the `aten::native_layer_norm` and `aten::native_layer_norm_backward` signatures to match the `torch.layer_norm` definition. The current definition doesn't provide enough information for the PyTorch JIT to fuse layer_norm during training.

`native_layer_norm(X, gamma, beta, M, N, eps)` =>
`native_layer_norm(input, normalized_shape, weight, bias, eps)`

`native_layer_norm_backward(dY, X, mean, rstd, gamma, M, N, grad_input_mask)` =>
`native_layer_norm_backward(dY, input, normalized_shape, mean, rstd, weight, bias, grad_input_mask)`
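For reference, the public entry point that the new signatures mirror:

```
import torch
import torch.nn.functional as F

x = torch.randn(2, 5, 10)
w, b = torch.ones(10), torch.zeros(10)
out = F.layer_norm(x, normalized_shape=(10,), weight=w, bias=b, eps=1e-5)
```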

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48971

Reviewed By: izdeby

Differential Revision: D25574070

Pulled By: ngimel

fbshipit-source-id: 23e2804295a95bda3f1ca6b41a1e4c5a3d4d31b4
2020-12-16 23:09:18 -08:00
2ea1d97e3b Add BFloat16 support for isinf and isfinite (#49356)
Summary:
Also fix some tests.
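A quick check:

```
import torch

t = torch.tensor([1.0, float("inf"), float("nan")], dtype=torch.bfloat16)
print(torch.isinf(t))     # tensor([False,  True, False])
print(torch.isfinite(t))  # tensor([ True, False, False])
```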

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49356

Reviewed By: mruberry

Differential Revision: D25604364

Pulled By: ngimel

fbshipit-source-id: 9efdd83aaa96cacc66e9689db9f9d8c24175a693
2020-12-16 22:36:14 -08:00
ede0b169ea [quant][be] Add typing for quantization_mappings.py (#49179)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49179

Test Plan: Imported from OSS

Reviewed By: vkuzo, wat3rBro

Differential Revision: D25470520

fbshipit-source-id: 16e35fec9a5f3339860bd2305ae8ffdd8e2dfaf7
2020-12-16 21:36:00 -08:00
4edaf4d759 Bring back math_silu_backward which works for all backends. (#49439)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49439

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb, ngimel

Differential Revision: D25594129

Pulled By: ailzhang

fbshipit-source-id: 627bbea9ba478ee3a8edcc6695abab6431900192
2020-12-16 21:06:12 -08:00
6230e337d5 Add torch._foreach_zero_ API (#47286)
Summary:
**In this PR**
- add `_foreach_zero_` API
- Update all optimizers under /_multi_tensor/ to use `_foreach_zero_` in `zero_grad` method

Performance improvement
----------------- OP:  zero_  -----------------
for-loop: 630.36 us
foreach: 90.84 us

script

```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

inputs = [torch.rand(3, 200, 200, device="cuda") for _ in range(100)]

def main():
    for op in [
            "zero_"
        ]:
        print("\n\n----------------- OP: ", op, " -----------------")
        stmt = "[torch.{op}(t) for t in inputs]"
        timer = benchmark_utils.Timer(
            stmt=stmt.format(op = op),
            globals=globals(),
            label="str(optimizer)",
        )
        print(f"autorange:\n{timer.blocked_autorange()}\n\n")

        stmt = "torch._foreach_{op}(inputs)"
        timer_mta = benchmark_utils.Timer(
            stmt=stmt.format(op = op),
            globals=globals(),
            label="str(optimizer_mta)",
        )
        print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()

```
**TODO**
- Refactor zero_grad once foreach APIs are stable.

**Tested** via unit tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47286

Reviewed By: ngimel

Differential Revision: D24706240

Pulled By: izdeby

fbshipit-source-id: aac69d6d134d65126ae8e5916f3627b73d8a94bf
2020-12-16 20:04:25 -08:00
4ce2b0b0ac Set caffe2::pthreadpool() size in ParallelOpenMP (#45566)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/45418.

This is probably not the best solution, but it's a rebase of the solution we're considering until https://github.com/pytorch/pytorch/issues/45418 is solved. If you can outline a better one I'm willing to implement it (:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45566

Reviewed By: ezyang

Differential Revision: D24621568

Pulled By: glaringlee

fbshipit-source-id: 89dad5c61d8b5c26984d401551a1fe29df1ead04
2020-12-16 19:53:08 -08:00
db2ecefc01 [reland] Support torch.distributed.irecv(src=None, ...) (#49383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49383

Reland of https://github.com/pytorch/pytorch/pull/47137
ghstack-source-id: 118735407
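
A minimal sketch of the newly supported call (assumes an initialized process group and a peer that sends; setup omitted):

```
import torch
import torch.distributed as dist

buf = torch.zeros(4)
work = dist.irecv(buf)  # src omitted (None): receive from any rank
work.wait()
```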

Test Plan: waitforbuildbot

Reviewed By: osalpekar

Differential Revision: D25551910

fbshipit-source-id: 2e1f2f77e7c69204056dfe6ed178e8ad7650ab32
2020-12-16 19:39:23 -08:00
df2337097d add files to SLOW_TESTS for target determinator (#49500)
Summary:
- test_torch was split into 6 files in https://github.com/pytorch/pytorch/issues/47356.
- also, test_linalg has 10 slowtest markings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49500

Reviewed By: ezyang, malfet

Differential Revision: D25598085

Pulled By: walterddr

fbshipit-source-id: 74b0b433897721db86c00e236d1dd925d7a6d3d0
2020-12-16 19:10:56 -08:00
82ac6c75af fx quant: make sure observer is inserted before a quantized output (#49420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49420

Before: even if an output was marked as quantized, it could end up
unquantized if the previous node was not quantized.

After: if an output is marked as quantized, it will be quantized
regardless of the quantization status of the previous node.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_quant_output_always_observed
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25566834

fbshipit-source-id: 84755a1605fd3847edd03a7887ab9f635498c05c
2020-12-16 18:53:37 -08:00
84506e0316 fx quant: fix fq when input is quantized and node does not need fq (#49382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49382

Fixes an edge case: if the input to the graph is quantized and the
first node does not need activation observation, this change makes sure
that the observer is not inserted.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_int8_input_no_unnecessary_fq
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25551041

fbshipit-source-id: a6cba235c63ca7f6856e4128af7c1dc7fa0085ea
2020-12-16 18:53:33 -08:00
7542076097 fx quant: do not insert observers at quantized inputs (#49239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49239

Context: the existing implementation of `input_quantized_idxs` is convert-only.
Therefore, observers are inserted between the input and the first
quantized node.  This is a problem during QAT, because the initial
input is a fake_quant, and it starts with scale=1 and zp=0.  This does
not match the quantization parameters of the graph input, which can
lead to incorrect numerics.

Fix: do not insert observer for a quantized input.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25499486

fbshipit-source-id: 303b49cc9d95a9fd06fef3b0859c08be34e19d8a
2020-12-16 18:53:30 -08:00
92df8706a0 fx quant: move {input|output}_quantized_idxs cfg from convert to prepare (#49238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49238

Moves the `input_quantized_idxs` and `output_quantized_idxs` options
from the convert config to the prepare config.  This is done because
these operations are related to placing observers, which is numerics
changing during QAT.

The next PR will adjust the behavior of `input_quantized_idxs` in
prepare in QAT to prevent placing a fake_quant at the input if the
input is marked quantized.  Placing a fake_quant there can lead to
numerical inaccuracies during calibration, as it would start with
scale=1 and zp=0, which may be different from the quantization
parameters of the incoming quantized input.
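
A hedged sketch of the prepare-time usage after this move (the dict keys and imports follow the FX graph mode quantization API of this era; treat exact names as assumptions):

```
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx

model = torch.nn.Sequential(torch.nn.Linear(4, 4)).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}

# Declaring input 0 as already quantized at prepare time means no
# observer/fake_quant is placed at that graph input.
custom_config = {"input_quantized_idxs": [0]}
prepared = prepare_fx(model, qconfig_dict, prepare_custom_config_dict=custom_config)
```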

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25498762

fbshipit-source-id: 17ace8f803542155652b310e5539e1882ebaadc6
2020-12-16 18:53:27 -08:00
36b20923ba eager quant: remove fake_quant after add/mul nodes during QAT (#49213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49213

Changes behavior of Eager mode quantization to remove observation after add_scalar/mul_scalar.
This is not used, and it removes one difference between Eager and FX modes.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_quantized_add_qat
python test/test_quantization.py TestQuantizeFxOps.test_quantized_mul_qat
python test/test_quantization.py TestQuantizationAwareTraining.test_add_scalar_uses_input_qparams
python test/test_quantization.py TestQuantizationAwareTraining.test_mul_scalar_uses_input_qparams
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25486276

fbshipit-source-id: 34a5d6ce0d08739319ec0f8b197cfc1309d71040
2020-12-16 18:50:11 -08:00
904586271b Add fusion support of aten::to (#48976)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48976

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25413164

Pulled By: eellison

fbshipit-source-id: 0c31787e8b5e1368b0cba6e23660799b652389cd
2020-12-16 18:36:16 -08:00
80b508f207 [NNC] add support for masked_fill (#48974)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48974

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25413165

Pulled By: eellison

fbshipit-source-id: 8cece1dc3692389be90c0d77bd71b103254d5ad3
2020-12-16 18:36:13 -08:00
50386b9988 [NNC] Add Support For is_nan (#48973)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48973

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25413166

Pulled By: eellison

fbshipit-source-id: 0c79258345df18c60a862373fa16931228fb92ef
2020-12-16 18:31:01 -08:00
60b4c40101 [extensions] fix is_ninja_available during cuda extension building (#49443)
Summary:
tldr: the current version of `is_ninja_available` in `torch/utils/cpp_extension.py` fails to run under recent incarnations of pip with the new build-isolation feature, which is now the default. This PR fixes this problem.

The full story follows:

--------------------------

Currently, trying to build https://github.com/facebookresearch/fairscale/, which builds CUDA extensions, fails with recent pip versions. The build fails in `is_ninja_available`, which runs `ninja --version` in a simple subprocess but redirects its output to /dev/null, and that override seems to break under the new pip versions. Currently I have `pip==20.3.3`. Recent pip performs build isolation: it first fetches all dependencies to somewhere under /tmp/pip-install-xyz and then builds the package.

If I build:

```
pip install fairscale --no-build-isolation
```
everything works.

When building normally (i.e. without `--no-build-isolation`), the failure is a long long trace,
<details>
<summary>Full log</summary>
<pre>
pip install fairscale
Collecting fairscale
  Downloading fairscale-0.1.1.tar.gz (83 kB)
     |████████████████████████████████| 83 kB 562 kB/s
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  ERROR: Command errored out with exit status 1:
   command: /home/stas/anaconda3/envs/main-38/bin/python /home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpjvw00c7v
       cwd: /tmp/pip-install-1wq9f8fp/fairscale_347f218384a64f24b8d5ce846641213e
  Complete output (55 lines):
  running egg_info
  writing fairscale.egg-info/PKG-INFO
  writing dependency_links to fairscale.egg-info/dependency_links.txt
  writing requirements to fairscale.egg-info/requires.txt
  writing top-level names to fairscale.egg-info/top_level.txt
  Traceback (most recent call last):
    File "/home/stas/anaconda3/envs/main-38/bin/ninja", line 5, in <module>
      from ninja import ninja
  ModuleNotFoundError: No module named 'ninja'
  Traceback (most recent call last):
    File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 280, in <module>
      main()
    File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 263, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 114, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 149, in get_requires_for_build_wheel
      return self._get_build_requires(
    File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 130, in _get_build_requires
      self.run_setup()
    File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 145, in run_setup
      exec(compile(code, __file__, 'exec'), locals())
    File "setup.py", line 56, in <module>
      setuptools.setup(
    File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
      return distutils.core.setup(**attrs)
    File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/core.py", line 148, in setup
      dist.run_commands()
    File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 966, in run_commands
      self.run_command(cmd)
    File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 985, in run_command
      cmd_obj.run()
    File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 298, in run
      self.find_sources()
    File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 305, in find_sources
      mm.run()
    File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 536, in run
      self.add_defaults()
    File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 572, in add_defaults
      sdist.add_defaults(self)
    File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/command/sdist.py", line 228, in add_defaults
      self._add_defaults_ext()
    File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/command/sdist.py", line 311, in _add_defaults_ext
      build_ext = self.get_finalized_command('build_ext')
    File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/cmd.py", line 298, in get_finalized_command
      cmd_obj = self.distribution.get_command_obj(command, create)
    File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 858, in get_command_obj
      cmd_obj = self.command_obj[command] = klass(self)
    File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 351, in __init__
      if not is_ninja_available():
    File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1310, in is_ninja_available
      subprocess.check_call('ninja --version'.split(), stdout=devnull)
    File "/home/stas/anaconda3/envs/main-38/lib/python3.8/subprocess.py", line 364, in check_call
      raise CalledProcessError(retcode, cmd)
  subprocess.CalledProcessError: Command '['ninja', '--version']' returned non-zero exit status 1.
  ----------------------------------------
ERROR: Command errored out with exit status 1: /home/stas/anaconda3/envs/main-38/bin/python /home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpjvw00c7v Check the logs for full command output.
</pre>

</details>

and the middle of it is what we want:

```
    File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 351, in __init__
      if not is_ninja_available():
    File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1310, in is_ninja_available
      subprocess.check_call('ninja --version'.split(), stdout=devnull)
    File "/home/stas/anaconda3/envs/main-38/lib/python3.8/subprocess.py", line 364, in check_call
      raise CalledProcessError(retcode, cmd)
  subprocess.CalledProcessError: Command '['ninja', '--version']' returned non-zero exit status 1.
```

For some reason pytorch fails to run this simple code:

```
# torch/utils/cpp_extension.py
def is_ninja_available():
    r'''
    Returns ``True`` if the `ninja <https://ninja-build.org/>`_ build system is
    available on the system, ``False`` otherwise.
    '''
    with open(os.devnull, 'wb') as devnull:
        try:
            subprocess.check_call('ninja --version'.split(), stdout=devnull)
        except OSError:
            return False
        else:
            return True
```

I suspect that pip does something to `os.devnull` and that's why it fails.

This PR proposes a simpler code which doesn't rely on anything but `subprocess.check_output`:

```
def is_ninja_available():
    r'''
    Returns ``True`` if the `ninja <https://ninja-build.org/>`_ build system is
    available on the system, ``False`` otherwise.
    '''
    try:
        subprocess.check_output('ninja --version'.split())
    except Exception:
        return False
    else:
        return True
```

which doesn't use `os.devnull` and performs the same function. There could be a whole bunch of different exceptions there I think, so I went for the generic one - we don't care why it failed, since this function's only purpose is to suggest whether ninja can be used or not.

Let's check

```
python -c "import torch.utils.cpp_extension; print(torch.utils.cpp_extension.is_ninja_available())"
True
```

Look ma - no std noise to take care of. (i.e. no need for /dev/null).

I was editing the installed environment-wide `cpp_extension.py` file directly, so I didn't need to tweak `PYTHONPATH`. I made sure to replace `'ninja --version'` with something that should fail, and I did get `False` for the above command line.

I next did a somewhat elaborate cheat to re-package an already existing binary wheel with this corrected version of `cpp_extension.py`, rather than building from source:
```
mkdir /tmp/pytorch-local-channel
cd /tmp/pytorch-local-channel

# get the latest nightly wheel
wget https://download.pytorch.org/whl/nightly/cu110/torch-1.8.0.dev20201215%2Bcu110-cp38-cp38-linux_x86_64.whl

# unpack it
unzip torch-1.8.0.dev20201215+cu110-cp38-cp38-linux_x86_64.whl

# edit torch/utils/cpp_extension.py to fix the python code with the new version as in this PR
emacs torch/utils/cpp_extension.py &

# pack the files back
zip -r torch-1.8.0.dev20201215+cu110-cp38-cp38-linux_x86_64.whl caffe2 torch torch-1.8.0.dev20201215+cu110.dist-info
```

Now I tell pip to use my local channel, plus `--pre` for it to pick up the pre-release as an acceptable wheel
```
# install using this local channel
git clone https://github.com/facebookresearch/fairscale/
cd fairscale
pip install -v --disable-pip-version-check -e . -f file:///tmp/pytorch-local-channel --pre
```
and voila all works.

```
[...]
Successfully installed fairscale
```

I noticed a whole bunch of ninja-not-found errors in the log. I think this is the same problem in other build-system packages, which also use this old check, copied all over various projects and build tools, and which the recent pip breaks.

```
    writing manifest file '/tmp/pip-modern-metadata-_nsdesbq/fairscale.egg-info/SOURCES.txt'
    Traceback (most recent call last):
      File "/home/stas/anaconda3/envs/main-38/bin/ninja", line 5, in <module>
        from ninja import ninja
    ModuleNotFoundError: No module named 'ninja'
    [...]
    /tmp/pip-build-env-fqflyevr/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py:364: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
      warnings.warn(msg.format('we could not find ninja.'))
```

but these don't prevent the build from completing and installing.

I suppose these need to be identified and reported to various other projects, but that's another story.

The new pip seems to do something to `os.devnull` that breaks any code relying on it. I haven't tried to figure out what happens to that stream object, but this PR, which removes its usage, solves the problem.

Also do notice that:

```
git clone https://github.com/facebookresearch/fairscale/
cd fairscale
python setup.py bdist_wheel
pip install dist/fairscale-0.1.1-cp38-cp38-linux_x86_64.whl
```
works too. So it is really a pip issue.

Apologies if the notes are too many; I tried to give the complete picture, and other projects will probably need these details as well.

Thank you for reading.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49443

Reviewed By: mruberry

Differential Revision: D25592109

Pulled By: ezyang

fbshipit-source-id: bfce4420c28b614ead48e9686f4153c6e0fbe8b7
2020-12-16 18:02:11 -08:00
d409da0677 Fix CUDA extension ninja build (#49344)
Summary:
I am submitting this PR on behalf of Janne Hellsten (nurpax) from NVIDIA, for the convenience of the CLA. Thanks Janne a lot for the contribution!

Currently, the ninja build decides whether to rebuild a .cu file more or less at random, and there are actually two issues:

First, the arch list in the build command is ordered randomly. When the order changes, ninja will rebuild unconditionally regardless of the timestamp.

Second, the header files are not included in the dependency list, so if a header file changes, it is possible that ninja will not rebuild.

This PR fixes both issues. The fix for the second issue requires nvcc >= 10.2. nvcc < 10.2 can still build CUDA extensions as before, but it will be unable to see changes in header files.
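
A hedged sketch of the two ingredients of the fix (the nvcc flag names assume nvcc >= 10.2; `$in`/`$out` are ninja placeholders):

```
# Deterministic arch-flag order avoids spurious full rebuilds caused by a
# command line that changes only in ordering.
archs = {"70", "75", "80"}
arch_flags = ["-gencode=arch=compute_%s,code=sm_%s" % (a, a) for a in sorted(archs)]

# Header tracking: ask nvcc to emit a depfile that ninja can consume.
dep_flags = ["--generate-dependencies-with-compile", "--dependency-output", "$out.d"]
nvcc_cmd = ["nvcc"] + arch_flags + dep_flags + ["-c", "$in", "-o", "$out"]
```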

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49344

Reviewed By: glaringlee

Differential Revision: D25540157

Pulled By: ezyang

fbshipit-source-id: 197541690d7f25e3ac5ebe3188beb1f131a4c51f
2020-12-16 17:45:12 -08:00
1c6e179b38 Relax the atol/rtol of layernorm math kernel test. (#49507)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49507

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25598424

Pulled By: ailzhang

fbshipit-source-id: b3f43e84f177cf7c14831b0b83a399b155c813c4
2020-12-16 17:37:51 -08:00
c675727adf Fix bad error message when int overflow (#48250)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48114

Before:
```
>>> torch.empty(2 * 10 ** 20)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: empty(): argument 'size' must be tuple of ints, but found element of type int at pos 1
```

After fix:
```
>>> torch.empty(2 * 10 ** 20)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Overflow when unpacking long
```

Unclear whether we need a separate test for this case; I can add one if necessary...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48250

Reviewed By: linbinyu

Differential Revision: D25105217

Pulled By: ezyang

fbshipit-source-id: a5aa7c0266945c8125210a2fd34ce4b6ba940c92
2020-12-16 17:30:45 -08:00
a5cc0a6f4c .circleci: Only downgrade if we have conda (#49519)
Summary:
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49519

Reviewed By: robieta

Differential Revision: D25603779

Pulled By: seemethere

fbshipit-source-id: ca8d811925762a5a413ca906d94c974a4ac5b132
2020-12-16 17:14:17 -08:00
872f6486b1 Prevent accidentally writing old style ops (#49510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49510

Adding old style operators with out arguments will break XLA. This prevents that. See for background: https://fb.workplace.com/groups/pytorch.dev/permalink/809934446251704/

This is a temporary change that will prevent this breakage for the next couple of days until the problem is resolved for good.
It will be deleted in https://github.com/pytorch/pytorch/pull/49164 then.
ghstack-source-id: 118756437

(Note: this ignores all push blocking failures!)

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D25599112

fbshipit-source-id: 6b0ca4da4b55da8aab9d1b332cd9f68e7602301e
2020-12-16 16:34:49 -08:00
9056173acc [NNC] Dont inline outputs buffers on cpu (#49488)
Summary:
In https://github.com/pytorch/pytorch/pull/48967/ we enabled output buffer inlining, which results in duplicate computation if one output depends on another. This was done to fix correctness for CUDA, but it is not needed for correctness on CPU and results in a perf slowdown.

The output buffer inlining solution for CUDA is intended to be an interim solution because it does not work with reductions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49488

Reviewed By: ezyang

Differential Revision: D25596071

Pulled By: eellison

fbshipit-source-id: bc3d987645da5ce3c603b4abac3586b169656cfd
2020-12-16 16:28:25 -08:00
47c65f8223 Revert D25569586: stft: Change require_complex warning to an error
Test Plan: revert-hammer

Differential Revision:
D25569586 (5874925b46)

Original commit changeset: 09608088f540

fbshipit-source-id: 6a5953b327a4a2465b046e29bb007a0c5f4cf14a
2020-12-16 16:21:52 -08:00
3efd5d8f01 Introduce tools.codegen.api.translate (#49122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49122

cpparguments_exprs has induced a lot of head scratching in many recent PRs about how to structure the code well. This PR replaces the old algorithm with an entirely new algorithm inspired by logic programming. The net result is shorter, cleaner and should be more robust to future changes.

This PR is a bit of a whopper.  Here is the order to review it.

- tools/codegen/api/types.py
  - Deleted CppArgument, CppArgumentPackIface (and subclasses), CppExpr, DispatcherExpr, DispatcherArgument, NativeExpr, NativeArgument, MetaArgument. All things previously called XArgument are now Binding. All things previously called XExpr are now Expr. I deleted the `__str__` implementation on Binding and fixed all call sites not to use it. On Binding, I renamed `str_no_default` and `str_default` to `defn` and `decl` for better symmetry with the corresponding signature concepts, although I'm open to naming them back to their original versions.
  - Obviously, things are less type safe without the class distinctions. So I introduce a new ADT called CType. CType represents the *semantic C++ type* of a binding: it is both the C++ type (e.g., `const Tensor&`) as well as the argument name that specifies what the  binding denotes (e.g., `other`). Every binding now records its CType. The key observation here is that you don't actually care if a given expression is from the cpp or dispatcher or native API; what you care is having enough information to know what the expression means, so you can use it appropriately. CType has this information. For the most part, ArgNames are just the string names of the arguments as you see them in JIT schema, but there is one case (`possibly_redundant_memory_format`) where we encode a little extra information. Unlike the plain strings we previously used to represent C++ types, CType have a little bit of structure around optional and references, because the translation code needs to work around these concepts.
  - I took the opportunity to kill all of the private fields like `_arguments` and `_returns_type` (since the argument types don't make sense anymore). Everything is computed for you on the fly. If this is a perf problem in codegen we can start using `cached_property` decorator.
  - All of the heavy lifting in CppSignature.argument_packs has been moved to the cpp module. We'll head over there next. Similarly, all of the exprs methods are now calling translate, the new functionality which we haven't gotten to yet
- tools/codegen/api/cpp.py
   - We refactor all of the type computation functions to return CType instead of str. Because CTypes need to know the denotation, there is a new `binds: ArgName` argument to most functions that provides the denotation, so we can slot it in. (An alternative would have been to construct CTypes without denotations and then fill them in post-facto, but I didn't do it this way. One downside is there are some places where I need a CType without denotation, so I fill these in with `__placeholder__` whenever this happens).
  - `argument` and `arguments` are now extremely simple. There is no more Pack business, just produce one or more Bindings. The one thing of note is that when both a `memory_format` and `options` are in scope, we label the memory format as `possibly_redundant_memory_format`. This will be used in translation
- tools/codegen/api/dispatcher.py and tools/codegen/api/native.py - same deal as cpp.py. One thing is that `cpparguments_exprs` is deleted; that is in the translator
- tools/codegen/api/translate.py - the translator! It uses a very simple backwards deduction engine to work out how to fill in the arguments of functions. There are comments in the file that explain how it works. A toy sketch of the search idea follows this list.
- Everything else: just some small call-site tweaks for places where I changed the API.
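
A toy model (not the real tools/codegen classes) of that backward deduction: to produce a C++ expression of a goal CType, search the in-scope bindings for one whose semantic type matches the goal.

```
from dataclasses import dataclass

@dataclass(frozen=True)
class CType:
    cpp_type: str   # e.g. "const Tensor&"
    name: str       # denotation, e.g. "self"

@dataclass(frozen=True)
class Binding:
    ctype: CType
    expr: str       # C++ expression carrying this semantic type

def translate(ctx, goal):
    for b in ctx:
        if b.ctype == goal:
            return b.expr
    raise RuntimeError("cannot deduce an expression for %r" % (goal,))

ctx = [Binding(CType("const Tensor&", "self"), "self")]
print(translate(ctx, CType("const Tensor&", "self")))  # -> self
```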

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D25455887

Pulled By: ezyang

fbshipit-source-id: 90dc58d420d4cc49281aa8647987c69f3ed42fa6
2020-12-16 16:18:40 -08:00
f66147ebca BFloat16: add explicit dtype support for to_mkldnn and to_dense (#48881)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48881

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25537190

Pulled By: VitalyFedyunin

fbshipit-source-id: a61a433c638e2e95576f88f081b64ff171b2316e
2020-12-16 16:09:42 -08:00
6f928a4a53 [PyTorch] Make tls_local_dispatch_key_set inlineable (reapply) (#49412)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49412

FLAGS_disable_variable_dispatch had to go, but it looks like the only user was some benchmarks anyway.
ghstack-source-id: 118669590

Test Plan:
Small (order of 0.1% improvement) on Internal benchmarks. Wait for
GitHub CI since this was reverted before due to CI break

Reviewed By: ezyang

Differential Revision: D25547962

fbshipit-source-id: 58424b1da230fdc5d27349af762126a5512fce43
2020-12-16 16:04:35 -08:00
953f9922ec [PyTorch] Use .sizes() instead of .size() in cat_serial_kernel_impl (#49371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49371

As with the previous diff, .sizes() is strictly more efficient.
ghstack-source-id: 118627223

Test Plan: internal benchmark

Differential Revision: D25546409

fbshipit-source-id: 196034716b6e11efda1ec8cb1e0fce7732d73eb4
2020-12-16 16:04:32 -08:00
c1879b573e [PyTorch] Use .sizes() instead of .size() in _cat_out_cpu (#49368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49368

The former is faster because it doesn't allow negative indexing (which we don't use).
ghstack-source-id: 118624598

Test Plan: internal benchmark

Reviewed By: hlu1

Differential Revision: D25545777

fbshipit-source-id: b2714fac95c801fd735fac25b238b4a79b012993
2020-12-16 16:04:29 -08:00
1a0510463a [PyTorch] Avoid extra Tensor refcounting in _cat_out_cpu (#49364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49364

We had a local `Tensor` when we only needed a `const Tensor&`.
ghstack-source-id: 118624595

Test Plan: Internal benchmark.

Reviewed By: hlu1

Differential Revision: D25544731

fbshipit-source-id: 7b9656d0371ab65a6313cb0ad4aa1df707884c1c
2020-12-16 16:04:26 -08:00
9ce1df079f [PyTorch] Merge CoinflipTLS into RecordFunctionTLS (#49359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49359

This should be both slightly more efficient (1 less TLS guard
check in at::shouldRunRecordFunction) and definitely more correct
(CoinflipTLS is now saved whenever RecordFunctionTLS is saved), fixing
a bad merge that left RecordFunctionTLS::tries_left dead.
ghstack-source-id: 118624402

Test Plan: Review, CI

Reviewed By: hlu1

Differential Revision: D25542799

fbshipit-source-id: 310f9fd157101f659cea13c331b2a0ee6db2db88
2020-12-16 16:00:49 -08:00
6bde0ca6d3 T66557700 Support default argument values of a method (#48863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48863

Support default arguments when invoking a module via PyTorch Lite (`mobile::Module`).

Test Plan:
buck test mode/dbg //caffe2/test/cpp/jit:jit -- LiteInterpreterTest.MethodInvocation

buck test mode/dbg caffe2/test:mobile -- test_method_calls_with_optional_arg

Reviewed By: raziel, iseeyuan

Differential Revision: D25152559

fbshipit-source-id: bbf52f1fbdbfbc6f8fa8b65ab524b1cd4648f9c0
2020-12-16 15:55:03 -08:00
d0fb55454b Refine ConvParams::use_nnpack() (#49464)
Summary:
NNPACK convolution algorithms can only be used for kernels up to 16x16

Fixes https://github.com/pytorch/pytorch/issues/49462

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49464

Reviewed By: xuzhao9

Differential Revision: D25587879

Pulled By: malfet

fbshipit-source-id: 658197f23c08cab97f0849213ecee3f91f96c932
2020-12-16 15:42:43 -08:00
399b07a8f9 Add note to torch docs for sinh/cosh (#49413)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/48641

Documents the behavior of sinh and cosh in the edge cases
```
>>> b = torch.full((15,), 89, dtype=torch.float32)
>>> torch.sinh(b)
tensor([2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38,
        2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38,
        2.2448e+38, 2.2448e+38, 2.2448e+38])
>>> b = torch.full((16,), 89, dtype=torch.float32)
>>> torch.sinh(b)
tensor([inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf])
>>> b = torch.full((17,), 89, dtype=torch.float32)
>>> torch.sinh(b)
tensor([       inf,        inf,        inf,        inf,        inf,        inf,
               inf,        inf,        inf,        inf,        inf,        inf,
               inf,        inf,        inf,        inf, 2.2448e+38])
>>> b = torch.full((32,), 89, dtype=torch.float32)[::2]
>>> torch.sinh(b)
tensor([2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38,
        2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38,
        2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38])
```

See https://sleef.org/purec.xhtml

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49413

Reviewed By: ezyang

Differential Revision: D25587932

Pulled By: soulitzer

fbshipit-source-id: 6db75c45786f4b95f82459d0ce5efa37ec0774f0
2020-12-16 14:51:08 -08:00
f0217e2f52 Fix link in distributed contributing doc and add link (#49141)
Summary:
One of the links for ramp-up tasks wasn't showing any results and the other showed only RPC results. Instead, I just changed it to one link that has `pt_distributed_rampup`, which seems reasonable since the developer will be able to see both RPC and distributed tasks.

Also added test command for DDP tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49141

Reviewed By: ezyang

Differential Revision: D25597560

Pulled By: rohan-varma

fbshipit-source-id: 85d7d2964a19ea69fe149c017cf88dff835b164a
2020-12-16 14:38:56 -08:00
676bfa6dbd Revert D25507480: [quant][docs] Add fx graph mode quantization to quantization docs
Test Plan: revert-hammer

Differential Revision:
D25507480 (7729581414)

Original commit changeset: 9e9e4b5fef97

fbshipit-source-id: fdb08d824209b97defaba2e207d1a914575a6ae7
2020-12-16 14:26:18 -08:00
09173ae65e Allow zero annealing epochs (#47579)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47578.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47579

Reviewed By: H-Huang

Differential Revision: D25429403

Pulled By: vincentqb

fbshipit-source-id: c42fbcd71b46e07c672a1e9661468848ac16de38
2020-12-16 14:09:43 -08:00
4431731c68 Making ops c10-full: Storage arguments (#49146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49146

Add support for Storage arguments to IValue and the JIT typing system, and make ops that were blocked on that c10-full.
ghstack-source-id: 118710665

(Note: this ignores all push blocking failures!)

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25456799

fbshipit-source-id: da14f125af352de5fcf05a83a69ad5a69d5a3b45
2020-12-16 14:00:34 -08:00
7767dcfc8d Revert D25564477: [pytorch][PR] Add sinc operator
Test Plan: revert-hammer

Differential Revision:
D25564477 (bbc71435b7)

Original commit changeset: 13f36a2b84da

fbshipit-source-id: 58cbe8109efaf499dd017531878b9fbbb27976bc
2020-12-16 13:19:16 -08:00
5874925b46 stft: Change require_complex warning to an error (#49022)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49022

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25569586

Pulled By: mruberry

fbshipit-source-id: 09608088f540c2c3fc70465f6a23f2aec5f24f85
2020-12-16 12:47:56 -08:00
7729581414 [quant][docs] Add fx graph mode quantization to quantization docs (#49211)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49211

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D25507480

fbshipit-source-id: 9e9e4b5fef979f5621c1bbd1b49e9cc6830da617
2020-12-16 12:40:02 -08:00
9955355853 Updated derivative rules for complex svd and pinverse (#47761)
Summary:
Updated `svd_backward` to work correctly for complex-valued inputs.
Updated `common_methods_invocations.py` to take dtype, device arguments for input construction.
Removed `test_pinverse` from `test_autograd.py`, it is replaced by entries to `common_methods_invocations.py`.
Added `svd` and `pinverse` to list of complex tests.

References for complex-valued SVD differentiation:

- https://giggleliu.github.io/2019/04/02/einsumbp.html
- https://arxiv.org/abs/1909.02659

The derived rules assume gauge invariance of loss functions, so the result would not be correct for loss functions that are not gauge invariant.
https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/

The same rule is implemented in Tensorflow and [BackwardsLinalg.jl](https://github.com/GiggleLiu/BackwardsLinalg.jl).

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47761

Reviewed By: izdeby

Differential Revision: D25574962

Pulled By: mruberry

fbshipit-source-id: 832b61303e883ad3a451b84850ccf0f36763a6f6
2020-12-16 12:32:22 -08:00
39a23c797b Add docs/README.md to make existing doc build info more discoverable (#49286)
Summary:
Closes gh-42003

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49286

Reviewed By: glaringlee

Differential Revision: D25535250

Pulled By: ezyang

fbshipit-source-id: a7790bfe4528fa6a31698126cc687793fdf7ac3f
2020-12-16 11:55:45 -08:00
6f814d45aa Update TensorPipe submodule (#49467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49467

Credit to beauby for the Bazel fixes.

Test Plan: Export and run on CI

Reviewed By: beauby

Differential Revision: D25588027

fbshipit-source-id: efe1c543eb7438ca05254de67cf8b5cee625119a
2020-12-16 11:33:17 -08:00
2ec3e803eb Update accumulate_grad to support vmap (#49119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49119

I don't know how the accumulate_grad code gets hit via calling
autograd.grad, so I went through all places in accumulate_grad
that are definitely impossible to vmap through and changed them.

To support this:
- I added vmap support for Tensor::strides(). It returns the strides
that correspond to the public dimensions of the tensor (not the ones
being vmapped over).
- Changed an instance of empty_strided to new_empty_strided.
- Replaced an in-place operation in accumulate_grad.h

Test Plan:
- added a test for calling strides() inside of vmap
- added tests that exercise all of the accumulate_grad code path.
NB: I don't know why these tests exercise the code paths, but I've
verified that they do via gdb.

Suggestions for some saner test cases are very welcome.

Reviewed By: izdeby

Differential Revision: D25563543

Pulled By: zou3519

fbshipit-source-id: 05ac6c549ebd447416e6a07c263a16c90b2ef510
2020-12-16 11:30:16 -08:00
f98d8c6237 Move inplace_is_vmap_compatible to BatchedTensorImpl.h (#49118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49118

I need this in the next stack up. It seems useful to have as a helper
function.

Test Plan: - run tests

Reviewed By: izdeby

Differential Revision: D25563546

Pulled By: zou3519

fbshipit-source-id: a4031fdc4b2373cc230ba3c66738d91dcade96e2
2020-12-16 11:30:13 -08:00
1b6d18aa7c Adding support for CuDNN-based LSTM with projections (#47725)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46213

I didn't yet update the documentation, will add those change soon. A few other things that I didn't do, but want to clarify if I maybe should.

1. I didn't expose projections in c++ API: torch/csrc/api/src/nn/modules/rnn.cpp. Let me know if this is desirable and I will add those changes.
2. I didn't expose projections in "lstm_cell" function and "_thnn_differentiable_lstm_cell_backward" functions from aten/src/ATen/native/RNN.cpp. As far as I understand, they are not needed for nn.LSTM CPU execution. For lstm_cell, projections don't bring any real benefit, since if cell is used separately, it can be easily added in Python. For "_thnn_differentiable_lstm_cell_backward", I'm actually not sure where exactly that function is used, so I also disabled projections there for now. Please let me know if I should change that.
3. I added check that projections are not supported for quantized LSTMs to quantized_lstm_<data/input> functions. But I didn't add any checks to LSTMCell code. It seems that since I disabled projections in "lstm_cell" function, they should also not be available for quantized models through any other API than quantized_lstm_<data/input>. Please let me know if I'm not correct and I will add checks to other places.
4. Projections are not supported for CuDNN versions < 7.1.2. Should I add the check for CuDNN version and disable projections in that case? If so, what will be the best way to do that?
5. Currently I added projection weight as the last weight, so the layout is "w_ih, w_hh, b_ih, b_hh, w_hr". This breaks the assumption that biases come after weights and thus I had to add additional if-s in various places. Alternative way would be to have "w_ih, w_hh, w_hr, b_ih, b_hh" layout, in which case the assumption will be true. But in that case I will need to split the loop in get_parameters function from aten/src/ATen/native/cudnn/RNN.cpp. And in some cases, I will still need to add an "undefined" tensor in the 3rd position, because we get all 5 weights from CuDNN most of the time. So I'm not sure which way is better. Let me know if you think I should change to the weights-then-biases layout.
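
For illustration, a minimal sketch of the resulting Python-level API (using the `proj_size` constructor argument on `nn.LSTM`; sizes are illustrative):

```
import torch

# proj_size shrinks the per-step output (and h) from hidden_size to
# proj_size, while the cell state c keeps hidden_size.
lstm = torch.nn.LSTM(input_size=10, hidden_size=20, proj_size=5, batch_first=True)
x = torch.randn(2, 7, 10)
out, (h, c) = lstm(x)
print(out.shape, h.shape, c.shape)
# torch.Size([2, 7, 5]) torch.Size([1, 2, 5]) torch.Size([1, 2, 20])
```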

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47725

Reviewed By: zou3519

Differential Revision: D25449794

Pulled By: ngimel

fbshipit-source-id: fe6ce59e481d1f5fd861a8ff7fa13d1affcedb0c
2020-12-16 11:27:02 -08:00
48d1ad1ada Reland "Add test for empty tensors for batch matmuls" (#48797)
Summary:
This reverts commit c7746adbc6e6ace9d4c2b54e32c8d36a7b7b0e31.

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48797

Reviewed By: mruberry

Differential Revision: D25575264

Pulled By: ngimel

fbshipit-source-id: c7f3b384db833d727bb5bd8a51f1493a13016d09
2020-12-16 11:19:27 -08:00
afce5890ff Revert D25421263: [pytorch][PR] [numpy] torch.{all/any} : output dtype is always bool
Test Plan: revert-hammer

Differential Revision:
D25421263 (c508e5b1bf)

Original commit changeset: c6c681ef9400

fbshipit-source-id: 4c0c9acf42b06a3ed0af8f757ea4512ca35b6c59
2020-12-16 11:11:13 -08:00
d7659be58d [caffe2][autograd] Avoid extensive -Wunused-variable warnings on _any_requires_grad (#49167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49167

Building with clang and a fair warning level can result in hundreds of lines of compiler output of the form:
```
caffe2\gen_aten_libtorch\autograd\generated\VariableType_1.cpp(2279,8): warning: unused variable '_any_requires_grad' [-Wunused-variable]
   auto _any_requires_grad = compute_requires_grad( self );
        ^
caffe2\gen_aten_libtorch\autograd\generated\VariableType_1.cpp(2461,8): warning: unused variable '_any_requires_grad' [-Wunused-variable]
   auto _any_requires_grad = compute_requires_grad( grad_output, self );
        ^
caffe2\gen_aten_libtorch\autograd\generated\VariableType_1.cpp(2677,8): warning: unused variable '_any_requires_grad' [-Wunused-variable]
   auto _any_requires_grad = compute_requires_grad( self );
        ^
...
```
This happens when requires_derivative == False. Let's mark `_any_requires_grad` as potentially unused. If this were C++17 we would use `[[maybe_unused]]` but to retain compatibility with C++11 we just mark it with `(void)`.

Test Plan: CI + locally built

Reviewed By: ezyang

Differential Revision: D25421548

fbshipit-source-id: c56279a184b1c616e8717a19ee8fad60f36f37d1
2020-12-16 10:38:11 -08:00
45b33c83f1 Revert "Revert D24923679: Fixed einsum compatibility/performance issues (#46398)" (#49189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49189

This reverts commit d307601365c3b848072b8b8381208aedc1a0aca5 and fixes the bug with diagonals and ellipsis combined.

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D25540722

Pulled By: heitorschueroff

fbshipit-source-id: 86d0c9a7dcfda600b546457dad102af2ff33e353
2020-12-16 10:38:07 -08:00
bbc71435b7 Add sinc operator (#48740)
Summary:
Implements the sinc operator.
See https://numpy.org/doc/stable/reference/generated/numpy.sinc.html

![image](https://user-images.githubusercontent.com/13428986/101653855-cdffa080-3a0d-11eb-8426-ecc81c152ebd.png)
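
A short sketch of the semantics (normalized sinc, as in `numpy.sinc`): sinc(x) = sin(pi*x) / (pi*x), with the removable singularity at x = 0 defined to be 1.

```
import torch

x = torch.tensor([0.0, 0.5, 1.0])
print(torch.sinc(x))  # approximately tensor([1.0000, 0.6366, 0.0000])
```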

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48740

Reviewed By: izdeby

Differential Revision: D25564477

Pulled By: soulitzer

fbshipit-source-id: 13f36a2b84dadfb4fd1442a2a40a3a3246cbaecb
2020-12-16 10:33:02 -08:00
09c741868c [c10d Store] Store Python Docs Fixes (#49130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49130

The Python Store API docs had some typos where boolean values were
lower-case, which is incorrect Python syntax. This diff fixes those typos.

Test Plan: Built and Rendered Docs

Reviewed By: mrshenli

Differential Revision: D25411492

fbshipit-source-id: fdbf1e6b8f81e9589e638286946cad68eb7c9252
2020-12-16 10:29:09 -08:00
4b3f05a471 [Docs] Updating init_process_group docs to indicate correct rank range (#49131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49131

Users frequently assume the correct range of ranks is 1 ...
`world_size`. This PR updates the docs to indicate that the correct rank range
users should specify is 0 ... `world_size` - 1.

Test Plan: Rendering and Building Docs

Reviewed By: mrshenli

Differential Revision: D25410532

fbshipit-source-id: fe0f17a4369b533dc98543204a38b8558e68497a
2020-12-16 10:26:04 -08:00
c52f1dc365 .circleci: downgrade conda-package-handling to 1.6.0 (#49434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49434

There was a bug introduced in conda-package-handling >= 1.6.1 that makes archives
above a certain size fail when attempting to extract;
see: https://github.com/conda/conda-package-handling/issues/71

coincides with https://github.com/pytorch/builder/pull/611

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: xuzhao9, janeyx99, samestep

Differential Revision: D25573390

Pulled By: seemethere

fbshipit-source-id: 82173804f1b30da6e4b401c4949e2ee52065e149
2020-12-16 10:17:47 -08:00
f2ee8c6241 Instantiate PackedConvWeight to avoid linking error (#49442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49442

When moving ATen/native to the app level, symbols from native/quantized may sit in a target away from some of their call sites. As a result, there are linking errors about missing symbols for instantiations of PackedConvWeight::prepack. The solution is to instantiate PackedConvWeight in the same compilation unit. It's similar to D24941989 (fe6bb2d287).
ghstack-source-id: 118676374

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D25576703

fbshipit-source-id: d6e3d11d51d8172ab8487ce44ec8c042889f0f11
2020-12-16 10:09:09 -08:00
86902f84bf CUDA BFloat embedding (#44848)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44848

Reviewed By: izdeby

Differential Revision: D25574204

Pulled By: ngimel

fbshipit-source-id: b35f7253a6ad2b83f7b6b06862a5ab77295373e0
2020-12-16 09:24:46 -08:00
001ff3acf6 webdataset prototype - LoadFilesFromDiskIterableDataset (#48955)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48955

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D25541393

Pulled By: glaringlee

fbshipit-source-id: dea6ad64a7ba40abe45612d99f078b14d1da8bbf
2020-12-16 08:39:17 -08:00
6786b2b966 webdataset prototype - ListDirFilesIterableDataset (#48944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48944

This is a stacked PR series for the webdataset prototype; I am trying to make each PR in the stack a separate dataset.
To make the implementation simple, each dataset will only support the basic functionality.

- [x] ListDirFilesDataset
- [x] LoadFilesFromDiskIterableDataset
- [x] ReadFilesFromTarIterableDataset
- [x] ReadFilesFromZipIterableDataset
- [x] RoutedDecoderIterableDataset

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D25541277

Pulled By: glaringlee

fbshipit-source-id: 9e738f6973493f6be1d5cc1feb7a91513fa5807c
2020-12-16 08:34:20 -08:00
efc090652e Enhanced generators with grad-mode decorators (#49017)
Summary:
This PR addresses the feature request outlined in https://github.com/pytorch/pytorch/issues/48713 for two-way communication with enhanced generators from [pep-342](https://www.python.org/dev/peps/pep-0342/).

Briefly, the logic of the patch resembles `yield from` [pep-380](https://www.python.org/dev/peps/pep-0380/), which cannot be used, since the generator **must be interacted with from within the grad-mode context**, while yields from the decorator **must take place outside of the context**. Hence any interaction with the wrapped generator, be it via [.send](https://docs.python.org/3/reference/expressions.html?highlight=throw#generator.send), [.throw](https://docs.python.org/3/reference/expressions.html?highlight=throw#generator.throw), and even [.close](https://docs.python.org/3/reference/expressions.html?highlight=throw#generator.close) must be wrapped by a `with` clause. The patch is compatible with `for i in gen: pass` and `next(gen)` use cases and allows two-way communication with the generator via `.send <-> yield` points.

### Logic
At lines [L37-L38](2d40296c0c/torch/autograd/grad_mode.py (L37-L38)) we (the decorator) **start the wrapped generator** (coroutine) by issuing `None` into it (equivalently, we can use `next(get)` here). Then we **dispatch responses of the generator** to our ultimate caller and **relay the latter's requests** into the generator in the loop on lines [L39-L52](2d40296c0c/torch/autograd/grad_mode.py (L39-L52)).

We yield the most recent response on [L40-L41](2d40296c0c/torch/autograd/grad_mode.py (L40-L41)), at which point we become **paused**, waiting for the next ultimate caller's interaction with us. If the caller **sends us a request**, then we become unpaused and move to [L51-L52](2d40296c0c/torch/autograd/grad_mode.py (L51-L52)) and **forward it into the generator**, at which point we pause, waiting for its response. The response might be a value, an exception or a `StopIteration`. In the case of an exception from the generator, we let it **bubble up** from the immediately surrounding [except clause](https://docs.python.org/3/reference/compound_stmts.html#the-try-statement)  to the ultimate caller through the [outer try-except](2dc287bba8/torch/autograd/grad_mode.py (L36-L54)). In the case of a `StopIteration`, we **take it's payload and propagate it** to the caller via [return](2d40296c0c/torch/autograd/grad_mode.py (L54)). In the case of a value, the flow and the loop continues.

The caller **throwing an exception at us** is handled much like a proper request, except for the exception playing the role of the request. In this case we **forward it into the generator** on lines [L47-L49](2d40296c0c/torch/autograd/grad_mode.py (L47-L49)) and await its response. We explicitly **advance** the traceback one frame up, in order to indicate the **source of the exception within the generator**.

Finally the `GeneratorExit` is handled on lines [L42-L45](2d40296c0c/torch/autograd/grad_mode.py (L42-L45)) and closes the generator.

Updates: clarified exception propagation
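
A usage sketch of the behavior this enables (two-way `.send` communication with a grad-mode-decorated generator):

```
import torch

@torch.no_grad()
def accumulate():
    total = torch.zeros(1)
    while True:
        x = yield total    # the caller .send()s tensors in
        total = total + x  # executed with grad disabled

gen = accumulate()
next(gen)  # prime the generator: runs up to the first yield
out = gen.send(torch.ones(1, requires_grad=True))
print(out.requires_grad)  # False: the generator body ran under no_grad
```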

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49017

Reviewed By: izdeby

Differential Revision: D25567796

Pulled By: albanD

fbshipit-source-id: 801577cccfcb2b5e13a08e77faf407881343b7b0
2020-12-16 07:15:33 -08:00
76d09ec33e [PyTorch] Avoid move-constructing a List in listConstruct (#49355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49355

List's move ctor is a little bit more expensive than you might expect, but we can easily avoid it.
ghstack-source-id: 118624596

Test Plan: Roughly 1% improvement on internal benchmark.

Reviewed By: hlu1

Differential Revision: D25542190

fbshipit-source-id: 08532642c7d1f1604e16c8ebefd1ed3e56f7c919
2020-12-16 07:07:12 -08:00
ec8e9d31cf Making ops c10-full: optional lists (#49088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49088

We had special case logic to support `int[]?` and `double[]?` but nothing for `DimnameList[]?`.
This PR generalizes the logic to support optional lists so it should now work with all types.
It also enables c10-fullness for ops that were blocked by this.

Note that using these arguments in a signature was always and still is expensive because the whole list needs to be copied.
We should probably consider alternatives in the future like for example using `torch::List` instead of `ArrayRef`, that could work without copying the list.
ghstack-source-id: 118660071

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25423901

fbshipit-source-id: dec58dc29f3bb4cbd89e2b95c42da204a9da2e0a
2020-12-16 02:55:11 -08:00
d69d42db78 Making ops c10 full: optional out arguments (#49083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49083

We have some (but very few) ops that take optional out arguments `Tensor(a!)? out`.
This PR makes them non-optional mandatory arguments and enables c10-fullness for them.
There is only a very small number of ops affected by this.

Putting this up for discussion.

Alternatives considered:
If we keep them optional, we run into lots of issues in the dispatcher. We have to decide what the dispatcher calling convention for this argument type should be.
1) If we keep passing them in as `Tensor&` arguments and return them as `tuple<Tensor&, Tensor&, Tensor&>`, so basically same as currently, then the schema inference check will say "Your kernel function got inferred to have a `Tensor` argument but your native_functions.yaml declaration says `Tensor?`. This is a mismatch, you made an error". We could potentially disable that check, but that would open the door for real mistakes to not be reported anymore in the future. This sounds bad.
2) If we change them to a type that schema inference could differentiate from `Tensor`, say we pass them in as `const optional<Tensor>&` and return them as `tuple<const optional<Tensor>&, const optional<Tensor>&, const optional<Tensor>&>`, then our boxing logic fails because it can't recognize those as out overloads anymore and shortcut the return value as it is doing right now. We might be able to rewrite the boxing logic, but that could be difficult and could easily develop into a rabbit hole of having to clean up `Tensor&` references throughout the system where we use them.

Furthermore, having optional out arguments in C++ doesn't really make sense. the C++ API puts them to the front of the argument list, so you can't omit them anyways when calling an op.
You would be able to omit them when calling from Python with out kwargs, but not sure if we want that discrepancy between the c++ and python API.
ghstack-source-id: 118660075

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25422197

fbshipit-source-id: 3cb25c5a3d93f9eb960d70ca014bae485be9f058
2020-12-16 02:53:42 -08:00
306bab220e Revert D25554109: [StaticRuntime][ATen] Add out variant for narrow_copy
Test Plan: revert-hammer

Differential Revision:
D25554109 (ed04b71651)

Original commit changeset: 6bae62e6ce34

fbshipit-source-id: bfa038e150166d0116bcae8f7a6415d98d4146de
2020-12-16 02:44:45 -08:00
ed04b71651 [StaticRuntime][ATen] Add out variant for narrow_copy (#49449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49449

Similar to permute_out, add the out variant of `aten::narrow` (slice in c2), which does an actual copy. `aten::narrow` creates a view; however, a copy is incurred when we call `input.contiguous` in the ops that follow `aten::narrow`, in `concat_add_mul_replacenan_clip`, `casted_batch_one_hot_lengths`, and `batch_box_cox`.
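
A hedged Python-level sketch of the behavior being optimized away (the out variant itself lives in C++):

```
import torch

x = torch.arange(12.).reshape(3, 4)
v = x.narrow(1, 0, 2)     # a view: no data is copied here
print(v.is_contiguous())  # False
y = v.contiguous()        # the copy happens here, outside the MemoryPlanner
```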

Test Plan:
Unit test:

```
buck test //caffe2/aten:native_test
```
Benchmark with the adindexer model:
```
bs = 1 is neutral

Before:
I1214 21:32:51.919239 3285258 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0886948. Iters per second: 11274.6
After:
I1214 21:32:52.492352 3285277 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0888019. Iters per second: 11261

bs = 20 shows more gains probably because the tensors are bigger and therefore the cost of copying is higher

Before:
I1214 21:20:19.702445 3227229 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.527563. Iters per second: 1895.51
After:
I1214 21:20:20.370173 3227307 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.508734. Iters per second: 1965.67
```

Reviewed By: bwasti

Differential Revision: D25554109

fbshipit-source-id: 6bae62e6ce3456ff71559b635cc012fdcd1fdd0e
2020-12-16 01:47:46 -08:00
40d7c1091f Unescape string in RPC error message (#49373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49373

Unescaping the string in the RPC error message to provide a better error message.

Test Plan: CI

Reviewed By: xush6528

Differential Revision: D25511730

fbshipit-source-id: 054f46d5ffbcb1350012362a023fafb1fe57fca1
2020-12-16 01:40:31 -08:00
a9137aeb06 quantized tensor: add preliminary support for advanced indexing, try 2 (#49346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49346

This is a less ambitious redo of
https://github.com/pytorch/pytorch/pull/49129/.

We make the

```
xq_slice = xq[:, [0], :, :]
```

indexing syntax work if `xq` is a quantized Tensor. For now, we are
making the code not crash, with an inefficient `dq -> index -> q`
implementation. A future PR can optimize performance by removing
the unnecessary memory copies (which will require some non-trivial
changes to TensorIterator).
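
A minimal sketch of the newly working pattern (scale/zero_point values are illustrative):

```
import torch

x = torch.randn(1, 3, 4, 4)
xq = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
xq_slice = xq[:, [0], :, :]  # previously crashed; now runs via dq -> index -> q
print(xq_slice.shape)        # torch.Size([1, 1, 4, 4])
```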

Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_advanced_indexing
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25539365

fbshipit-source-id: 98485875aaaf5743e1a940e170258057691be4fa
2020-12-16 01:28:38 -08:00
8954eb3f72 [StaticRuntime] Fusion pass for ClipRanges/GatherRanges/LengthsToOffsets (#49113)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49113

Reviewed By: ajyu

Differential Revision: D25388512

fbshipit-source-id: 3daa5b9387a3a10b6c220688df06540c4d844aea
2020-12-16 00:34:49 -08:00
94e328c038 fix optimizer.pyi typo 'statue'->'state' (#49388)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49388

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D25553672

Pulled By: glaringlee

fbshipit-source-id: e9f2233bd678a90768844af2d8d5e2994d59e304
2020-12-15 23:41:56 -08:00
cbeb4c25e5 [StaticRuntime] Permute_out (#49447)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49447

Adding an out variant for `permute`. It's better than fixing the copy inside contiguous because 1) we can leverage the c2 math library, 2) contiguous creates a tensor inside the function which isn't managed by the MemoryPlanner in StaticRuntime
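
For context, a small sketch (not from this PR) of why `contiguous` implies an unmanaged allocation while `permute` alone does not:

```python
import torch

x = torch.randn(2, 3, 4)
p = x.permute(2, 0, 1)               # view: only re-strides, no data copied
print(p.is_contiguous())             # False
c = p.contiguous()                   # allocates a fresh buffer and copies
print(c.data_ptr() == x.data_ptr())  # False: that buffer is created inside
                                     # the op, invisible to the MemoryPlanner
```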

Test Plan:
Benchmark:
```
After:
I1214 12:35:32.218775 991920 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0902339. Iters per second: 11082.3

Before:
I1214 12:35:43.368770 992620 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0961521. Iters per second: 10400.2
```

Reviewed By: yinghai

Differential Revision: D25541666

fbshipit-source-id: 013ed0d4080cd01de4d3e1b031ab51e5032e6651
2020-12-15 23:09:31 -08:00
acd72e79a3 update breathe (#49407)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47462, but not completely.

Update breathe to the latest version to get fixes for the "Unable to resolve..." issues. There are still some build errors, but far fewer than before.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49407

Reviewed By: izdeby

Differential Revision: D25562163

Pulled By: glaringlee

fbshipit-source-id: 91bfd9e9ac70723816309f489022d72853f5fdc5
2020-12-15 21:47:07 -08:00
58551e52f0 [CMake] Use libtorch_cuda list defined in bzl file (#49429)
Summary:
Since NCCL is an optional CUDA dependency, remove nccl.cpp from the core filelist

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49429

Reviewed By: nikithamalgifb

Differential Revision: D25569883

Pulled By: malfet

fbshipit-source-id: 61371a4c6b0438e4e0a7f094975b9a9f9ffa4032
2020-12-15 20:51:16 -08:00
22c6dafd33 [PyTorch] Use plain old function pointer for RecordFunctionCallback (reapply) (#49408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49408

Nearly every non-test callsite doesn't need to capture any variables anyway, and this saves 48 bytes per callback.
ghstack-source-id: 118665808

Test Plan:
Wait for GitHub CI since we had C++14-specific issues with
this one in previous PR https://github.com/pytorch/pytorch/pull/48629

Reviewed By: malfet

Differential Revision: D25563207

fbshipit-source-id: 6a2831205917d465f8248ca37429ba2428d5626d
2020-12-15 19:16:01 -08:00
e9d7d37ad0 [FX] Rename Node._uses and refactor Node.all_input_nodes (#49415)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49415

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D25565341

Pulled By: jamesr66a

fbshipit-source-id: 2290ab62572632788809ba16319578bf0c0260ee
2020-12-15 17:13:57 -08:00
46debe7f23 [DPER] Introduce barrier operation to force synchronization of threads in async execution (#49322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49322

In some cases async execution might lose dependencies (alias-like ops) or produce suboptimal scheduling when there is a choice of which parts to schedule first. An example of the latter behavior can happen in ModelParallel training, where a copy can get lower priority compared to the rest of the execution on the given GPU, which will cause other GPUs to starve.

This operator allows us to address these issues by introducing extra explicit dependencies between ops.

Test Plan:
Unit test.
E2E testing in future diffs.

Reviewed By: xianjiec

Differential Revision: D24933471

fbshipit-source-id: 1668994c7856d73926cde022378a99e1e8db3567
2020-12-15 16:13:42 -08:00
7518f54611 Add flag torch_jit_disable_warning_prints to allow disabling all warnings.warn (#49313)
Summary:
Adding a flag torch_jit_disable_warning_prints to optimize interpreter performance by suppressing a (potentially large) number of warnings.warn calls.

This is to work around TorchScript's warning behavior mismatch with Python. Python by default triggers a warning once per location, but TorchScript doesn't support that. This causes the same warning to trigger and print once per inference run, hurting performance.
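
For reference, a plain-Python illustration of the "once per location" behavior that TorchScript lacks:

```python
import warnings

def f():
    warnings.warn("deprecated")

f()  # prints the warning
f()  # silent: Python's "default" filter warns once per source location
```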

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49313

Reviewed By: SplitInfinity

Differential Revision: D25534274

Pulled By: gmagogsfm

fbshipit-source-id: eaeb57a335c3e6c7eb259671645db05d781e80a2
2020-12-15 15:22:41 -08:00
aff0b68a58 Fix include files for out-of-tree compilation (#48827)
Summary:
Signed-off-by: caozhong <zhong.z.cao@intel.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48827

Reviewed By: agolynski

Differential Revision: D25375988

Pulled By: ailzhang

fbshipit-source-id: a8d5ab4572d991d6d96dfe758011517651ff0a6b
2020-12-15 14:40:44 -08:00
16f4b0ed6b Replace THError() check in THCTensorMathReduce.cu with C10_CUDA_KERNEL_LAUNCH_CHECK() (#49424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49424

As per conversation in this [comment](https://www.internalfb.com/intern/diff/D25541113 (e2510a0b60)/?dest_fbid=393026838623691&transaction_id=3818008671564312) on D25541113 (e2510a0b60), although THError does more than just log errors associated with CUDA kernel launches, we're going to go ahead and replace it with C10_CUDA_KERNEL_LAUNCH_CHECK, so as to be consistent throughout the code base.
Standardization FTW.

This commit is purposefully sent in as a single file change so it can be easily reverted if it introduces a regression.

Test Plan:
Checked that the code still builds with
```
buck build //caffe2/aten:ATen-cu
```
Also ran basic aten tests
```
buck test //caffe2/aten:atest
```

Reviewed By: r-barnes

Differential Revision: D25567863

fbshipit-source-id: 1093bfe2b6ca6b9a3bfb79dcdc5d713f6025eb77
2020-12-15 14:08:09 -08:00
c508e5b1bf [numpy] torch.{all/any} : output dtype is always bool (#47878)
Summary:
BC-breaking note:

This PR changes the behavior of the any and all functions to always return a bool tensor. Previously these functions were only defined on bool and uint8 tensors, and when called on uint8 tensors they would also return a uint8 tensor. (When called on a bool tensor they would return a bool tensor.)
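
A small before/after sketch of the new behavior (assuming a build containing this change):

```python
import torch

u = torch.tensor([0, 2], dtype=torch.uint8)
print(torch.any(u))  # tensor(True)  -- previously tensor(1, dtype=torch.uint8)
print(torch.all(u))  # tensor(False) -- previously tensor(0, dtype=torch.uint8)
```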

PR summary:

https://github.com/pytorch/pytorch/pull/44790#issuecomment-725596687

Fixes 2 and 3

Also Fixes https://github.com/pytorch/pytorch/issues/48352

Changes
* Output dtype is always `bool` (consistent with numpy). **BC-breaking** (previously used to match the input dtype)
* Uses vectorized version for all dtypes on CPU
* Enables test for complex
* Update doc for `torch.all` and `torch.any`

TODO
* [x] Update docs
* [x] Benchmark
* [x] Raise issue on XLA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47878

Reviewed By: H-Huang

Differential Revision: D25421263

Pulled By: mruberry

fbshipit-source-id: c6c681ef94004d2bcc787be61a72aa059b333e69
2020-12-15 13:59:32 -08:00
38a59a67f3 [JIT] Support multiple outputs in subgraph matcher. (#48992)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48992

Differential Revision: D25388100

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Pulled By: ZolotukhinM

fbshipit-source-id: d95713af2220cf4f99ac92f59f8e5b902f2f3822
2020-12-15 13:09:24 -08:00
3ffe9e0f43 [static runtime] refine fusion group (#49340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49340

This refines the fusion group to include only certain types of operations. We cannot safely handle "canRunNatively" types, and the memonger pass causes regressions on some internal models, so it was disabled (to be revisited with proper memory optimization once Tensor pools are implemented)

Test Plan:
```
buck test mode/no-gpu caffe2/test:static_runtime
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
```

Reviewed By: ZolotukhinM

Differential Revision: D25520105

fbshipit-source-id: add61d103e4f8b4615f5402e760893ef759a60a9
2020-12-15 12:57:35 -08:00
f4e15c4a23 [te] Fix bugs with shift operators (#49396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49396

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49271

Two things:

1. These throw exceptions in their constructor, which causes a segfault (*), so
   move the exceptions to ::make.
2. They technically support FP types but the rules are complicated so let's not
   bother.

(*) The reason for the segfault: all Exprs including these inherit from
KernelScopedObject, whose constructor adds the object to a list for destruction
at the end of the containing KernelArena's lifetime.  But if the derived-class
constructor throws, the object is deleted even though it's still in the
KernelArena's list.  So when the KernelArena is itself deleted, it double-frees
the pointer and dies.  I've also fixed And, Or, and Xor in this diff.
ghstack-source-id: 118594998

Test Plan: `buck test //caffe2/test:jit`

Reviewed By: bwasti

Differential Revision: D25512052

fbshipit-source-id: 42670b3be0cc1600dc5cda6811f7f270a2c88bba
2020-12-15 12:44:59 -08:00
5912316cf7 Making ops c10-full: Generator arguments (#49013)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49013

I don't know why this works. I know, this is never a good way to start a PR description :P
I know that Generator is a dispatch relevant argument when called from an unboxed API and is ignored
for dispatch purposes when called from a boxed API. This should break something, but maybe we don't
have test cases for that.

We likely need to align the unboxed and boxed dispatch behavior before landing this.
The best solution would be to make Generator not dispatch relevant in unboxing. But that might be a bigger change.
An acceptable solution could be to make Generator dispatch relevant in boxing, but that needs perf measurements.

This PR needs further discussion.
ghstack-source-id: 118619230

(Note: this ignores all push blocking failures!)

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D25394998

fbshipit-source-id: f695c659ee6e3738f74cdf0af1a514ac0c30ebff
2020-12-15 11:21:43 -08:00
a6274c1278 Making ops c10 full: out overloads with default arguments (#49012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49012

For some reason we apply default arguments to the functions in at::native too. So when an out overload had default arguments,
we couldn't move the out argument to the end because of those default arguments preceding it.
This PR fixes that and makes out overloads with default arguments c10-full
ghstack-source-id: 118619222

(Note: this ignores all push blocking failures!)

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25394605

fbshipit-source-id: 2ed1c3ce0d04a548e3141df2dca517756428fe15
2020-12-15 11:21:40 -08:00
b47fa5e88b Making ops c10-full: Dimname arguments (#49008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49008

ghstack-source-id: 118619229

(Note: this ignores all push blocking failures!)

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25392590

fbshipit-source-id: 9a4c8917aaa254fac42f33973409f5497f878df2
2020-12-15 11:21:37 -08:00
c5f90a25c0 Making ops c10-full: ops blocked by manual registrations (#49007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49007

Some ops had manual registrations, e.g. in VmapModeRegistrations, and those manual registrations had to be changed too when making the op c10-full.
This PR makes those ops c10-full and fixes the manual registrations.
ghstack-source-id: 118619231

(Note: this ignores all push blocking failures!)

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25392591

fbshipit-source-id: f4124c0547594879646cb1778357f857ea951132
2020-12-15 11:21:33 -08:00
e391dbc1b5 Making ops c10 full: ops returning multiple out arguments (#49006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49006

There was an issue in the unboxing logic with ops returning multiple out arguments.
This PR fixes that and makes those ops c10 full.

Additionally, it makes some ops c10 full that slipped through the cracks before.
ghstack-source-id: 118619224

(Note: this ignores all push blocking failures!)

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25392592

fbshipit-source-id: 6947304f34c5658fc12dc6608a21aff7bc4491e2
2020-12-15 11:21:30 -08:00
40a02e2ded Make out ops c10-full (with hacky-wrapper) (#48912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48912

ghstack-source-id: 118619234

(Note: this ignores all push blocking failures!)

Test Plan:
Benchmark:
 ---
Old (i.e. codegenerated unboxing wrapper + no hacky_wrapper):
```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f64d03ebcd0>
torch.absolute(t, out=o)
setup:
  t = torch.empty([1])
  o = torch.empty([1])

                           All          Noisy symbols removed
    Instructions:       657204                     634396
    Baseline:             4192                       3786
100 runs per measurement, 1 thread
```

New (i.e. templated unboxing wrapper + hacky_wrapper):
```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7fa7de211cd0>
torch.absolute(t, out=o)
setup:
  t = torch.empty([1])
  o = torch.empty([1])

                           All          Noisy symbols removed
    Instructions:       658160                     633996
    Baseline:             4210                       3786
100 runs per measurement, 1 thread
```

Reviewed By: bhosmer

Differential Revision: D25363335

fbshipit-source-id: ab9c122491e4209a49254dad0f7b3adb677b2c53
2020-12-15 11:16:00 -08:00
11334280bf Suppress warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed warning (#49197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49197

Compiling currently gives a number of these warnings:
```
caffe2/c10/util/TypeCast.h(27): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
          detected during:
            instantiation of "decltype(auto) c10::maybe_real<true, src_t>::apply(src_t) [with src_t=c10::complex<double>]"
(57): here
            instantiation of "uint8_t c10::static_cast_with_inter_type<uint8_t, src_t>::apply(src_t) [with src_t=c10::complex<double>]"
(157): here
            instantiation of "To c10::convert<To,From>(From) [with To=uint8_t, From=c10::complex<double>]"
(169): here
            instantiation of "To c10::checked_convert<To,From>(From, const char *) [with To=uint8_t, From=c10::complex<double>]"
caffe2/c10/co
```
Here we fix this by adding `C10_HOST_DEVICE` to the offending function.

Test Plan:
Compiling
```
buck build mode/dev-nosan -c=python.package_style=inplace dper3/dper3_models/experimental/pytorch/ads:ads_model_generation_script
```
shows this warning.

We rely on sandcastle for testing here.

Reviewed By: xw285cornell

Differential Revision: D25440771

fbshipit-source-id: 876c412eb06e8837978061cc4793abda42fac821
2020-12-15 10:49:07 -08:00
778006918c [WIP][FX] Add FX page to docs (#48814)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48814

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D25320051

Pulled By: jamesr66a

fbshipit-source-id: b1fdec9615a7a4eb97c557bb3cba7f90b0a4d933
2020-12-15 09:48:29 -08:00
9908b93dcf fix test_dispatch tests to error on duplicate def (#49254)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49254

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25505170

Pulled By: bdhirsh

fbshipit-source-id: 6796f4ce022c3141934ee69c7caaa08e663adf39
2020-12-15 08:27:52 -08:00
8ae9b46e20 Revert D25494735: Update TensorPipe submodule
Test Plan: revert-hammer

Differential Revision:
D25494735 (5a5e576ab9)

Original commit changeset: 3d6f326ca49d

fbshipit-source-id: 369a4519b5b2fec19a7a5faf324b9467177e27f6
2020-12-15 08:11:56 -08:00
9234f5026d Make WorkNCCL use CUDAEvent::query() rather than re-implement it (#49343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49343

at::cuda::CUDAEvent is "lazy" and only creates an event when it's first recorded. Until then, at::cuda::CUDAEvent is empty. If we use at::cuda::CUDAEvent::query() this is taken into account (an empty event is always ready), but WorkNCCL extracts the raw cudaEvent_t value from at::cuda::CUDAEvent and calls cudaEventQuery manually and doesn't check this. This could cause a failure.

It's unclear if this is ever supposed to happen, but we're seeing that failure, and we want to sort it out in order to see if there's something "deeper" going on.
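
A Python-level sketch of the lazy-event semantics in question (assumes a CUDA-enabled build; `torch.cuda.Event` mirrors the laziness of `at::cuda::CUDAEvent`):

```python
import torch

e = torch.cuda.Event()    # lazy: no cudaEvent_t has been created yet
print(e.query())          # True: an empty, never-recorded event reads as ready
e.record()                # the underlying event is created on first record
torch.cuda.synchronize()
print(e.query())          # True once the stream has passed the record point
```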
ghstack-source-id: 118532806

Test Plan: Unit tests

Reviewed By: SciPioneer

Differential Revision: D25537844

fbshipit-source-id: 506319f4742e1c0a02aa75ecc01112ea3be42d8f
2020-12-15 03:15:48 -08:00
5a5e576ab9 Update TensorPipe submodule (#49232)
Summary:
Credit to beauby for the Bazel fixes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49232

Test Plan: Export and run on CI

Reviewed By: beauby

Differential Revision: D25494735

fbshipit-source-id: 3d6f326ca49dcd28d0d19cb561818c3c2904cb55
2020-12-15 00:47:39 -08:00
98726119d9 Do not return uninitialized qscheme from getQSchemeAndQParamVector (#49391)
Summary:
Assign it by default to `kPerTensorAffine`

Fixes regressions accidentally discovered by https://app.circleci.com/pipelines/github/pytorch/pytorch/250370/workflows/6f38ae43-a9a5-43f3-8c1f-0f911df69d75/jobs/9589799

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49391

Reviewed By: ngimel

Differential Revision: D25554180

Pulled By: malfet

fbshipit-source-id: f42a45e9d6743c665c62d057197d009f1542226e
2020-12-15 00:04:38 -08:00
39a10fb652 Fix check_kernel_launches.py for macros and provide extended context (#49365)
Summary:
`check_kernel_launches.py` currently gives a false positive in instances such as:
```
735:     <<<smallIndexGrid, smallIndexBlock, 0, stream>>>(                                   \
736:       outInfo, selfInfo, indicesInfo,                                                   \
737:       outSelectDim, selfSelectDim, static_cast<TYPE>(sliceSize),                        \
738:       selfSelectDimSize);                                                               \
739:     C10_CUDA_KERNEL_LAUNCH_CHECK();
```
because the newlines after the last `\` are not consumed by the regex. This fixes that.

In addition, the regex is modified to provide greater context for the start of the kernel launch. This changes the context from:
```
157:       (
158:           size, X_strides, Y_dims, X, Y);
```
to
```
157:       <<<M, CAFFE_CUDA_NUM_THREADS, 0, context->cuda_stream()>>>(
158:           size, X_strides, Y_dims, X, Y);
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49365

Test Plan:
```
buck test //caffe2/test:kernel_launch_checks -- --print-passing-details
```

Reviewed By: aakshintala

Differential Revision: D25545402

Pulled By: r-barnes

fbshipit-source-id: 76feac6a002187239853752b892f4517722a77bf
2020-12-14 22:09:33 -08:00
25bc906281 Revert D25135415: [PyTorch] Use plain old function pointer for RecordFunctionCallback
Test Plan: revert-hammer

Differential Revision:
D25135415 (7e23ee1598)

Original commit changeset: 5e92dc79da64

fbshipit-source-id: 45b1634a100084c84dca158a1f16ca760fef6988
2020-12-14 21:04:27 -08:00
a419a3e25d Add assertion on any NaN error on the error feedback (#49374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49374

After the assertion is added, the NaN error on certain trainings disappears.

It seems that the real error is caused by the underlying illegal memory access. This is a temporary workaround.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 118572471

Test Plan:
Real run on Ads 10X model: scripts/wayi/mast_prof_gradient_compression.sh POWER_SGD 8

To reproduce the error, just comment out the assertion.

Reviewed By: rohan-varma

Differential Revision: D25548299

fbshipit-source-id: 039af7d94a27e0f47ef647c6163fd0e5064951d5
2020-12-14 20:15:39 -08:00
7e23ee1598 [PyTorch] Use plain old function pointer for RecordFunctionCallback (#48629)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48629

Nearly every non-test callsite doesn't need to capture any variables anyway, and this saves 48 bytes per callback.
ghstack-source-id: 118568240

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D25135415

fbshipit-source-id: 5e92dc79da6473ed15d1e381a21ed315879168f3
2020-12-14 20:08:16 -08:00
900aa4ee97 [PyTorch] remove convenience RecordFunctionCallback interface (#48620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48620

In preparation for storing bare function pointer (8 bytes)
instead of std::function (32 bytes).
ghstack-source-id: 118568242

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25132183

fbshipit-source-id: 3790cfb5d98479a46cf665b14eb0041a872c13da
2020-12-14 20:03:15 -08:00
bbeee481c3 Fix typo in torch.load docstring for the f parameter (#49350)
Summary:
No issue opened for this (that I can see) and it was a fairly small change, so just opening this PR directly!

The docstring for `torch.load` had some parameter descriptions with typos like ``:meth`readline` `` instead of ``:meth:`readline` ``. This PR corrects that :)

<img width="811" alt="image" src="https://user-images.githubusercontent.com/30357972/102128240-7fa33500-3e45-11eb-8f54-ce5ca7bba96c.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49350

Reviewed By: glaringlee

Differential Revision: D25543041

Pulled By: mrshenli

fbshipit-source-id: 10db04d58dd5b07777bdd51d3fcb3c45dea4c84b
2020-12-14 19:16:01 -08:00
626b8c0cf2 [te] Ban uint8 tensors from fusion groups (#49247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49247

uint8's expose all kinds of corner cases in type promotion.  As an example, consider:
```
>>> torch.tensor([1], dtype=torch.uint8).lt(-1)
tensor([True])
>>> torch.tensor([1], dtype=torch.uint8).lt(torch.tensor(-1))
tensor([True])
>>> torch.tensor([1], dtype=torch.uint8).lt(torch.tensor([-1]))
tensor([False])
```
the difference is how promotions involving scalars (or 0-dim tensors, which are treated like scalars) are prioritized compared to tensor dtypes.
Per eellison, the order is something like:
1. Tensor FP types
2. Scalar FP types
3. Tensor Int types
4. Scalar Int types

The logic for this is here: c73e97033a/aten/src/ATen/native/TypeProperties.cpp (L93)
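
A quick demonstration of that ordering via `torch.result_type` (not part of this PR):

```python
import torch

t = torch.tensor([1], dtype=torch.uint8)
print(torch.result_type(t, -1))                  # torch.uint8: scalar int loses
print(torch.result_type(t, torch.tensor(-1)))    # torch.uint8: 0-dim ~ scalar
print(torch.result_type(t, torch.tensor([-1])))  # torch.int64: tensor int wins
```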

AFAICT the effects are mainly visible for the unsigned byte type (the only unsigned type, besides bool) since the others degrade more or less gracefully.

It's hard to re-use this logic as is in TensorIterator/TypeProperties, and it's complicated enough that it's not worth re-implementing in TE unless there's evidence that it matters for real models.
ghstack-source-id: 118555597

Test Plan: `buck test //caffe2/test:jit`

Reviewed By: eellison

Differential Revision: D25489035

fbshipit-source-id: db3ab84286d472fd8a247aeb7b36c441293aad85
2020-12-14 17:40:15 -08:00
50b361a821 Enable BF16 for indexing on CUDA (#48801)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48801

Reviewed By: glaringlee

Differential Revision: D25542914

Pulled By: ngimel

fbshipit-source-id: 4113eb2729d15b40a89268172cc37122b5213624
2020-12-14 17:24:31 -08:00
23e98e73f6 Fix Windows CUDA-11.1 test jobs (#49376)
Summary:
Fixes typo introduced by  https://github.com/pytorch/pytorch/pull/49156

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49376

Reviewed By: seemethere

Differential Revision: D25548524

Pulled By: malfet

fbshipit-source-id: 6aa3d903f6105c576c009f05a6b9d29f32b35c47
2020-12-14 17:12:20 -08:00
e2510a0b60 Add Kernel Launch Checks to files under caffe2/aten/THC (#49358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49358

Added the header file (`c10/cuda/CUDAException.h`), where `C10_CUDA_KERNEL_LAUNCH_CHECK` is defined, to files under `caffe2/aten/THC` as needed, then added `C10_CUDA_KERNEL_LAUNCH_CHECK()` calls after each kernel launch. In some cases, removed extraneous ErrorChecks.

Test Plan:
Checked that the code still builds with
```
buck build //caffe2/aten:ATen-cu
```

Also ran basic aten tests
```
buck test //caffe2/aten:atest
```

Reviewed By: r-barnes

Differential Revision: D25541113

fbshipit-source-id: df1a50e14d291a86b24ca1746ac27fa586f9757c
2020-12-14 16:21:50 -08:00
cb3169d7a8 [aten] index_select dim 1 (#47077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47077

Add benchmarks for pt index_select, batch_index_select, and c2's BatchGather
Add batch_index_select implementation based on the C2 BatchGather implementation

This currently falls back to index_select for backwards and cuda implementations.

Alternatively, we can look into the specifics of why index_select is slower and
replace the original implementation instead.

Test Plan:
./buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/c2/batch_gather_test.par
./buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/pt/index_select_test.par

PT results comparing without fix, block_size 1 only, and all dim=1
```
# no optimization
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: index_select
# Mode: Eager
# Name: index_select_M256_N512_K1_dim1_cpu
# Input: M: 256, N: 512, K: 1, dim: 1, device: cpu
Forward Execution Time (us) : 353.450

# Benchmarking PyTorch: index_select
# Mode: Eager
# Name: index_select_M512_N512_K1_dim1_cpu
# Input: M: 512, N: 512, K: 1, dim: 1, device: cpu
Forward Execution Time (us) : 862.492

# Benchmarking PyTorch: index_select
# Mode: Eager
# Name: index_select_M256_N512_K2_dim1_cpu
# Input: M: 256, N: 512, K: 2, dim: 1, device: cpu
Forward Execution Time (us) : 4555.344

# Benchmarking PyTorch: index_select
# Mode: Eager
# Name: index_select_M512_N512_K2_dim1_cpu
# Input: M: 512, N: 512, K: 2, dim: 1, device: cpu
Forward Execution Time (us) : 11003.279
```
```
# block size 1 only
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: index_select
# Mode: Eager
# Name: index_select_M256_N512_K1_dim1_cpu
# Input: M: 256, N: 512, K: 1, dim: 1, device: cpu
Forward Execution Time (us) : 129.240

# Benchmarking PyTorch: index_select
# Mode: Eager
# Name: index_select_M512_N512_K1_dim1_cpu
# Input: M: 512, N: 512, K: 1, dim: 1, device: cpu
Forward Execution Time (us) : 266.776

# Benchmarking PyTorch: index_select
# Mode: Eager
# Name: index_select_M256_N512_K2_dim1_cpu
# Input: M: 256, N: 512, K: 2, dim: 1, device: cpu
Forward Execution Time (us) : 4508.593

# Benchmarking PyTorch: index_select
# Mode: Eager
# Name: index_select_M512_N512_K2_dim1_cpu
# Input: M: 512, N: 512, K: 2, dim: 1, device: cpu
Forward Execution Time (us) : 10391.655
```
```
# dim 1
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: index_select
# Mode: Eager
# Name: index_select_M8_N8_K1_dim1_cpu
# Input: M: 8, N: 8, K: 1, dim: 1, device: cpu
Forward Execution Time (us) : 3.736

# Benchmarking PyTorch: index_select
# Mode: Eager
# Name: index_select_M256_N512_K1_dim1_cpu
# Input: M: 256, N: 512, K: 1, dim: 1, device: cpu
Forward Execution Time (us) : 130.460

# Benchmarking PyTorch: index_select
# Mode: Eager
# Name: index_select_M512_N512_K1_dim1_cpu
# Input: M: 512, N: 512, K: 1, dim: 1, device: cpu
Forward Execution Time (us) : 267.706

# Benchmarking PyTorch: index_select
# Mode: Eager
# Name: index_select_M8_N8_K2_dim1_cpu
# Input: M: 8, N: 8, K: 2, dim: 1, device: cpu
Forward Execution Time (us) : 4.187

# Benchmarking PyTorch: index_select
# Mode: Eager
# Name: index_select_M256_N512_K2_dim1_cpu
# Input: M: 256, N: 512, K: 2, dim: 1, device: cpu
Forward Execution Time (us) : 1739.550

# Benchmarking PyTorch: index_select
# Mode: Eager
# Name: index_select_M512_N512_K2_dim1_cpu
# Input: M: 512, N: 512, K: 2, dim: 1, device: cpu
Forward Execution Time (us) : 3468.332
```
C2 results:

```# Benchmarking Caffe2: batch_gather
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1203 13:19:35.310904 782584 init.h:137] Caffe2 GlobalInit should be run before any other API calls.
# Name: batch_gather_M8_N8_K1_devicecpu
# Input: M: 8, N: 8, K: 1, device: cpu
Forward Execution Time (us) : 0.308

# Benchmarking Caffe2: batch_gather
# Name: batch_gather_M256_N512_K1_devicecpu
# Input: M: 256, N: 512, K: 1, device: cpu
Forward Execution Time (us) : 90.517

# Benchmarking Caffe2: batch_gather
# Name: batch_gather_M512_N512_K1_devicecpu
# Input: M: 512, N: 512, K: 1, device: cpu
Forward Execution Time (us) : 200.009

# Benchmarking Caffe2: batch_gather
# Name: batch_gather_M8_N8_K2_devicecpu
# Input: M: 8, N: 8, K: 2, device: cpu
Forward Execution Time (us) : 0.539

# Benchmarking Caffe2: batch_gather
# Name: batch_gather_M256_N512_K2_devicecpu
# Input: M: 256, N: 512, K: 2, device: cpu
Forward Execution Time (us) : 1001.540

# Benchmarking Caffe2: batch_gather
# Name: batch_gather_M512_N512_K2_devicecpu
# Input: M: 512, N: 512, K: 2, device: cpu
Forward Execution Time (us) : 2005.870
```

buck test dper3/dper3/modules/low_level_modules/tests:single_operators_test -- test_batch_gather

Reviewed By: hlu1

Differential Revision: D24630227

fbshipit-source-id: cd205a30d96a33d239f3266820ada9a90093cf91
2020-12-14 15:39:33 -08:00
220b91660f [pytorch] Expand PixelShuffle to support any number of batch dims (#49187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49187

Expands the implementation of PixelShuffle to support any number of batch dimensions
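
For example (a sketch assuming a build containing this change):

```python
import torch

ps = torch.nn.PixelShuffle(upscale_factor=2)
# Two leading batch dims instead of the previously required single one:
x = torch.randn(5, 3, 8, 9, 9)  # (*batch, C * r**2, H, W) with batch = (5, 3)
print(ps(x).shape)              # torch.Size([5, 3, 2, 18, 18])
```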

Test Plan: `buck test caffe2/test:nn -- test_pixel_shuffle`

Reviewed By: mruberry

Differential Revision: D25399058

fbshipit-source-id: ab0a7f593b276cafc9ebb46a177e2c1dce56d0de
2020-12-14 14:52:57 -08:00
3a943e9f82 Use Unicode friendly API on Win32 in THAllocator (#47905)
Summary:
This replaces the narrow character set APIs with the wide character set ones in `THAllocator.cpp`. This fixes the potential crashes caused by passing non-ASCII characters in `torch::from_file` on Windows.

See: https://github.com/pytorch/pytorch/issues/47422
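
A sketch of the Python-level repro this fixes (the file name is arbitrary; it just needs non-ASCII characters):

```python
import torch

torch.arange(4, dtype=torch.float32).numpy().tofile("温度.bin")
t = torch.from_file("温度.bin", shared=False, size=4, dtype=torch.float32)
print(t)  # tensor([0., 1., 2., 3.]) instead of a crash on Windows
```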

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47905

Reviewed By: zhangguanheng66

Differential Revision: D25399146

Pulled By: ezyang

fbshipit-source-id: 0a183b65de171c48ed1718fa71e773224eaf196f
2020-12-14 14:24:20 -08:00
1e2d1d7242 Fixed cat transform to work with event_dim > 0 (#49111)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44530

As explained in the issue description, CatTransform does not work with event_dim > 0.
This PR fixes this. If this gets approved I am hoping to do the same for StackTransform as well.

fritzo Can you take a look at this ?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49111

Reviewed By: neerajprad

Differential Revision: D25526005

Pulled By: ezyang

fbshipit-source-id: e14430093f550d5e0da7a311f9cd44796807830f
2020-12-14 14:16:18 -08:00
d5a971e193 Add kernel launch checks in caffe2/aten/src/ATen/native/cuda/ (#49269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49269

Added C10_CUDA_KERNEL_LAUNCH_CHECK(); after all kernel launches in caffe2/aten/src/ATen/native/cuda.

Several files in the directory still trigger the check_kernel_launches.py tool. These are false positives, as the tool doesn't seem to be parsing macros correctly.

Normalization.cuh <- This file is also highlighted by the check_kernel_launches.py tool, but the highlighted regions are device code where exception handling isn't allowed.

Test Plan:
Check that the code still builds with
```
buck build //caffe2/aten:ATen-cu
```
https://pxl.cl/1tLRB

Also ran
```
buck test //caffe2/aten:atest
```

https://pxl.cl/1tLSw

Reviewed By: r-barnes

Differential Revision: D25487597

fbshipit-source-id: 7a6689534f7ff85a5d2262831bf6918f1fe0b745
2020-12-14 13:46:25 -08:00
86cf1e1358 Add another way to verify ccache in CONTRIBUTING.md (#49337)
Summary:
In case people are confused about how to make sure ccache is working, I added another sentence to the documentation on how to check that the symlinks are correctly set up, in addition to waiting for 2 clean builds of PyTorch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49337

Reviewed By: walterddr

Differential Revision: D25535659

Pulled By: janeyx99

fbshipit-source-id: 435696255f517c074dd0d9f96534d22b60f795b2
2020-12-14 13:19:40 -08:00
6820745e28 Revert D25489030: [PyTorch] Make tls_local_dispatch_key_set inlineable
Test Plan: revert-hammer

Differential Revision:
D25489030 (be849ed1fd)

Original commit changeset: 63147bae783e

fbshipit-source-id: 6ce564979078f28ca9b7c80bc89ef492a2993806
2020-12-14 12:45:26 -08:00
4188c374ce Refactor: use version instead of major version in windows build (#49156)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49219

1. update version instead of major version for env var of CUDA_VERSION
2. update related scripts

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49156

Reviewed By: glaringlee

Differential Revision: D25535530

Pulled By: ezyang

fbshipit-source-id: 0712227f2b06b45ee68efc42717c4308fea1abdc
2020-12-14 12:25:05 -08:00
6cfd7c3811 Remove type annotations from signatures in html docs (#49294)
Summary:
One unintended side effect of moving type annotations inline was that those annotations now show up in signatures in the html docs. This is more confusing and ugly than it is helpful. An example for `MaxPool1d`:

![image](https://user-images.githubusercontent.com/98330/102010280-77f86900-3d3d-11eb-8f83-e7ee0991ed92.png)

This makes the docs readable again. The parameter descriptions often already have type information, and there will be many cases where the type annotations will make little sense to the user (e.g., returning typevar T, long unions).

Change to `MaxPool1d` example:

![image](https://user-images.githubusercontent.com/98330/102010304-91011a00-3d3d-11eb-860d-ffa174b4d43b.png)

Note that once we can build the docs with Sphinx 3 (which is far off right now), we have two options to make better use of the extra type info in the annotations (some of which is useful):
- `autodoc_type_aliases`, so we can leave things like large unions unevaluated to keep things readable
- `autodoc_typehints = 'description'`, which moves the annotations into the parameter descriptions.

Another, more labour-intensive option, is what vadimkantorov suggested in gh-44964: show annotations on hover. Could also be done with some foldout, or other optional way to make things visible. Would be nice, but requires a Sphinx contribution or plugin first.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49294

Reviewed By: glaringlee

Differential Revision: D25535272

Pulled By: ezyang

fbshipit-source-id: 5017abfea941a7ae8c4595a0d2bdf8ae8965f0c4
2020-12-14 12:19:48 -08:00
9e3c25ff1d sls + layernorm test (#43799)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43799

Test Plan: https://www.internalfb.com/intern/testinfra/testconsole/testrun/3096224784866350/

Reviewed By: venkatacrc

Differential Revision: D23383351

fbshipit-source-id: c312d481ad15bded83bea90beaaae7742d0c54b8
2020-12-14 11:47:49 -08:00
be849ed1fd [PyTorch] Make tls_local_dispatch_key_set inlineable (#49264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49264

FLAGS_disable_variable_dispatch had to go, but it looks like the only user was some benchmarks anyway.
ghstack-source-id: 118480532

Test Plan: Small improvement (on the order of 0.1%) on internal benchmarks

Reviewed By: smessmer

Differential Revision: D25489030

fbshipit-source-id: 63147bae783e7a45391dd70d86730e48d3e0cafc
2020-12-14 11:17:35 -08:00
c068180a17 [CUDA graphs] Cuda RNG-safe graph capture and replay bindings (#48875)
Summary:
Part 2 of https://github.com/pytorch/pytorch/pull/46148 refactor.  (part 1 was https://github.com/pytorch/pytorch/pull/48694.)
Contains
- a few more CUDAGeneratorImpl diffs to clean up graph capture interaction
- Capture and replay bindings that interact correctly with CUDAGeneratorImpl
- Tests.

Diffs compile and tests pass on my machine (ubuntu 20.04, cuda 11.0) but it needs finetuning for many CI builds.

See [Note [CUDA Graph-safe RNG states]](02d89f9f1d/aten/src/ATen/CUDAGeneratorImpl.h (L13-L85)) for the strategy, based on https://github.com/pytorch/pytorch/pull/46148#issuecomment-724414794.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48875

Reviewed By: zou3519

Differential Revision: D25482654

Pulled By: ngimel

fbshipit-source-id: 634dbc4c6c9d7d0d9a62dc81a52d430561f905fe
2020-12-14 10:51:58 -08:00
25833e5d1c [CrashFix] Make the dst tensor contiguous when copying from metal
Summary: Somehow the destination tensor becomes non-contiguous when copying from Metal. We need to call `.contiguous()` explicitly. See the crash log - https://www.internalfb.com/intern/logview/details/facebook_ios_crashes/1d865405fbc1a45f9517470906c9ec08/

Test Plan:
- verify the crash
- Sandcastle CIs

Reviewed By: dreiss

Differential Revision: D25502884

fbshipit-source-id: 46ee720bf6b6658e51cb56a4e4c16ce121eeabc7
2020-12-14 10:27:06 -08:00
a0432a7020 [AARCH64] Fix vst1q_f32_x2 implementation (#49273)
Summary:
Add memory operands to the inline asm, which informs the compiler that this instruction writes to memory.

Fixes https://github.com/pytorch/pytorch/issues/48901

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49273

Reviewed By: walterddr

Differential Revision: D25512921

Pulled By: malfet

fbshipit-source-id: 474d070e1f7c2167b9958cbeb4e401dc0e4a930b
2020-12-14 10:09:39 -08:00
87636c07bb CUDA BF16 sparse (#48807)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48807

Reviewed By: mruberry

Differential Revision: D25526752

Pulled By: ngimel

fbshipit-source-id: 9ff8e637486cfd67d46daf0c05142bbe611e08ec
2020-12-14 09:55:52 -08:00
690eaf9c43 add channels last for AdaptiveAvgPool2d (#48916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48916

* optimize adaptive average pool2d forward path
* optimize adaptive average pool2d backward path
* remove unused headers
* rename the header; add adaptive max pooling in the future
* loosen adaptive_pool2d test on nhwc to cover both cuda and cpu devices
* assorted minor changes
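
A quick check of the channels-last propagation this enables (a sketch, not from the PR):

```python
import torch

m = torch.nn.AdaptiveAvgPool2d((5, 5))
x = torch.randn(2, 3, 9, 9).to(memory_format=torch.channels_last)
y = m(x)
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```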

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25399469

Pulled By: VitalyFedyunin

fbshipit-source-id: 86f9fda35194f21144bd4667b778c861c05a5bac
2020-12-14 09:47:46 -08:00
8397a62a64 Fix cvtfp32_bf16 (#41280)
Summary:
For the `Vec256<bfloat16>::blendv()` operator to work correctly, float32 -nan (0xffffffff) must be converted to bfloat16 -nan (0xffff).
But cvtfp32_bf16 converts -nan to nan (0x7fc0).
TODO: Fix float32 +-nan conversion: i.e. float32 nan (0x7fffffff) must be converted to bfloat16 nan (0x7fff)

Closes https://github.com/pytorch/pytorch/issues/41238

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41280

Reviewed By: mruberry

Differential Revision: D23311585

Pulled By: malfet

fbshipit-source-id: 79499ce19f1ec3f6c954a874f1cd47f4ece6bdb5
2020-12-14 08:49:30 -08:00
bd322c8967 Update docstrings of torch.nn.modules.activation.MultiheadAttention (#48775)
Summary:
- Add the link to the original paper (Attention is All You Need)
- Fix indentation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48775

Reviewed By: H-Huang

Differential Revision: D25465914

Pulled By: heitorschueroff

fbshipit-source-id: bbc296ec1523326e323587023c126e820e90ad8d
2020-12-14 08:34:33 -08:00
7d406b4a07 [PyTorch] Make TORCH_CHECK less likely to interfere with inlining (#49263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49263

Now it is smaller and calls to an out-of-line function in
case of failure.
ghstack-source-id: 118480531

Test Plan:
1) Inspect perf profile of internal benchmark, much less
time spent in (for example) `c10::impl::getDeviceImpl`, which calls
TORCH_CHECK and should be inlined
2) Internal benchmarks

Reviewed By: smessmer

Differential Revision: D25481308

fbshipit-source-id: 0121ada779ca2518ca717f75920420957b3bb1aa
2020-12-14 08:11:23 -08:00
eb051afa78 [PyTorch] native_cpp_binding for size() and stride() (#49262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49262

This uses the newly-added native_cpp_binding feature to avoid
dispatcher overhead for `size()` and `stride()`.
ghstack-source-id: 118480533

Test Plan: CI

Reviewed By: bwasti

Differential Revision: D25446275

fbshipit-source-id: 1215eaa530d5aa3d501f89da8c99d0a487d8c1b6
2020-12-14 08:09:35 -08:00
f54ab8fbfe Revert "Revert D25003113: make validate debug-only in Device copy ctr" (#49123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49123

This reverts commit 7a4a2df2254b78d8c8d42b9f81b5b261a617466e.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25463531

Pulled By: bdhirsh

fbshipit-source-id: 7c7ecdc1d63ffd137b84a129887c424b2083a958
2020-12-14 07:33:37 -08:00
94a3d4b083 Remove unused operator at::_fft_with_size (#48905)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48905

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25480385

Pulled By: mruberry

fbshipit-source-id: 192d04a1b7e33b4e408cda8a82679c3ae3490a7d
2020-12-13 20:28:41 -08:00
fdadfb6e5d Fix formatting error in set_deterministic documentation (#49136)
Summary:
Fixes a formatting error that was preventing a bulleted list from being displayed properly

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49136

Reviewed By: zou3519

Differential Revision: D25493130

Pulled By: mruberry

fbshipit-source-id: 7fc21e0e2cfa9465a60d2d43b805164316375f01
2020-12-13 19:55:19 -08:00
38ed398580 [fx] Add constant folding pass (#48443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48443

Add a constant folding pass in FX:
- Iterate over an input graph and tag what nodes are fully constant, i.e. either `get_attr` nodes, or nodes with all inputs that are either `get_attr` or constant
- Use `model_transform.split_by_tags()` to split the graph into two
- Look for the `output` node in the constant graph to get names of attrs that will be folded
- Iterate over the non-constant graph and replace placeholders that are using the same name as the attrs with a `get_attr` as well as a dummy attr on the module
- Return these two graphs in a new `FoldedGraphModule`, which is a normal GraphModule but also stores the constant graph on the side along with a `run_folding()` method that will run const folding and update the dummy parameters with the actual folded parameters
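
A rough sketch of the constant-tagging step only (an illustration, not this PR's implementation, which goes through `split_by_tags()` and `FoldedGraphModule`):

```python
import torch
import torch.fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(4, 4))

    def forward(self, x):
        # `self.w + 1` depends only on an attribute, so it is fully constant
        return x + (self.w + 1)

gm = torch.fx.symbolic_trace(M())

# Tag get_attr nodes, plus nodes whose node inputs are all already constant:
const_nodes = set()
for n in gm.graph.nodes:
    if n.op == "get_attr":
        const_nodes.add(n)
    elif n.op == "call_function" and n.all_input_nodes and all(
        i in const_nodes for i in n.all_input_nodes
    ):
        const_nodes.add(n)

print(sorted(n.name for n in const_nodes))  # ['add', 'w']: the foldable part
```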

Test Plan: Added a couple tests

Reviewed By: 842974287

Differential Revision: D25033996

fbshipit-source-id: 589c036751ea91bb8155d9be98af7dbc0552ea19
2020-12-13 18:06:07 -08:00
f2ba3c1621 Use group.WORLD appropriately in process group initialization. (#48767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48767

As part of investigating
https://github.com/pytorch/pytorch/issues/48464, I realized some weird
inconsistency in how we use `_default_pg` and `group.WORLD`. `group.WORLD`
apparently was an `object()` and never changed despite `_default_pg` changing.
In this sense, `group.WORLD` was being used a constant to refer to the default
pg, but wasn't of type PG at all. In fact the passed in group is also compared
via `==` to `group.WORLD` in many places, and it just worked since the default
argument was `group.WORLD`.

To clean this up, I got rid of `_default_pg` completely and instead used
`group.WORLD` as the default pg throughout the codebase. This also fixes the
documentation issues mentioned in
https://github.com/pytorch/pytorch/issues/48464.

#Closes: https://github.com/pytorch/pytorch/issues/48464
ghstack-source-id: 118459779

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D25292893

fbshipit-source-id: 9a1703c71610aee2591683ab60b010332e05e412
2020-12-13 17:53:42 -08:00
dc4db95540 Update pipeline API to accept arbitrary sequence of Tensors and not just Tuple (#48467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48467

The current API's forward method only accepted a Tensor or a Tuple of
Tensors, making this more generic by accepting any Sequence of Tensors.
ghstack-source-id: 118436340

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D25181944

fbshipit-source-id: 4db251dad52c01abc69f3d327788f2e4289e6c9d
2020-12-12 17:13:05 -08:00
33b7970d9e fix slow windows test (#49258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49258

Tested by adding `time.sleep(3)` in SubProcess.run and seeing the test print "test_inherit_tensor: SubProcess too slow".

Sample failure:
https://app.circleci.com/pipelines/github/pytorch/pytorch/249756/workflows/3605479e-1020-4325-9a4c-8bde5ae38262/jobs/9550663

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D25507209

Pulled By: agolynski

fbshipit-source-id: ec808f0f658d0fb4c8447f68ec5ceba2aa66b1b5
2020-12-12 06:48:38 -08:00
cd927875e0 [pt] Replace size(dim) with sizes()[dim] (#49255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49255

- Replace `size(dim)` with `sizes()[dim]` because `sizes()` does not go through the dispatcher and is marginally better.
- Remove unnecessary `size(dim)` and `sizes()` calls by saving the return value of `sizes()` to a temporary var.

Reviewed By: radkris-git

Differential Revision: D25488129

fbshipit-source-id: 4039e0609df20d5888666a71ad93b15e9a2182c5
2020-12-12 00:51:26 -08:00
717f31d984 Remove unused reconstruct_scopes function (#48822)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48822

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25325012

Pulled By: cccclai

fbshipit-source-id: 86ea4c0b2926257c0f82aa05cbcd83278b1b67f7
2020-12-11 23:43:36 -08:00
dc92f25b38 [te] Use c10::ScalarType utility functions in te::Dtype (#49148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49148

Instead of defining our own variants.  I'm pretty sure this fixes a bug too, in that Bfloat16 wasn't being considered FP.  Otoh, I don't think it's possible to create TEs with Bfloat so...
ghstack-source-id: 118415314

Test Plan: `buck test //caffe2/test:jit`

Reviewed By: robieta

Differential Revision: D25456767

fbshipit-source-id: bd5822114b76c4fde82f566308909bd2a55f4f21
2020-12-11 22:41:57 -08:00
eaac28192c [te] Use Dtype::is_signed instead of an ad hoc local predicate. (#49147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49147

D25456366 adds Dtype::is_signed (which is backed by c10::isSignedType), so use that instead of this one-off.
ghstack-source-id: 118415315

Test Plan:
```
buck test //caffe2/test{:jit,:tensorexpr,/cpp/tensorexpr:tensorexpr}
```

Reviewed By: robieta

Differential Revision: D25456683

fbshipit-source-id: 428f1e8bff21ea05730690226a44984995c4c138
2020-12-11 22:41:54 -08:00
ae88d25c23 [te] Fix clamp with uint8 args (#49143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49143

Riddle me this, batman: how could `torch.clamp(torch.tensor([0], dtype=torch.uint8), -10, 10)` equal `10`?  The answer: the min/max args are first cast to the dtype of the input, giving min=246 and max 10.  Then you have to apply Min and Max in the right order: `Min(Max(in, min), max)`.  Differ in any way and you're doomed.  Hooray.
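
Spelled out in eager mode (behavior as of this commit; later releases may promote scalar bounds differently):

```python
import torch

x = torch.tensor([0], dtype=torch.uint8)
# The bounds are first cast to uint8: min = -10 -> 246, max = 10 -> 10.
# Eager mode then computes Min(Max(x, 246), 10), which is 10:
print(torch.clamp(x, -10, 10))  # tensor([10], dtype=torch.uint8)
```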

This PR makes TE match eager mode for this operator, plus fixes a major facepalm in the llvm min/max codegen where we were always generating signed comparisons.
ghstack-source-id: 118415318

Test Plan: `buck test //caffe2/test:{jit,tensorexpr}`

Reviewed By: robieta

Differential Revision: D25456366

fbshipit-source-id: dde3c26c2134bdbe803227601fa3d23eaac750fb
2020-12-11 22:36:52 -08:00
8999915a86 Fix "Missing return statement" mypy error (#49276)
Summary:
Adds `return None` after `assert_never` in the inner `get_one` function
Without it, TestTypeHints.test_run_mypy_strict using mypy  0.770 fails with the above mentioned error, see https://app.circleci.com/pipelines/github/pytorch/pytorch/249909/workflows/597d8e34-ff04-4efa-9dde-9e28fbded341/jobs/9557705

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49276

Reviewed By: jamesr66a

Differential Revision: D25513658

Pulled By: malfet

fbshipit-source-id: 318eaff7e0534b10eafe46c0b834b7f7cefea757
2020-12-11 22:18:50 -08:00
b5b8fe9876 Revert D25434956: [JIT] Use is_buffer in BufferPolicy::valid
Test Plan: revert-hammer

Differential Revision:
D25434956 (a480ca5302)

Original commit changeset: ff2229058abb

fbshipit-source-id: faba801e9b5e9fa0117624350518592868856eec
2020-12-11 21:10:15 -08:00
693e908656 [shape inference] fix ConstantFill
Test Plan: unit test

Reviewed By: yinghai

Differential Revision: D25326529

fbshipit-source-id: 1322635567f6661637cde90cadaac0197975e133
2020-12-11 19:40:42 -08:00
8d58362f59 [PyTorch] Remove native::zeros reference in TensorIndexing (#49117)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49117

Partially resolves https://github.com/pytorch/pytorch/issues/48684. It essentially calls the same functionality inside at::native::zeros().

After this diff, all references to aten::native symbols are removed.

ghstack-source-id: 118261305

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D25444940

fbshipit-source-id: 7f782680daa3aedd1b7301cb08576da2ec70c188
2020-12-11 18:50:51 -08:00
635f1cd1a5 Enable LayerNorm test cases
Summary: Remove Skip from test defs.

Test Plan: https://our.intern.facebook.com/intern/testinfra/testrun/1407375060598951

Reviewed By: hyuen

Differential Revision: D25513174

fbshipit-source-id: 0ddfd1713cf7b9daf25f6e62df92d682cade350f
2020-12-11 17:58:24 -08:00
76d41c801e [JIT] Fix toIValue handling of AttributeError when casting ClassType (#49188)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49188

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D25476573

Pulled By: jamesr66a

fbshipit-source-id: cec296fae71cc0cdf36bde60417d7d3b1aa84198
2020-12-11 17:54:16 -08:00
29f0fa36b1 [Gradient Compression] Minor update of the comments on PowerSGD. (#49246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49246

Previously the comment on matrix_approximation_rank was in the PowerSGD_hook function. Now it is moved into PowerSGDState, because the function arg has already been moved into this state as an attribute.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 118414247

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D25501091

fbshipit-source-id: 701e3109a9a3f2a5f9d18d5bf6d0a266518ee8ea
2020-12-11 17:45:53 -08:00
21c38e1799 Additional validation for DistributedSampler. (#48865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48865

If DistributedSampler was provided an invalid rank (ex:
https://discuss.pytorch.org/t/distributed-datasets-on-multi-machines/105113),
it failed with a cryptic assertion failure.

To fix this issue, I've added an additional check to DistributedSampler to
validate we provide a valid rank.
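
A sketch of the fail-fast behavior (the exact message is illustrative):

```python
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

ds = TensorDataset(torch.arange(10))
try:
    DistributedSampler(ds, num_replicas=2, rank=5)  # rank must be in [0, 1]
except ValueError as e:
    print(e)  # clear validation error instead of a cryptic assertion later
```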
ghstack-source-id: 117906769

Test Plan:
1) waitforbuildbot
2) Unit test added.

Reviewed By: malfet

Differential Revision: D25344945

fbshipit-source-id: 7685e00c8b2c200efbd2949fb32ee32ea7232a08
2020-12-11 17:22:22 -08:00
6b78644623 [te] Add BitCast to the IR (#49184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49184

Adds BitCasting to NNC.  This will enable fast approximation algorithms implemented directly in TensorExpressions

Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr

Reviewed By: bertmaher

Differential Revision: D25466476

fbshipit-source-id: f063ab29ba7bab2dcce463e499f2d4a16bdc1f0e
2020-12-11 16:12:20 -08:00
5716b7db72 Enabled Scalar lists (#48222)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48222

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25074765

Pulled By: izdeby

fbshipit-source-id: 96ebe3c9907178c9338c03fb7993b2ecb26db8f4
2020-12-11 16:04:50 -08:00
bfce69d620 inline has function for DispatchKeySet (#49191)
Summary:
Inlines the `has` function for DispatchKeySet, which is frequently used in TensorImpl in calls such as `is_sparse`, `is_cuda`, etc.
This increases `empty` instruction count (1853228 -> 1937428) without appreciable effect on runtime, and noticeably reduces instruction counts for `copy_` and friends that have to rely on `is_sparse`, `is_cuda` and the like a lot to decide which path to take (3269114 -> 2634114).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49191

Reviewed By: H-Huang

Differential Revision: D25483011

Pulled By: ngimel

fbshipit-source-id: 2f3ab83e2c836a726b9284ffc50d6ecf3701aada
2020-12-11 15:55:40 -08:00
53aa9b8c82 [FX] Move none assignments to same line (#49209)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49209

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D25484975

Pulled By: jamesr66a

fbshipit-source-id: 44207be878f95ec9420e87af79833191d5cc0c7e
2020-12-11 15:45:40 -08:00
2f359e7d55 Add tensorpipe agent tests to multigpu tests. (#49210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49210

The RPC tests use multiple gpus in some cases (ex: DDP + RPC and Pipe
+ DDP). We should enable multigpu tests for this purpose.
ghstack-source-id: 118366595

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D25485506

fbshipit-source-id: eabbf442471ebc700b5986bc751879b9cf72b752
2020-12-11 15:00:38 -08:00
df027bfd2c Modify Pipe to return an RRef. (#47829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47829

As per proposal in https://github.com/pytorch/pytorch/issues/44827,
the API needs to return an RRef to support inter-host pipelining.

For now, we just return a local RRef and only support pipeline on a single
host. But having this change in the API upfront ensures we don't make any BC
breaking changes later.
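
A hedged sketch of the resulting call pattern; the module path, constructor arguments, and RPC-initialization requirement here are assumptions based on the pipeline API of this era, not statements from the PR:

```python
import torch
import torch.distributed.rpc as rpc
from torch.distributed.pipeline.sync import Pipe  # path is an assumption

rpc.init_rpc("worker0", rank=0, world_size=1)  # Pipe builds on the RPC framework

fc1 = torch.nn.Linear(16, 8).cuda(0)
fc2 = torch.nn.Linear(8, 4).cuda(1)
model = Pipe(torch.nn.Sequential(fc1, fc2), chunks=2)

out_rref = model(torch.randn(8, 16).cuda(0))  # forward() now returns an RRef
out = out_rref.local_value()                  # local RRef: single host for now
```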
ghstack-source-id: 118366784

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D24914022

fbshipit-source-id: e711e7d12efa45645f752f0e5e776a3d845f3ef5
2020-12-11 14:55:16 -08:00
c6147ae4c9 [PyTorch] Fix getCustomClassType() perf (#48981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48981

1) It was copying the entire hash table every time.
2) We don't need to do a hash lookup at all.
ghstack-source-id: 118164406

Reviewed By: dzhulgakov

Differential Revision: D25385543

fbshipit-source-id: 6be95c742d6713345c51859ce36a7791a9e2e3f0
2020-12-11 14:20:01 -08:00
6c1b405a3b Updated derivative rules for complex QR decomposition (#48489)
Summary:
Updated `qr_backward` to work correctly for complex-valued inputs.
Added `torch.qr` to list of complex tests.

The previous implementation for real-valued differentiation used equation 42 from https://arxiv.org/abs/1001.1654
The current implementation is a bit simpler but the result for the real-valued input case is the same and all tests still pass.
Derivation of complex-valued QR differentiation https://giggleliu.github.io/2019/04/02/einsumbp.html

Ref. https://github.com/pytorch/pytorch/issues/33152
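
A small smoke test of the updated rule (a sketch, assuming a build containing this change):

```python
import torch

a = torch.randn(3, 3, dtype=torch.cdouble, requires_grad=True)
q, r = torch.qr(a)
(q.abs().sum() + r.abs().sum()).backward()  # exercises complex qr_backward
print(a.grad.dtype)  # torch.complex128
```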

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48489

Reviewed By: bdhirsh

Differential Revision: D25272344

Pulled By: albanD

fbshipit-source-id: b53c1fca1683f4aee5f4d5ce3cab9e559170e7cf
2020-12-11 14:14:40 -08:00
e3542d2c12 [PyTorch] avoid unnecessary call to empty_tensor_restride in empty() (#48211)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48211

Our empty benchmark makes this call unconditionally. If
MemoryFormat::Contiguous is indeed a common case (or if workloads are
likely to use a consistent-ish memory format), then I'd expect
checking first to be a win.
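
In plain Python terms, the change amounts to something like the following sketch (the helper names are hypothetical, not the actual C++):

```python
def finish_empty(tensor, memory_format="contiguous"):
    if memory_format != "contiguous":        # common case: skip the call
        tensor = restride(tensor, memory_format)
    return tensor

def restride(tensor, memory_format):         # hypothetical stand-in
    return tensor
```
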
ghstack-source-id: 118224990

Test Plan:
Profiled empty benchmark with perf, saw time spent in empty_tensor_restride go down.

Ran framework overhead benchmarks. ~7% win on empty(), 0.5-1.5% regression on InPlace, ~2% win on OutOfPlace. Seems like both the In/Out of place ones are likely to be noise because they don't exercise empty?

Reviewed By: bhosmer

Differential Revision: D24914706

fbshipit-source-id: 916771b335143f9b4ec9fae0d8118222ab6e8659
2020-12-11 13:57:57 -08:00
4bc4ec2686 Reduce kineto logging (#49216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49216

Libkineto is pretty verbose by default; this diff uses the libkineto API
to reduce the amount of logging.

Test Plan:
TORCH_CUDA_ARCH_LIST="6.0;7.0" USE_CUDA=1 USE_MKLDNN=1 BUILD_BINARY=1
python setup.py develop install --cmake

python test/test_profiler.py

Imported from OSS

Reviewed By: ngimel

Differential Revision: D25488109

fbshipit-source-id: 61b443bcf928db939f730ba32711385bb2b622d4
2020-12-11 13:50:13 -08:00
15200e385a Enable torch.where() to support Float16 & BFloat16 type inputs (#49004)
Summary:
Fixed https://github.com/pytorch/pytorch/issues/49075
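
A minimal check of the newly supported dtypes:

```python
import torch

cond = torch.tensor([True, False])
a = torch.tensor([1.0, 2.0], dtype=torch.float16)
b = torch.tensor([3.0, 4.0], dtype=torch.float16)
print(torch.where(cond, a, b))                       # float16 result
print(torch.where(cond, a.bfloat16(), b.bfloat16())) # bfloat16 result
```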

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49004

Reviewed By: zou3519

Differential Revision: D25495225

Pulled By: H-Huang

fbshipit-source-id: 09418ee5503f65c8862e40119c5802779505a4db
2020-12-11 13:36:41 -08:00
218eaf4bba pyi codegen refactor - no need to group python signatures by overload name (#49057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49057

Now that all of the byte-for-byte hacks are removed in the pyi codegen, there's no reason for the codegen to group pyi signature overloads together. I updated the logic in `gen_pyi` that computes signatures (`generate_type_hints()` and `_generate_named_tuples()`) to operate per individual `PythonSignatureGroup`.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25410849

Pulled By: bdhirsh

fbshipit-source-id: 8c190035d7bfc06ed192468efbe7d902922ad1fa
2020-12-11 13:29:24 -08:00
33a9b14da0 pyi codegen - removing byte-for-byte-compatibility hacks (sorting overloads) (#49056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49056

This removes another byte-for-byte compatibility hack: I'm now sorting pyi signature overloads (previously the codegen did not).

Mostly put this in a separate PR just to more easily reason about the diff in the codegen output.

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D25410846

Pulled By: bdhirsh

fbshipit-source-id: 06e5c32edbce610dd12ec7499014b41b23c646bd
2020-12-11 13:29:22 -08:00
b94ec8c9f7 pyi codegen - removing byte-for-byte compatibility hacks (#49055)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49055

Removed the majority of the TODO hacks that I added to the original pyi PR to maintain byte-for-byte compatibility.

I left a few of the divergences between pyi deprecated vs. native signatures, since (a) they're smaller and (b) it might make more sense to kill the deprecated functions at some point entirely.

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D25410847

Pulled By: bdhirsh

fbshipit-source-id: cf07cdda92f7492cd83d363cbb810e3810f6b8c8
2020-12-11 13:29:19 -08:00
9920adebfd pyi cleanup (#49054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49054

These are some followups from the first pyi codegen PR. Still maintaining byte-for-byte compatibility in this one.

- Separated `argument_str()` with a pyi flag into two functions, `argument_str()` and `argument_str_pyi()`
- Added a notes section for pyi at the top of `python.py`
- Added a `Python Interface` section that I moved the free-standing pyi functions to

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D25410848

Pulled By: bdhirsh

fbshipit-source-id: db83a80af900c32b5e32d67ce27767f6e7c2adfb
2020-12-11 13:27:41 -08:00
db5e5b439c Extra sampling of record function events [resend] (#49114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49114

resend of https://github.com/pytorch/pytorch/pull/48289

Test Plan: see 48289

Reviewed By: robieta

Differential Revision: D25443365

Pulled By: ilia-cher

fbshipit-source-id: c15ac312222bb4d744e10199ed79801cccae8227
2020-12-11 12:53:37 -08:00
1cb5aa6c60 Fix structured kernel codegen (#49244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49244

see https://fb.quip.com/ceEdANd5iVsO

RegisterMkldnnCPU kernels incorrectly used makeUnboxedOnly() calls to register add_.Tensor kernels. This is because the codegen incorrectly thought they're not c10-full.
This PR fixes that.
ghstack-source-id: 118411117

Test Plan: After this PR, RegisterMkldnnCPU doesn't contain the makeUnboxedOnly() calls anymore.

Reviewed By: ezyang

Differential Revision: D25500246

fbshipit-source-id: 8a8c2be9c4f4a5ce7eaae94257c2f8cbd176e92e
2020-12-11 12:37:35 -08:00
2a3bb1cea0 [quant][graphmode][fx][fix] Fix typo in fusion (#49183)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49183

Test Plan: Imported from OSS

Reviewed By: hx89

Differential Revision: D25473367

fbshipit-source-id: 0cd5e6769eeea0923dd104ea90b0192e3475b3ad
2020-12-11 12:14:53 -08:00
796b267763 fix backwards compatibility for #48711 and its revert (#49240)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49240

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D25500727

Pulled By: agolynski

fbshipit-source-id: 6a690f52fe671267862b159b6330d37ef08ee291
2020-12-11 12:07:55 -08:00
f965b0fcfb Expose run_async function on torch::jit::Method (#48607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48607

This change builds on top of
https://github.com/pytorch/pytorch/pull/46865

further exposing the async interface to `torch::jit::Method`.

added unit test for new `run_async`

Test Plan: `buck test caffe2/test/cpp/jit/...`

Reviewed By: dzhulgakov

Differential Revision: D25219726

fbshipit-source-id: 89743c82a0baa1affe0254c1e2dbf873de8e5c76
2020-12-11 11:17:58 -08:00
42c78ed745 Tuple Slice with both negative and positive stepped size (#48660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48660

We previously supported tuple slicing only without a step size; this PR extends the feature to arbitrary step sizes. We do this by manually reconstructing a new tuple in the IR instead of relying on the TupleSlice prim.
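
A sketch of what should now compile (post-change behavior; assumes TorchScript infers the sliced tuple type statically):

```python
from typing import Tuple
import torch

@torch.jit.script
def every_other(t: Tuple[int, int, int, int]) -> Tuple[int, int]:
    return t[::2]    # arbitrary (here positive) step sizes now work

print(every_other((1, 2, 3, 4)))   # (1, 3)
```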

Test Plan:
python tests

Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D25359336

fbshipit-source-id: 28cde536f28dd8a00607814b2900765e177f0ed7
2020-12-11 11:00:38 -08:00
c0a0845019 Improve new_group example in the context of SyncBatchNorm (#48897)
Summary:
Closes https://github.com/pytorch/pytorch/issues/48804
Improves the documentation/example in the SyncBN docs to clearly show that each rank must call into all `new_group()` calls when creating process subgroups, even if it is not going to be part of a particular subgroup.
We then pick the right group, i.e. the group that the rank is part of, and pass that into the SyncBN APIs.
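
A minimal sketch of the documented pattern (assumes a world of 4 ranks split into two subgroups and an already-initialized process group):

```python
import torch
import torch.distributed as dist

def subgroup_for(rank):
    g01 = dist.new_group(ranks=[0, 1])   # every rank calls both
    g23 = dist.new_group(ranks=[2, 3])   # new_group() invocations...
    return g01 if rank < 2 else g23      # ...but keeps only its own group

# model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(
#     model, process_group=subgroup_for(dist.get_rank()))
```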

Doc rendering:

<img width="786" alt="syncbn_update" src="https://user-images.githubusercontent.com/8039770/101271959-b211ab80-373c-11eb-8b6d-d56483fd9f5d.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48897

Reviewed By: zou3519

Differential Revision: D25493181

Pulled By: rohan-varma

fbshipit-source-id: a7e93fc8cc07ec7797e5dbc356f1c3877342cfa3
2020-12-11 10:28:08 -08:00
f10b53d9ea [PyTorch Mobile] Record dtypes for tensors used in kernel function implementations (#48826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48826

This change updates various macros to pass in the kernel tag string (`const char*`) into the macro that sets up the `case` statement for the dtype switch. This macro already receives the dtype (enum) which we also need.

There are 2 phases we need to build out for the `dtype` tracing to work:
1. Recording Phase
2. Conditional Compilation Phase

For this most part, this change is trying to focus on [1] (The Recording Phase) and sets up a new `RecordScope` enum value to track kernel dtypes. This code is compiled in only if a specific macro is defined (since this is an **extremely** hot code path, and even the slightest regression here can cause tremendous slow down overall).

I have only added a skeleton of the phase [2] (Conditional Compilation Phase) and there is a no-op `constexpr` method that selects every dtype in the kernel implementation. In subsequent diffs, this will be updated to point to a code-generated function based on the result of tracing the models that were requested.
ghstack-source-id: 118336675

Test Plan: See the next few diff in the stack for the application of this change to both record triggered dtypes (in kernel functions) as well as select dtype specific portions of kernel functions.

Reviewed By: ezyang

Differential Revision: D24220926

fbshipit-source-id: d7dbf21c7dcc6ce981d0fd4dcb62ca829fe3f69d
2020-12-11 09:41:52 -08:00
f204f77e6d Drop FutureNCCL in favor of vanilla CUDAFuture (#49014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49014

We extracted a generic and reusable CUDAFuture class from FutureNCCL, but we had left FutureNCCL around, as a subclass of CUDAFuture, in order to deal with some peculiarity of ProcessGroupNCCL, namely that the future would be completed right away when constructed and that its CUDA events would be _shared_ with the ones of the WorkNCCL. This required some "hacks" in CUDAFuture itself (protected members, fields wrapped in shared_ptrs, ...).

My understanding is that creating CUDA events is a rather cheap operation, which means we can afford to record the events _twice_ after each NCCL call, once for the WorkNCCL and once for the future. By doing so, we can use the CUDAFuture class directly and revert all its hacks.
ghstack-source-id: 118391217

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D25355272

fbshipit-source-id: 3a2a0891724928221ff0f08600675d2f5990e674
2020-12-11 09:25:05 -08:00
dcd1e3d78d [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D25490983

fbshipit-source-id: b24a11214a485a4a24ccf7da1e72715b450d3a81
2020-12-11 08:43:24 -08:00
2bb2f641c4 Bring fast_nvcc.py to PyTorch OSS (#48934)
Summary:
This PR adds `tools/fast_nvcc/fast_nvcc.py`, a mostly-transparent wrapper over `nvcc` that parallelizes compilation of CUDA files when building for multiple architectures at once.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48934

Test Plan: Currently this script isn't actually used in PyTorch OSS. Coming soon!

Reviewed By: walterddr

Differential Revision: D25286030

Pulled By: samestep

fbshipit-source-id: 971a404cf57f5694dea899a27338520d25191706
2020-12-11 08:17:21 -08:00
88b3d3371b add additional arm64 checker in cmake files (#48952)
Summary:
tentatively fixes https://github.com/pytorch/pytorch/issues/48873

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48952

Reviewed By: H-Huang

Differential Revision: D25463266

Pulled By: walterddr

fbshipit-source-id: 40afefffe8ab98ae7261c770316cb9c25225285f
2020-12-11 08:10:09 -08:00
2f1d1eb7df Revert D25428587: [pytorch][PR] add additional interpolation modes for torch.quantile
Test Plan: revert-hammer

Differential Revision:
D25428587 (25a8397bf3)

Original commit changeset: e98d24f6a651

fbshipit-source-id: fb217b8a19e853e83779a4edd312be86b26eb26d
2020-12-11 07:50:16 -08:00
5ab90b2fda Make CUDAFuture remember and restore current device in callback (#48789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48789

CUDAFuture aims to "capture" the current state of CUDA-related stuff when the future is marked complete (e.g., by looking at current streams and recording events on them) and then "replicate" a similar state when users synchronize with the result of the future (by synchronizing the current streams with these events).

However, one "contextual" aspect of CUDA that we weren't capturing/replicating was the current device. This diff tries to fix that. I must mention that we can only do this for callbacks, while we cannot do it for the wait() method. I don't know if such a discrepancy between the two actually makes the overall behavior _worse_. I'd love to hear people's opinions on this.
ghstack-source-id: 118081338

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D25210335

fbshipit-source-id: 1d1a3f80b1cc42e5114bc88554ed50617f1aaa90
2020-12-11 03:35:53 -08:00
2b1057b0cf [RPC Framework] Support retrieving the RRef to the remote module (#48983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48983

Expose an API for users to retrieve the RRef for the underlying module.

This would be useful if users would like to run custom code on the remote end for the nn.Module.
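
A hedged usage sketch (assumes an initialized RPC agent; `get_module_rref` is the accessor this summary describes):

```python
from torch.distributed.nn import RemoteModule

def module_rref_of(remote_module: RemoteModule):
    rref = remote_module.get_module_rref()  # RRef to the remote nn.Module
    # `rref` can be passed to rpc.remote()/rpc.rpc_async() calls that run
    # custom code on the owning worker
    return rref
```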

Original PR issue: RemoteModule enhancements #40550
ghstack-source-id: 118378601

Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- RemoteModule

Reviewed By: pritamdamania87

Differential Revision: D25386042

fbshipit-source-id: 2dff33e8d5c9770be464eacf0b26c3e82f49a943
2020-12-10 23:53:44 -08:00
8669f02573 Saves a copy of vector<Tensor> in view ops returning TensorList. (#49149)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49149

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D25480104

Pulled By: ailzhang

fbshipit-source-id: 749345164662b15ec56b7b85a64011929e90c0b2
2020-12-10 23:42:26 -08:00
fce059d4ff [te] Don't throw when re-registering a CodeGen factory (#49174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49174

We've seen this happening when libtorch is loaded repeatedly on macOS. Tbh I'm not sure I understand why this happens; why do we re-construct these static objects but re-use the static registry itself? But it's fairly straightforward to just overwrite the factory method, and there's no harm in doing so.
ghstack-source-id: 118306581

Test Plan: compile

Reviewed By: ZolotukhinM

Differential Revision: D25466642

fbshipit-source-id: 4c456a57407f23fa0c9f4e74975ed1186e790c74
2020-12-10 23:37:29 -08:00
56a157fc79 hacky_wrapper_for_legacy_signatures reorders out arguments (#48911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48911

This enables us to use hacky_wrapper_for_legacy_signatures for ops with out arguments so they can use templated unboxing logic without having to be rewritten.

This only actually enables it for one op as a proof of concept. There will be a separate PR enabling it for more ops.
ghstack-source-id: 118379659

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D25363336

fbshipit-source-id: da075d2cc58814f886a25d52652511dbbe990cec
2020-12-10 23:29:00 -08:00
da6f249a10 [caffe2] DeserializeToNDArray (#49135)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49135

Differential Revision: D25417845

fbshipit-source-id: 4d8efd440bc2577fb717f911a401e7b81d48b907
2020-12-10 21:59:25 -08:00
59e822026c Add manual_cpp_binding to native_functions.yaml (#49092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49092

Functions which specify manual_cpp_binding don't automatically
get C++ bindings generated for them in TensorBody.h or
Functions.h.  This lets end users manually define the bindings
themselves, which may be helpful if there is a way to
short circuit the dispatcher entirely.  contiguous() is switched
to use this mechanism.

Although manual_cpp_binding suggests that we don't generate the
binding at all, it is often the case that there is some "fast
path", but when this path is not satisfied, we should go back
to the slow dispatch.  So we still generate a fallback method/function
which the user-defined binding can call into in case that we
have to go slowpath.
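
In Python-flavored terms, the pattern looks roughly like this (a sketch with hypothetical helper names, not the generated C++):

```python
def contiguous(self, memory_format="contiguous"):
    if is_contiguous(self, memory_format):   # fast path: same tensor back
        return self
    # slow path: go through the dispatcher (and thus autograd)
    return contiguous_fallback(self, memory_format)

def is_contiguous(t, fmt):                   # hypothetical stand-ins so
    return True                              # the sketch is self-contained

def contiguous_fallback(t, fmt):
    return t
```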

The correctness conditions for bindings manually written in this
way are subtle.  Here are the ones I can think of off the top
of my head:

- Whatever condition is tested in the C++ body, must ALSO be
  tested again in the native:: implementation on the other
  side of the dispatcher.  This is because you are NOT GUARANTEED
  to hit the native:: implementation through the C++ binding,
  you may go straight to the implementation via a boxed call.

- If a binding is written in this way, it is only safe to
  skip dispatch if you would have returned the same tensor as
  before.  In any situation you would return a fresh tensor,
  you MUST go to the slow path, because you need to actually
  get to the autograd kernel.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D25428440

Pulled By: swolchok

fbshipit-source-id: 6e71767cb8d1086d56cd827c1d2d56cac8f6f5fe
2020-12-10 21:56:53 -08:00
743a4ef0ae [PyTorch] Enable AutoNonVariableTypeMode in static runtime (#49199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49199

This should save us an extra round of dispatch for resize_,
resize_as_, detach_, and copy_, at the cost of disabling profiling and
tracing. I'm told that static runtime has its own per-op profiling and
we don't need tracing.
ghstack-source-id: 118348314

Test Plan:
Code review to confirm lack of need for profiling &
tracing, and that there isn't a different switch we should be using
instead.

Internal benchmarks -- seeing 11-12% improvement in overall runtime

Reviewed By: hlu1

Differential Revision: D25476819

fbshipit-source-id: 71e2c919b386b25c41084e2e4a54fe765a4f8f22
2020-12-10 21:51:59 -08:00
696e30af6e Fix ProcessGroupNCCL profiling when profiler is not run with use_cuda (#48946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48946

Move recordFunctionEndCallback to after the blocking portion of launching the NCCL kernel, and remove addCallback, since it runs the lambda inline anyway and triggers unnecessary CUDA stream logic. If we want CUDA operations such as NCCL kernels accurately profiled, we should use the profiler with use_cuda=True. However, we are currently debugging a deadlock for the use_cuda=True case; the fix is being tracked in #48987.

To ensure that the tests are no longer flaky, I submitted this PR to ci-all (#48947) and ran the test a bunch of times while ssh'd into the CI machine.

ghstack-source-id: 118330130

Test Plan: Ci

Reviewed By: mrzzd

Differential Revision: D25368322

fbshipit-source-id: 7d17036248a3dcd855e58addc383bba64d6bc391
2020-12-10 21:09:41 -08:00
cc3b59f6df [package] use bazel-style glob matching for mock/extern (#49066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49066

This PR tweaks mock_module and extern_module. They are now renamed
mock and extern, and now only edit the package when a module matching
the pattern specified is required through dependency analysis.

save_extern_module and save_mock_module are added to explicitly modify
the package, but should not be needed by most users of the API unless they
are overriding require_package.

mock and extern now use bazel-style glob matching rules
(https://docs.bazel.build/versions/master/be/functions.html#glob).
i.e. `torch.**` matches `torch` and `torch.bar` but not `torchvision`.
mock and extern also now take an exclude list to filter out packages
that should not apply to the action.
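
A hedged usage sketch (the exact signatures are assumptions based on this summary):

```python
from torch.package import PackageExporter

exporter = PackageExporter("out.pt")
# `torch.**` matches `torch` and `torch.bar`, but not `torchvision`
exporter.extern(["torch.**"], exclude=["torch.testing.**"])
exporter.mock(["numpy.**"])
exporter.save_pickle("my_pkg", "obj.pkl", {"answer": 42})
exporter.close()
```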

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D25413935

Pulled By: zdevito

fbshipit-source-id: 5c06b417bee94ac8e72c13985b5ec42fcbe00817
2020-12-10 21:01:11 -08:00
159f258415 Update Kineto revision (#49200)
Summary:
Updating to a newer revision

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49200

Test Plan:
USE_KINETO=1 TORCH_CUDA_ARCH_LIST="6.0;7.0" USE_CUDA=1 USE_MKLDNN=1
BUILD_BINARY=1 python setup.py develop install --cmake
python test/test_profiler.py

python test/test_autograd.py -k test_profile
python test/test_autograd.py -k test_record

Reviewed By: ngimel

Differential Revision: D25480439

Pulled By: ilia-cher

fbshipit-source-id: bca1f708f5e4a052028304b918a3adae9324318f
2020-12-10 19:51:10 -08:00
5469aa5e7f [NNC] Add a non functional Tensor kind (#48750)
Summary:
Adds the CompoundTensor, a specialisation of the NNC Tensor which allows arbitrary production statements. This will allow lowering of aten ops into specific NNC IR patterns (which don't need to be functional) - allowing us to shortcut to the optimized form of common patterns.

This is part 1 of trying to clean up the lowering of aten::cat so it is easier to optimize.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48750

Reviewed By: tugsbayasgalan

Differential Revision: D25433517

Pulled By: nickgg

fbshipit-source-id: de13c4719f8f87619ab254e5f324f13b5be1c9da
2020-12-10 19:43:50 -08:00
9b0ffb9fb3 Delete cpp.group_arguments (#49043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49043

Previously, this function had nontrivial algorithmic content,
but after #48195, this was just a swiss army knife for pasting
together arguments while maintaining structure.  I added some
more properties for Arguments for convenient access in this way,
and then inlined the implementation of group_arguments into all of its call
sites, simplifying whenever contextual.  This might be controversial, but I
think the resulting code is easier to understand.

You may notice that there is some modest code duplication between
dispatcher.cpparguments_exprs and CppSignature.argument_packs.
This is a known problem and I will be attempting to fix it in
a follow up PR.

Confirmed to be byte-for-byte compatible.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D25455885

Pulled By: ezyang

fbshipit-source-id: 8fbe066e8c3cb7ee8adb5b87296ec5bd7b49e01f
2020-12-10 18:20:46 -08:00
267641a245 Rename positional and kwarg_only to have flat prefix (#49042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49042

I want the names positional and kwarg_only to give the unflat
representation (e.g., preserving TensorOptionsArguments in the
returned Union).  So I regret my original naming choice when
I moved grouping to model.  This renames them to have flat_ prefix
and also adds a flat_non_out argument for cases where you just
want to look at non-out arguments.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D25455884

Pulled By: ezyang

fbshipit-source-id: f923f8881267a3e3e8e9521519412f7cc25034fc
2020-12-10 18:20:43 -08:00
0dea76ecda Delete some dead functions from tools.codegen.api.meta (#49041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49041

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D25455886

Pulled By: ezyang

fbshipit-source-id: 5d7834d52f7032820ac2c73358bda77187c17224
2020-12-10 18:16:09 -08:00
882eb0f646 [quant][graphmode][fx] Add support for dynamic quant for RNN and RNNCell (#49126)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49126

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_rnn
python test/test_quantization.py TestQuantizeFxOps.test_rnn_cell

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D25449047

fbshipit-source-id: 532bf9ad2839958dde8c6f2d9399fac96b2b8bd4
2020-12-10 18:11:40 -08:00
a47a087a43 [NNC] Add missing data type support for abs and frac (#48679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48679

This addresses the remaining problem reported in issue #48053

Data type support for aten kernels in SimpleIREvaluator is not
consistent w/ the aten::native library implementation. In SimpleIREvaluator,
  - only float/double are supported on aten::abs (integral types and half
are missing)
  - only float/double are supported on aten::frac (half are missing)

It is also not clear from the kernel.cpp source code what the expected
input data types for an aten kernel are, leading to potential missing data
type issues down the road.

This commit addresses both issues in a limited way by
 - Added type promotion ops from half/integral input types to float
 - Added skeleton support for some type checking for aten kernels;
   currently this only checks for valid data types for frac and abs, to limit
   the scope of the change, but the utility function can be used to
   consistently add type checking for all aten functions

Known limitations:
 - abs support for integral types can be made more effective by invoking
 std::abs for integral tensors (currently kFabs maps to std::fabs).
 Since that change is a bit more involved (e.g., changing IntrinsicsOp
 kFabs to kAbs and other code generators accordingly), will leave it to
 another issue
 - other aten kernels may need similar type checking and some scrutiny
 on the use of promoteToFloat to detect invalid data types early on.
 That is also left for another issue

Test Plan:
test_jit_fuser_te.test_unary_ops

Imported from OSS

Reviewed By: asuhan

Differential Revision: D25344839

fbshipit-source-id: 95aca04c99b947dc20f11e4b3bae002f0ae37044
2020-12-10 17:47:15 -08:00
7feec06dfe Only 1 TensorImpl allocation in differentiable views. (#48896)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48896

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D25380895

Pulled By: ailzhang

fbshipit-source-id: 4d565e6312e860a2ff185a3f8b552005ddd29695
2020-12-10 17:39:40 -08:00
5e8cfec332 Add a newline before dependency graph output (#49127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49127

Small change, but useful: it means that double-clicking the line lets
you copy the url easily

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D25450408

Pulled By: suo

fbshipit-source-id: 8b13b971b444187a8de59c89cc8f60206035b2ad
2020-12-10 17:03:23 -08:00
57145c910f Revert D24711613: [pytorch][PR] Preserve submodule with __set_state__ in freezing
Test Plan: revert-hammer

Differential Revision:
D24711613 (a3e1bd1fb9)

Original commit changeset: 22e51417454a

fbshipit-source-id: c2090b15fdba2d6c9dc1fbd987d32229dd898608
2020-12-10 16:26:38 -08:00
80f7510d92 [FX] Fix create_arg for NamedTuple (#48986)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48986

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D25387156

Pulled By: jamesr66a

fbshipit-source-id: 0d38c43e02088fb7afb671683c88b6e463fe7c76
2020-12-10 15:32:04 -08:00
69522410fa add user vs internal msg support in common_utils.TestCase (#48935)
Summary:
Should fix https://github.com/pytorch/pytorch/issues/48879.

To test the effect of the messages, make a test break, e.g., by adding `self.assertEqual(1, 2, "user_msg")` to any test:
* Before:
```
AssertionError: False is not true : user_msg
```
* After
```
AssertionError: False is not true : Scalars failed to compare as equal! Comparing 1 and 2 gives a difference of 1, but the allowed difference with rtol=0 and atol=0 is only 0!
user_msg;
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48935

Reviewed By: samestep

Differential Revision: D25382153

Pulled By: walterddr

fbshipit-source-id: 95633a9f664f4b05a28801786b12a10bd21ff431
2020-12-10 15:25:46 -08:00
84fce6d29a [AARCH64] Fix HAS_VST1 check if compiled by clang (#49182)
Summary:
Use the `UL` suffix, supported by all C99-compatible compilers, instead of `__AARCH64_UINT64_C`, which is a gcc-specific extension.

Before this change, the check would have failed even with a bug-free clang compiler, with the following errors:
```
$ clang has_vst1.c
has_vst1.c:5:41: warning: implicit declaration of function '__AARCH64_UINT64_C' is invalid in C99 [-Wimplicit-function-declaration]
  v.val[0] = vcombine_f32 (vcreate_f32 (__AARCH64_UINT64_C (0)), vcreate_f32 (__AARCH64_UINT64_C (0)));
                                        ^
has_vst1.c:5:79: warning: implicit declaration of function '__AARCH64_UINT64_C' is invalid in C99 [-Wimplicit-function-declaration]
  v.val[0] = vcombine_f32 (vcreate_f32 (__AARCH64_UINT64_C (0)), vcreate_f32 (__AARCH64_UINT64_C (0)));
                                                                              ^
has_vst1.c:6:41: warning: implicit declaration of function '__AARCH64_UINT64_C' is invalid in C99 [-Wimplicit-function-declaration]
  v.val[1] = vcombine_f32 (vcreate_f32 (__AARCH64_UINT64_C (0)), vcreate_f32 (__AARCH64_UINT64_C (0)));
                                        ^
has_vst1.c:6:79: warning: implicit declaration of function '__AARCH64_UINT64_C' is invalid in C99 [-Wimplicit-function-declaration]
  v.val[1] = vcombine_f32 (vcreate_f32 (__AARCH64_UINT64_C (0)), vcreate_f32 (__AARCH64_UINT64_C (0)));
                                                                              ^
4 warnings generated.
/tmp/has_vst1-b1e162.o: In function `main':
has_vst1.c:(.text+0x30): undefined reference to `__AARCH64_UINT64_C'
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49182

Reviewed By: walterddr

Differential Revision: D25471994

Pulled By: malfet

fbshipit-source-id: 0129a6f7aabc46aa117ef719d3a211449cb410f1
2020-12-10 15:19:12 -08:00
f4226b5c90 [static runtime] add static subgraph fusion pass (#49185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49185

This diff adds a fusion feature that will let us use static runtime for *parts* of the graph.  This will prove useful in cases where fully eliminating control flow is hard etc.

TODO:
[x] factor out into separate fusion file
[x] add python test case
[x] add graph that isn't fully lowered test case
[x] add graph that has weird list/tuple outputs test case

the loop example looks quite good:
```
graph(%a.1 : Tensor,
      %b.1 : Tensor,
      %iters.1 : int):
  %12 : bool = prim::Constant[value=1]() # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:110:4
  %c.2 : Tensor = prim::StaticSubgraph_0(%a.1, %b.1)
  %c : Tensor = prim::Loop(%iters.1, %12, %c.2) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:110:4
    block0(%i : int, %c.12 : Tensor):
      %c.10 : Tensor = prim::StaticSubgraph_1(%a.1, %c.12, %b.1)
      -> (%12, %c.10)
  return (%c)
with prim::StaticSubgraph_0 = graph(%0 : Tensor,
      %4 : Tensor):
  %5 : int = prim::Constant[value=2]()
  %6 : Tensor = aten::mul(%4, %5) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:109:12
  %2 : int = prim::Constant[value=1]()
  %c.2 : Tensor = aten::add(%0, %6, %2) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:109:8
  return (%c.2)
with prim::StaticSubgraph_1 = graph(%1 : Tensor,
      %7 : Tensor,
      %8 : Tensor):
  %9 : int = prim::Constant[value=1]()
  %c.4 : Tensor = aten::add(%7, %8, %9) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:111:12
  %5 : int = prim::Constant[value=2]()
  %c.7 : Tensor = aten::mul_(%c.4, %5) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:112:8
  %2 : int = prim::Constant[value=1]()
  %c.10 : Tensor = aten::sub_(%c.7, %1, %2) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:113:8
  return (%c.10)
```

(Note: this ignores all push blocking failures!)

Test Plan:
buck test mode/no-gpu //caffe2/benchmarks/static_runtime:static_runtime_cpptest

buck test mode/no-gpu caffe2/test:static_runtime

Reviewed By: bertmaher

Differential Revision: D25385702

fbshipit-source-id: 2f24af4f11d92a959167facd03fbd24f464a6098
2020-12-10 14:03:11 -08:00
95a1725a4a Vsx initial support issue27678 (#41541)
Summary:
### Pytorch Vec256 ppc64le support
implemented types:

- double
- float
- int16
- int32
- int64
- qint32
- qint8
- quint8
- complex_float
- complex_double

Notes:
All basic vector operations are implemented. There are a few problems:
- minimum/maximum NaN propagation for ppc64le is missing and was not checked
- complex multiplication, division, sqrt, and abs are implemented as in PyTorch x86; they can overflow and have worse precision than the std ones, which is why they were either excluded or tested in a smaller domain range
- precision of the implemented float math functions

~~Besides, I added CPU_CAPABILITY for power, but because of quantization errors for DEFAULT I had to undef and use vsx for DEFAULT too~~

#### Details
##### Supported math functions

- `+` means vectorized; `-` means missing (implementation notes are added inside parentheses). Example: `-(both)` means it was also missing on the x86 side
- `f(func_name)` means the vectorization uses func_name
- `sleef` means the call is redirected to the Sleef library
- `unsupported` means the operation is not supported for that type

function_name | float | double | complex float | complex double
|-- | -- | -- | -- | --|
acos | sleef | sleef | f(asin) | f(asin)
asin | sleef | sleef | +(pytorch impl) | +(pytorch impl)
atan | sleef | sleef | f(log) | f(log)
atan2 | sleef | sleef | unsupported | unsupported
cos | +((ppc64le:avx_mathfun) ) | sleef | -(both) | -(both)
cosh | f(exp)   | -(both) | -(both) |
erf | sleef | sleef | unsupported | unsupported
erfc | sleef | sleef | unsupported | unsupported
erfinv | - (both) | - (both) | unsupported | unsupported
exp | + | sleef | - (x86:f()) | - (x86:f())
expm1 | f(exp)  | sleef | unsupported | unsupported
lgamma | sleef | sleef |   |
log | +  | sleef | -(both) | -(both)
log10 | f(log)  | sleef | f(log) | f(log)
log1p | f(log)  | sleef | unsupported | unsupported
log2 | f(log)  | sleef | f(log) | f(log)
pow | + f(exp)  | sleef | -(both) | -(both)
sin | +((ppc64le:avx_mathfun) ) | sleef | -(both) | -(both)
sinh | f(exp)  | sleef | -(both) | -(both)
tan | sleef | sleef | -(both) | -(both)
tanh | f(exp)  | sleef | -(both) | -(both)
hypot | sleef | sleef | -(both) | -(both)
nextafter | sleef  | sleef | -(both) | -(both)
fmod | sleef | sleef | -(both) | -(both)

[Vec256 test cases PR #42685](https://github.com/pytorch/pytorch/pull/42685)
Current list:

- [x] Blends
- [x] Memory: UnAlignedLoadStore
- [x] Arithmetics: Plus,Minu,Multiplication,Division
- [x] Bitwise: BitAnd, BitOr, BitXor
- [x] Comparison: Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual
- [x] MinMax: Minimum, Maximum, ClampMin, ClampMax, Clamp
- [x] SignManipulation: Absolute, Negate
- [x] Interleave: Interleave, DeInterleave
- [x] Rounding: Round, Ceil, Floor, Trunc
- [x] Mask: ZeroMask
- [x] SqrtAndReciprocal: Sqrt, RSqrt, Reciprocal
- [x] Trigonometric: Sin, Cos, Tan
- [x] Hyperbolic: Tanh, Sinh, Cosh
- [x] InverseTrigonometric: Asin, ACos, ATan, ATan2
- [x] Logarithm: Log, Log2, Log10, Log1p
- [x] Exponents: Exp, Expm1
- [x] ErrorFunctions: Erf, Erfc, Erfinv
- [x] Pow: Pow
- [x] LGamma: LGamma
- [x] Quantization: quantize, dequantize, requantize_from_int
- [x] Quantization: widening_subtract, relu, relu6
Missing:
- [ ] Constructors, initializations
- [ ] Conversion , Cast
- [ ] Additional: imag, conj, angle (note: imag and conj only checked for float complex)

#### Notes on tests and testing framework
- some math functions are tested within a domain range
- the testing framework mostly tests randomly against the std implementation, within the domain (or within the implementation domain for some math functions)
- some functions are tested against the local version. ~~For example, std::round and the vector version of round differ, so it was tested against the local version~~
- round was tested against PyTorch's at::native::round_impl. ~~For the double type on **VSX, vec_round failed for (even)+0.5 values**~~. This was solved by using vec_rint
- ~~**complex types are not tested**~~ **After enabling complex testing, due to precision and domain issues some of the complex functions failed for VSX and x86 AVX as well. I will either test them against the local implementation or check within the accepted domain**
- ~~quantizations are not tested~~ Added tests for the quantize, dequantize, requantize_from_int, relu, relu6, and widening_subtract functions
- the testing framework should be improved further
- ~~For now `-DBUILD_MOBILE_TEST=ON` will be used for Vec256Test too~~
Vec256 Test cases will be built for each CPU_CAPABILITY

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41541

Reviewed By: zhangguanheng66

Differential Revision: D23922049

Pulled By: VitalyFedyunin

fbshipit-source-id: bca25110afccecbb362cea57c705f3ce02f26098
2020-12-10 13:42:39 -08:00
a3e1bd1fb9 Preserve submodule with __set_state__ in freezing (#47308)
Summary:
This PR does the following:

-  fails freezing if the input module has a `__set_state__` method
-  preserves the attributes of submodules with a `__set_state__` method

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47308

Reviewed By: eellison

Differential Revision: D24711613

Pulled By: bzinodev

fbshipit-source-id: 22e51417454aaf85cc0ae4acb2dc7fc822f149a2
2020-12-10 13:36:34 -08:00
a480ca5302 [JIT] Use is_buffer in BufferPolicy::valid (#49053)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49053

**Summary**
`BufferPolicy::valid` uses `!typ->is_parameter(i)` to check if an
attribute is a buffer or not; it should use `type->is_buffer(i)` instead.

**Test Plan**
It is difficult to write an additional test that would have failed before this
commit because the two booleans `is_parameter` and `is_buffer` are never set
to `true` at the same time.

**Fixes**
This commit fixes #48746.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D25434956

Pulled By: SplitInfinity

fbshipit-source-id: ff2229058abbafed0b67d7b26254d406e5f7b074
2020-12-10 13:10:51 -08:00
c892c3ac9a remove hacky_wrapper from BackendSelect (#49079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49079

BackendSelect kernels have been changed to be written the new way, so this hacky_wrapper here isn't needed anymore.
This PR is not expected to change perf; it just simplifies the code a bit. The hacky_wrapper here was a no-op that did not create any actual wrappers,
because it short-circuits and skips creating a wrapper when none is needed.
ghstack-source-id: 118318436

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D25421633

fbshipit-source-id: 7a6125613f465dabed155dd892c8be6af5c617cf
2020-12-10 12:54:29 -08:00
21dba8c1ad Make aten::div.out c10-full (#47793)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47793

This migrates aten::div.out to be c10-full (without hacky wrapper) and fixes everything that needed to be fixed to make it work.
This is a prerequisite step to making out ops c10-full. Diffs stacked on top of this will introduce a hacky_wrapper for out ops and use it to make more ops c10-full.
ghstack-source-id: 118318433

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D24901944

fbshipit-source-id: e477cb41675e477808c76af01706508beee44752
2020-12-10 12:52:50 -08:00
e1c1a7e964 [ONNX] Changes to export API to better handle named arguments (#47367)
Summary:
The `args` parameter of ONNX export is changed to better support optional arguments. `args` is now represented as:
args (tuple of arguments or torch.Tensor, optionally ending with a dictionary of named arguments):
            the dictionary specifies the input for each named parameter:
            - KEY: str, the named parameter
            - VALUE: the corresponding input
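
A sketch of this convention (the model and names are illustrative): positional inputs go in the tuple, and named inputs go in a trailing dictionary.

```python
import torch

class M(torch.nn.Module):
    def forward(self, x, y=None):
        return x if y is None else x + y

m = M()
x = torch.randn(2)
# the trailing dict supplies the named argument `y`
torch.onnx.export(m, (x, {"y": torch.randn(2)}), "m.onnx")
```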

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47367

Reviewed By: H-Huang

Differential Revision: D25432691

Pulled By: bzinodev

fbshipit-source-id: 9d4cba73cbf7bef256351f181f9ac5434b77eee8
2020-12-10 12:31:00 -08:00
0c70585505 fix #49064 (invalid escape) by using raw strings (#49065)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49064 by using raw strings

I removed `# noqa: W605` because that's the "invalid escape sequence" check: https://www.flake8rules.com/rules/W605.html

I wrote a quick test to make sure the strings are the same before and after this PR. This block should print `True` (it does for me).

```
convolution_notes1 = \
    {"groups_note": r"""* :attr:`groups` controls the connections between inputs and outputs.
      :attr:`in_channels` and :attr:`out_channels` must both be divisible by
      :attr:`groups`. For example,

        * At groups=1, all inputs are convolved to all outputs.
        * At groups=2, the operation becomes equivalent to having two conv
          layers side by side, each seeing half the input channels
          and producing half the output channels, and both subsequently
          concatenated.
        * At groups= :attr:`in_channels`, each input channel is convolved with
          its own set of filters (of size
          :math:`\frac{\text{out\_channels}}{\text{in\_channels}}`).""",

        "depthwise_separable_note": r"""When `groups == in_channels` and `out_channels == K * in_channels`,
        where `K` is a positive integer, this operation is also known as a "depthwise convolution".

        In other words, for an input of size :math:`(N, C_{in}, L_{in})`,
        a depthwise convolution with a depthwise multiplier `K` can be performed with the arguments
        :math:`(C_\text{in}=C_\text{in}, C_\text{out}=C_\text{in} \times \text{K}, ..., \text{groups}=C_\text{in})`."""}  # noqa: B950

convolution_notes2 = \
    {"groups_note": """* :attr:`groups` controls the connections between inputs and outputs.
      :attr:`in_channels` and :attr:`out_channels` must both be divisible by
      :attr:`groups`. For example,

        * At groups=1, all inputs are convolved to all outputs.
        * At groups=2, the operation becomes equivalent to having two conv
          layers side by side, each seeing half the input channels
          and producing half the output channels, and both subsequently
          concatenated.
        * At groups= :attr:`in_channels`, each input channel is convolved with
          its own set of filters (of size
          :math:`\\frac{\\text{out\_channels}}{\\text{in\_channels}}`).""",  # noqa: W605

        "depthwise_separable_note": """When `groups == in_channels` and `out_channels == K * in_channels`,
        where `K` is a positive integer, this operation is also known as a "depthwise convolution".

        In other words, for an input of size :math:`(N, C_{in}, L_{in})`,
        a depthwise convolution with a depthwise multiplier `K` can be performed with the arguments
        :math:`(C_\\text{in}=C_\\text{in}, C_\\text{out}=C_\\text{in} \\times \\text{K}, ..., \\text{groups}=C_\\text{in})`."""}  # noqa: W605,B950

print(convolution_notes1 == convolution_notes2)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49065

Reviewed By: agolynski

Differential Revision: D25464507

Pulled By: H-Huang

fbshipit-source-id: 88a65a24e3cc29774af25e09823257b2136550fe
2020-12-10 12:22:49 -08:00
3b57be176e [NNC] Preserve strided output (#48264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48264

Preserves the strided representation of NNC Tensor outputs by transforming them into the right layout at the end of the kernel.

Fix for https://github.com/pytorch/pytorch/issues/45604

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D25286213

Pulled By: eellison

fbshipit-source-id: 64d94ac463741e2568a1c9d44174e15ea26e511f
2020-12-10 12:19:51 -08:00
0b9d5e65e4 Remove inferred from tensor type ctors (#48263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48263

The inferred type is only used once in `getInferred` and is confusing next to the other parameters. It has nothing to do with runtime values, it just means the type was inferred in type-checking. There are a bunch of parameters and overloads of Tensor instantiation as is.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25286211

Pulled By: eellison

fbshipit-source-id: 3dfc44ab7ff4fbf0ef286ae8716a4afac646804b
2020-12-10 12:19:49 -08:00
71ddc0ba19 [TensorExpr Fuser] Add support for nodes which have tensor constant inputs (#47814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47814

Previously, we would bail completely if a node had a constant tensor input. This PR adds support for this case by lifting the constant out of the fusion graph after we've done fusion. It might be nice to add support for Tensor Constants in NNC itself, but it looked kind of tricky and this is an easy enough temporary solution.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25286215

Pulled By: eellison

fbshipit-source-id: 9ff67f92f5a2d43fd3ca087569898666525ca8cf
2020-12-10 12:19:47 -08:00
413caa7fd2 [NNC] Compute Tensor Output Properties in ininitialization (#47813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47813

We have some code paths that at kernel invocation seem to handle dynamic sizes, but I'm not sure how well they work, because other parts of our code base assume that tensor shapes are always fully specified. https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/kernel.cpp#L1572

As with some other PRs in the stack, I think it would be good to remove the features that aren't on/actively being worked on while they are not used.

I initially did this PR to try to speed up perf. I couldn't observe much of a speedup, so we can decide to keep or drop this PR.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25286212

Pulled By: eellison

fbshipit-source-id: 4ae66e0af88d649dd4e592bc78686538c2fdbaeb
2020-12-10 12:19:45 -08:00
0e666a9f5a [TensorExpr] Cache use of fallback in kernel invocation (#47812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47812

Previously we were checking the environment every kernel invocation for `tensorExprFuserEnabled`, which checks the environment for `PYTORCH_TENSOREXPR`. This is only a dev-exposed API, so I think it is fine to only check once when the kernel is initialized. The `disable_optimization` flag which is user-exposed more or less covers the same functionality.

For fun, some benchmarking. I compared scripted before and after of
```
def foo(x, y):
    return x + y
```
for x, y = torch.tensor([1]). I also removed the prim::TypeCheck node to better
isolate the kernel (I cheated). Here is gist: https://gist.github.com/eellison/39f3bc368f5bd1f25ded4827feecd15e

Without Changes Run 1:
no fusion: sum 6.416894399004377 min: 0.6101883250012179 median 0.6412974080012646
with fusion: sum 6.437897570998757 min: 0.6350401220006461 median 0.6446951820034883

Without Changes Run2:
no fusion: sum 6.601341788002173 min: 0.6292048720024468 median 0.6642187059987918
with fusion: sum 6.734651455997664 min: 0.6365462899993872 median 0.6755226659988693

With Changes Run1:
no fusion: sum 6.097717430002376 min: 0.5977709550024883 median 0.613631643998815
with fusion: sum 6.1299369639964425 min: 0.5857932209983119 median 0.6159247440009494

With Changes Run2:
no fusion: sum 6.5672018059995025 min: 0.6245676209982776 median 0.6386050750006689
with fusion: sum 6.489086147994385 min: 0.6236886289989343 median 0.6535737619997235

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25286210

fbshipit-source-id: a18b4918a7f7bed8a39112ae04b678e79026d39b
2020-12-10 12:19:42 -08:00
70853c5021 Dont use symbolic shapes check (#47810)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47810

`bindSymbolicShapes` wasn't checking device or dtype at all, so it wasn't correct. It also isn't being used anywhere (num_profiles is always 1 and we don't use symbolic shapes). We shouldn't have it on until we are actually using symbolic shapes.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25286214

Pulled By: eellison

fbshipit-source-id: 10fb175d0c75bd0159fb63aafc3b59cc5fd6c5af
2020-12-10 12:14:58 -08:00
18c03b9f00 make duplicate def() calls an error in the dispatcher (#48098)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48098

Test Plan:
Imported from OSS
***
make duplicate def() calls an error in the dispatcher. Updating all fb operators to use the new dispatcher registration API

Reviewed By: ezyang

Differential Revision: D25056089

Pulled By: bdhirsh

fbshipit-source-id: 8d7e381f16498a69cd20e6955d69acdc9a1d2791
2020-12-10 11:38:52 -08:00
2519348f60 [Binary Push] Update the awscli installation, use conda install rather than brew install (#49175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49175

As title
ghstack-source-id: 118306312

Test Plan: CI

Reviewed By: xta0

Differential Revision: D25466577

fbshipit-source-id: 67a521947db3744695f0ab5f421483ab96d8ed9f
2020-12-10 11:10:51 -08:00
edbf9263ad [iOS] Bump up the cocoapods version (#49176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49176

Bump up the cocoapods version
ghstack-source-id: 118305636

Test Plan: CI

Reviewed By: xta0

Differential Revision: D25466321

fbshipit-source-id: 916adc514c5edc8971445da893362a160cfc092b
2020-12-10 11:07:49 -08:00
909a9060e9 [vmap] implement batching rule for fill_ and zero_ (#48516)
Summary:
Fix https://github.com/pytorch/pytorch/issues/47755

- This PR implements batching rules for the in-place operators `fill_` and `zero_` (see the sketch after this list).
- Testcases are added to `test/test_vmap.py`.
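
A sketch of what the new rules enable (`torch.vmap` was a prototype API at this point; the intended behavior is per-example application of the in-place op):

```python
import torch

x = torch.randn(3, 5)
torch.vmap(torch.Tensor.zero_)(x)   # zero_ applied per batch element
print(x.abs().sum())                # tensor(0.)
```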

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48516

Reviewed By: H-Huang

Differential Revision: D25431557

Pulled By: zou3519

fbshipit-source-id: 437b0534dc0b818fbe05f7fcfcb649aa677483dc
2020-12-10 10:59:05 -08:00
840e71f4e6 Check CUDA kernel launches (/fbcode/caffe2/) (#49145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49145

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49105

(1) Add a safety check `C10_CUDA_KERNEL_LAUNCH_CHECK()` after each kernel launch. This diff only changes files inside the directories /fbsource/fbcode/caffe2/modules/, /fbsource/fbcode/caffe2/fb/, and /fbsource/fbcode/caffe2/test/.

(2) Get rid of the old check `AT_CUDA_CHECK(cudaGetLastError())` where necessary.

Test Plan:
Test build:
```
buck build mode/dev-nosan //caffe2/modules/detectron:
buck test mode/dev-nosan //caffe2/modules/detectron:
buck build mode/dev-nosan //caffe2/torch/fb/:
buck test mode/dev-nosan //caffe2/torch/fb/:
```

To check for launches without checks:
```
python3 caffe2/torch/testing/check_kernel_launches.py
```
Make sure none of the updated files are in the returned list.

Reviewed By: r-barnes

Differential Revision: D25452852

fbshipit-source-id: d6657edab612c9e0fa99b29c68460be8b1a20064
2020-12-10 10:43:03 -08:00
524adfbffd Use new FFT operators in stft (#47601)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47601

Fixes https://github.com/pytorch/pytorch/issues/42175#issuecomment-719933913

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25457217

Pulled By: mruberry

fbshipit-source-id: 455d216edd0b962eb7967ecb47cccc8d6865975b
2020-12-10 10:31:50 -08:00
54f0556ee4 Add missing complex support for torch.norm and torch.linalg.norm (#48284)
Summary:
**BC-breaking note:**

Previously, when given a complex input, `torch.linalg.norm` and `torch.norm` would return a complex output. `torch.linalg.cond` would sometimes return a complex output and sometimes return a real output when given a complex input, depending on its `p` argument. This PR changes this behavior to match `numpy.linalg.norm` and `numpy.linalg.cond`, so that a complex input will result in the downgraded real number type, consistent with NumPy.
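
A quick illustration of the new behavior:

```python
import torch

z = torch.randn(4, dtype=torch.cfloat)
n = torch.linalg.norm(z)   # vector 2-norm of a complex input
print(n.dtype)             # torch.float32: real output, matching NumPy
```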

**PR Summary:**

The following cases were previously unsupported for complex inputs, and this commit adds support:

- Frobenius norm
- Norm order 2 (vector and matrix)
- CUDA vector norm

Part of https://github.com/pytorch/pytorch/issues/47833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48284

Reviewed By: H-Huang

Differential Revision: D25420880

Pulled By: mruberry

fbshipit-source-id: 11f6a2f3cad57d66476d30921c3f6ab8f3cd4017
2020-12-10 10:23:45 -08:00
25a8397bf3 add additional interpolation modes for torch.quantile (#48711)
Summary:
Fix https://github.com/pytorch/pytorch/issues/48523
Related  https://github.com/pytorch/pytorch/issues/38349

**BC-breaking Note:**

This PR updates PyTorch's quantile function to add the additional interpolation methods `lower`, `higher`, `nearest`, and `midpoint`, all of which NumPy already supports.

New parameter `interpolation` is added to the signature for both `torch.quantile` and `torch.nanquantile` functions.

- `quantile(input, q, dim=None, interpolation='linear', keepdim=False, *, out=None) -> Tensor`
- `nanquantile(input, q, dim=None, interpolation='linear', keepdim=False, *, out=None) -> Tensor`

Function signatures followed the NumPy-like style for the moment, keeping `out` at the end to be consistent with PyTorch.
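
A usage sketch following the signatures above:

```python
import torch

x = torch.arange(10.0)
print(torch.quantile(x, 0.4, interpolation="nearest"))
print(torch.nanquantile(x, 0.4, interpolation="midpoint"))
```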

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48711

Reviewed By: H-Huang

Differential Revision: D25428587

Pulled By: heitorschueroff

fbshipit-source-id: e98d24f6a651d302eb94f4ff4da18e38bdbf0124
2020-12-10 10:10:51 -08:00
45473ffe23 Refactor cudnn convolution (#49109)
Summary:
The cuDNN v7 API has been deprecated, so we need to migrate to the cuDNN v8 API. The v8 API does not exist on cuDNN 7, so both APIs will have to coexist for a long time.

This is step 0 of adding cuDNN v8 API. There is no real code change in this PR. It just copy-pastes existing code. The original `Conv.cpp` is split into `ConvPlaceholders.cpp`, `ConvShared.cpp`, `ConvShared.h`, `Conv_v7.cpp`, `Conv_v8.cpp`. Currently `Conv_v8.cpp` is empty, and will be filled in the future.

The `ConvPlaceholders.cpp` contains placeholder implementation of cudnn convolution when cudnn is not enabled. These operators only raise errors and do no real computation. This file also contains deprecated operators. These operators are implemented using current operators.

The `ConvShared.cpp` and `ConvShared.h` contains code that will be shared by the v7 and v8 API, these include the definition of struct `ConvolutionParams` and `ConvolutionArgs`. As well as ATen exposed API like `cudnn_convolution` and intermediate `cudnn_convolution_forward`. These exposed functions will call raw API like `raw_cudnn_convolution_forward_out` in `Conv_v7.cpp` or `Conv_v8.cpp` for the real implementation.

The `Conv_v7.cpp`, `Conv_v8.cpp` contains the implementation of raw APIs, and are different for v7 and v8.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49109

Reviewed By: H-Huang

Differential Revision: D25463783

Pulled By: ezyang

fbshipit-source-id: 1c80de8e5d94d97a61e45687f6193e8ff5481e3e
2020-12-10 10:06:12 -08:00
d5c4a80cfd Allow ROCm CI to use non-default stream. (#48424)
Summary:
Revert https://github.com/pytorch/pytorch/issues/26394. Fixes https://github.com/pytorch/pytorch/issues/27356.  Not all MIOpen handles were setting their stream to the current stream prior to running the op.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48424

Reviewed By: H-Huang

Differential Revision: D25420384

Pulled By: mruberry

fbshipit-source-id: 051683ba9e3d264b71162bd344031a0c58bf6a41
2020-12-10 09:55:11 -08:00
195b92bfa6 Revert D25441716: [te] Add BitCast to the IR
Test Plan: revert-hammer

Differential Revision:
D25441716 (3384145418)

Original commit changeset: c97b871697bc

fbshipit-source-id: e6eff02e28e1ae8c826dd2cfed79f869839ed2ba
2020-12-10 09:31:35 -08:00
3384145418 [te] Add BitCast to the IR
Summary: Adds BitCasting to NNC.  This will enable fast approximation algorithms implemented directly in TensorExpressions

Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr

Reviewed By: bertmaher

Differential Revision: D25441716

fbshipit-source-id: c97b871697bc5931d09cda4a9cb0a81bb420f4e2
2020-12-10 09:25:46 -08:00
21c04b4438 make AT_FFTW_ENABLED available to fb internal
Summary: Follow-up to D25375320 (b89c328493).

Test Plan: buck build

Reviewed By: samestep

Differential Revision: D25410973

fbshipit-source-id: 6c2627951a98d270d341b33538431644d03bed16
2020-12-10 07:36:35 -08:00
33bc7918e8 fix some comments in accelerator_partitioner.py (#49104)
Summary:
Fix some comments in accelerator_partitioner.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49104

Reviewed By: gcatron

Differential Revision: D25434999

Pulled By: scottxu0730

fbshipit-source-id: ce83b411cf959aabec119532ad42a892a2223286
2020-12-10 07:06:05 -08:00
c7b8f3e2cd Decouple direct access to native::scalar_tensor from TensorIndexing.h (#48761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48761

Targeting one of the items in https://github.com/pytorch/pytorch/issues/48684. For performance purposes we don't use at::scalar_tensor. Since scalar_tensor_static is available for CPU, we can use it at least there. One uncertainty is CUDA performance, but since there's no fast path for CUDA under native::scalar_tensor either, I assume perf on CUDA is not affected.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25410975

Pulled By: iseeyuan

fbshipit-source-id: 160d21ffeefc9a2e8f00a55043144eebcada2aac
2020-12-10 05:34:35 -08:00
2255e68da8 Revert D25433268: [PyTorch Mobile] Preserve bundled input related methods when calling optimize_for_mobile
Test Plan: revert-hammer

Differential Revision:
D25433268 (95233870f2)

Original commit changeset: 0bf9b4afe64b

fbshipit-source-id: bba97e48ce0e72f9d1db5159065bb6495d62666c
2020-12-10 04:39:30 -08:00
b5a7e25059 Cache the DataPtrs in CUDAFuture (#48788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48788

CUDAFuture needs to inspect the value it contains in order to first determine what devices its tensors reside on (so that it can record events on those devices), and then to record these tensors with the caching allocator when they are used in other streams. Extracting data ptrs can become somewhat expensive (especially if we resort to using the pickler to do that), hence it's probably a good idea to cache the result the first time we compute it.
ghstack-source-id: 118180023

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D25303486

fbshipit-source-id: 5c541640f6d19249dfb5489ba5e8fad2502836fb
2020-12-10 03:54:29 -08:00
030fa6cfba Split out reusable CUDAFuture from FutureNCCL (#48506)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48506

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).

 ---

FutureNCCL is now a general-purpose type-agnostic multi-device class, so in this commit I extract it from ProcessGroupNCCL to make it available for wider use (notably by the RPC module). We'll call this new class CUDAFuture. We'll keep FutureNCCL as a subclass of CUDAFuture to deal with some NCCL peculiarity, namely the fact that the future becomes complete immediately upon creation. We can clean this up for good once we're done merging Future and Work.

I'm not exactly sure where to put CUDAFuture. It needs to be available to both c10d and RPC (which lives under torch/csrc). If I figured CMake out correctly (and that's a big if), I think c10d can only depend on ATen (I'll maybe add a comment with how I tracked that down). Hence we cannot put CUDAFuture in torch/csrc. On the other hand, RPC currently depends on c10d, because RPC agents use ProcessGroups internally, so it would be "ok" to put CUDAFuture in c10d. However, we want to get rid of ProcessGroups in RPC, and at that point RPC should in principle not depend on c10d. In that case, the only shared dep between the two that I see is ATen itself.

While I'm a bit wary of putting it right in ATen, I think it might actually make sense. CUDAFuture is intended to be a general-purpose component that can be reused in all settings and is not particularly tied to c10d or RPC. Moreover, ATen already contains ivalue::Future, and it contains a lot of CUDA helpers, so CUDAFuture definitely belongs to the "closure" of what's already there.
ghstack-source-id: 118180030

Test Plan: Unit tests?

Reviewed By: wanchaol

Differential Revision: D25180532

fbshipit-source-id: 697f655240dbdd3be22a568d5102ab27691f86d4
2020-12-10 03:54:26 -08:00
4c425e8da0 Merge common parts of FutureNCCL into at::ivalue::Future (#48505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48505

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).

 ---

FutureNCCL isn't just adding CUDA support to ivalue::Future, it's also reimplementing a lot of the latter's logic (by overriding plenty of its methods). That's brittle: whenever a new method is added to ivalue::Future, there's a risk of forgetting to add it to FutureNCCL, in which case calling that method on FutureNCCL would defer to the base class and give inconsistent results (e.g., the future not being completed when it actually is). This _is already happening_, for example with waitAndThrow or hasError, which are not implemented by FutureNCCL. In addition, this creates duplication between the two classes, which could lead to inconsistencies of behavior, bugs, missing features, and so on.

The best solution would be to keep the core future logic in ivalue::Future, and have _only_ the CUDA additions in FutureNCCL. That's what we're going to do, in two steps. In the previous commit, I split the CUDA features into separate hooks, which are called by FutureNCCL's other methods. In this commit, I'm removing these latter methods, and invoke the hooks directly from ivalue::Future.
ghstack-source-id: 118180032

Test Plan: Unit tests

Reviewed By: wanchaol

Differential Revision: D25180535

fbshipit-source-id: 19181fe133152044eb677062a9e31e5e4ad3c03c
2020-12-10 03:54:22 -08:00
9078088edb Split FutureNCCL's CUDA-specific parts from generic future logic (#48504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48504

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).

 ---

FutureNCCL isn't just adding CUDA support to ivalue::Future, it's also reimplementing a lot of the latter's logic (by overriding plenty of its methods). That's brittle: whenever a new method is added to ivalue::Future, there's a risk of forgetting to add it to FutureNCCL, in which case calling that method on FutureNCCL would defer to the base class and give inconsistent results (e.g., the future not being completed when it actually is). This _is already happening_, for example with waitAndThrow or hasError, which are not implemented by FutureNCCL. In addition, this creates duplication between the two classes, which could lead to inconsistencies of behavior, bugs, missing features, and so on.

The best solution would be to keep the core future logic in ivalue::Future, and have _only_ the CUDA additions in FutureNCCL. That's what we're going to do, in two steps. In this commit, I'll split the CUDA features into separate hooks, which are called by FutureNCCL's other methods. In the next commit, I'll remove these latter methods, and invoke the hooks directly from ivalue::Future.
ghstack-source-id: 118180025

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D25180534

fbshipit-source-id: 7b3cd374aee78f6c07104daec793c4d248404c61
2020-12-10 03:54:19 -08:00
a6778989d1 Support wider range of types in FutureNCCL (#48502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48502

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).

 ---

FutureNCCL restricted values to tensors, (singleton) lists of tensors, or Python objects that could be converted to either of those types. We need a CUDA future that can handle more generic types, though.

The main challenge is extracting all DataPtrs from an arbitrary object. I think I found some ways of doing so, but I'd like some JIT experts to look into this and tell me if there are better ways. I'll add inline comments for where their input would be appreciated.
ghstack-source-id: 118180026

Test Plan: Unit tests (I should probably add new ones)

Reviewed By: wanchaol

Differential Revision: D25177562

fbshipit-source-id: 1ef18e67bf44543c70abb4ca152f1610dea4e533
2020-12-10 03:54:15 -08:00
9fe3ac3650 Don't store device indices separately on FutureNCCL (#48501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48501

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).

 ---

FutureNCCL stores a set of devices (on which the tensors in the data reside) and a CUDA event for each of those devices. In fact, each event instance already contains the device it belongs to, which means we can avoid storing that information separately (and avoid the risk of it becoming mismatched and/or inaccurate).
ghstack-source-id: 118180024

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D25177554

fbshipit-source-id: 64667c176efc2a7dafe99457a1fbba5d142cb06c
2020-12-10 03:54:12 -08:00
e294c2d841 Add multi-GPU support to FutureNCCL (#48500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48500

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).

 ---

After the previous changes, this is now much simpler than it sounds. For the most part it just consists of repeating some operations multiple times, once per device (e.g., recording and blocking on events). Funnily, we already had a vector of events, even though we only ever stored one element in it (this probably comes from the fact that it is shared with WorkNCCL, which can hold more than one event). Here, we now also store a vector of device indices.

Perhaps the only non-trivial part is that, for "follow-up" futures (for callbacks), we can't know in advance which device the result will be on, so we must determine it dynamically when we receive the result, by inspecting it. That's also easier than it sounds, because we already have a dataptr extractor.
ghstack-source-id: 118180022

Test Plan: Unit tests (I should probably add new ones)

Reviewed By: mrshenli

Differential Revision: D25177556

fbshipit-source-id: 41ef39ec0dc458e341aa1564f2b9f2b573d7fa9f
2020-12-10 03:54:09 -08:00
91ad3ed831 Fix FutureNCCL not recording dataptrs with caching alloc in wait() (#48563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48563

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).

 ---

The CUDA caching allocator requires us to register all streams in which a DataPtr is used. We already do so when we invoke a callback, for which we obtain streams from the ATen pool. However, we didn't do so when the user waits for the Future and then uses the results in their current streams. This was probably fine in most cases, because the outputs of the NCCL ops (which are the tensors we're dealing with here) were user-provided, and thus already registered in some user streams, but in principle the user could use different streams when waiting than the ones they used to create the tensors. (If they use the same streams, registering becomes a no-op.) More importantly, this change will help us turn FutureNCCL into a more general-purpose class: in RPC, for example, the tensors of the result are allocated by PyTorch itself, and thus we need to record their usage on the user's streams with the caching allocator.
ghstack-source-id: 118180033

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D25210338

fbshipit-source-id: e0a4ba157653b74dd84cf5665c992ccce2dea188
2020-12-10 03:54:06 -08:00
003c30ba82 Fix FutureNCCL's completed() disagreeing with wait() (#48503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48503

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).

 ---

My impression is that one property of the upstream Future class is that once .wait() returns, or once a callback is invoked, .completed() should return True. This was not the case for FutureNCCL: .wait() would return immediately, and callbacks would be invoked inline, but .completed() could return False if the CUDA async operations hadn't completed yet.

That was odd and confusing. Since there are other ways for users to check the status of CUDA operations (if they really need to, which I don't think is common), it's probably best to avoid checking the status of CUDA events in .completed().
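
A Python-level sketch of the invariant this restores, assuming the usual `torch.futures.Future` bindings:

```
import torch

fut = torch.futures.Future()
fut.set_result(42)
fut.wait()
assert fut.done()  # once wait() has returned, the future reports completed
```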
ghstack-source-id: 118180028

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D25180531

fbshipit-source-id: e1207f6b91f010f278923cc5fec1190d0fcdab30
2020-12-10 03:54:02 -08:00
b91b0872a1 Record CUDA events for "follow-up" FutureNCCL inside markCompleted (#48499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48499

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).

 ---

We can merge and "hide" a whole bunch of CUDA-related logic if we store and record the CUDA events that correspond to the completion of a FutureNCCL when we call markCompleted (rather than splitting it between the constructor, the `then` method, and a wrapper around the callback).

A more concrete reason for this change is that soon I'll add support for multi-device, and in that case we can't necessarily know in advance which devices a value will be on until we get that value (and we don't want to record an event on all devices as then we might "over-synchronize").

To me, this also makes more conceptual sense: the moment when we store a value on the future, which is the "signal" that the future is now ready, should also be the time at which we record the events needed to synchronize with that value. Though this may just be personal preference.
ghstack-source-id: 118180034

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D25177557

fbshipit-source-id: 53d4bcdfb89fa0d11bb7b1b94db5d652edeb3b7b
2020-12-10 03:53:59 -08:00
6157f8aeb5 Use fresh stream from pool for each FutureNCCL callback (#48498)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48498

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).

 ---

FutureNCCL has a dedicated CUDA stream that it sets as current when running callbacks. This stream is initialized by the ProcessGroupNCCL by extracting it from the global ATen pool.

In order to decouple FutureNCCL from that specific ProcessGroup and make it more generic, in this commit we make FutureNCCL extract a fresh stream from the ATen pool each time it needs one.

This introduces a functional change, because it removes the implicit synchronization and ordering between the callbacks of the same Future. In fact, such an ordering is hard to guarantee in the general case: for example, a user could attach a new callback just after the future becomes completed, and that callback would be run inline, immediately, out of order w.r.t. the other callbacks. (There are ways to "fix" this, but they are complicated.) NCCL got around this because its futures are already marked complete when they're returned, but in fact it could also run into issues if multiple threads were adding callbacks simultaneously.

Note that it is still possible to enforce ordering between callbacks, but one must now do so explicitly. Namely, instead of this:
```
fut.then(cb1)
fut.then(cb2)
```
one must now do:
```
fut.then(cb1).then(cb2)
```
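
As a rough Python-level illustration of the new contract (a sketch using `torch.futures.Future`, which exposes the same chaining semantics; this is not code from the commit):

```
import torch

def cb1(f):
    return f.wait() + 1

def cb2(f):
    return f.wait() * 2

fut = torch.futures.Future()
chained = fut.then(cb1).then(cb2)  # cb2 is guaranteed to run after cb1
fut.set_result(3)
print(chained.wait())  # 8, i.e. (3 + 1) * 2
```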
ghstack-source-id: 118180029

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D25177559

fbshipit-source-id: 4d4e73ea7bda0ea65066548109b9ea6d5b465599
2020-12-10 03:53:56 -08:00
8fb52e7fa2 Make FutureNCCL record events in current stream (#48497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48497

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).

 ---

When we record the events that mark a "follow-up" future complete (for a callback), we used to record them onto the dedicated stream; but since that stream is the current stream at that time, we can just record them onto the current stream instead. This introduces no functional difference. The reason I'm adding this additional layer of indirection is so that the dedicated stream is only referenced inside the `addCallback` method, which will later allow us to more easily change how that stream works.
ghstack-source-id: 118180035

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D25177553

fbshipit-source-id: c6373eddd34bd399df09fd4861915bf98fd50681
2020-12-10 03:53:53 -08:00
e4267eb424 Have FutureNCCL record streams w/ allocator in addCallback (#48496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48496

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).

 ---

There are two ways to add a callback to a Future: `then` and `addCallback` (with the former deferring to the latter). FutureNCCL only "patched" `then`, which caused `addCallback` to be unsupported. By patching `addCallback`, on the other hand, we cover both.

The high-level goal of this change though is to remove all CUDA-specific stuff from `then`, and move it to either `markCompleted` or to a wrapper around the callback. This will take a few more steps to achieve.
ghstack-source-id: 118180031

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D25177558

fbshipit-source-id: ee0ad24eb2e56494c353db700319858ef9dcf32b
2020-12-10 03:53:50 -08:00
868a1a48c6 Add some safeguards to FutureNCCL (#48562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48562

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).

 ---

In this commit I'm adding a few asserts to the constructors of FutureNCCL to make sure that what's passed in is what we expect (fun fact: until two commits ago that wasn't the case, as we were passed some empty events).

I'm also making the second constructor private, as it's only supposed to be used by the then() method.
ghstack-source-id: 118180036

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D25210333

fbshipit-source-id: d2eacf0f7de5cc763e3cdd1ae5fd521fd2eec317
2020-12-10 03:53:47 -08:00
b7f5aa9890 Remove NCCL dependency from PythonFutureWrapper (#48495)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48495

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).

 ---

PythonFutureWrapper needs to provide a GIL-aware way to extract tensors from an IValue of type PyObject. Since this was only used by FutureNCCL it was guarded by #ifdef USE_C10D_NCCL. However, we will need to use it with CUDA-aware futures other than the NCCL one. This might have been achieved simply by replacing USE_C10D_NCCL with USE_CUDA, but I wanted to clean this up better.

We're dealing with two independent dimensions: C++-vs-Python and CPU-vs-CUDA. To make the code more modular, the two dimensions should be dealt with by orthogonal solutions: the user setting a custom callback to handle Python, and the subclass being CUDA-aware. Mixing these two axes makes it more complicated.

Another reason for changing how this works is that later on, when we'll introduce multi-device support, we'll need to extract dataptrs for other reasons too (rather than just recording streams with the caching allocator), namely to inspect the value to determine which devices it resides on.
ghstack-source-id: 118180038

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D25177560

fbshipit-source-id: 3a424610c1ea191e8371ffee0a26d62639895884
2020-12-10 03:53:44 -08:00
7f7f0fa335 Avoid using FutureNCCL before it's ready (#48561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48561

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).

 ---

WorkNCCL allows extracting a FutureNCCL through getFuture(). There is one instance of this method being called by ProcessGroupNCCL itself, in order to attach a callback to it. This was happening _before_ the work was actually launched; however, FutureNCCL _always_ invokes its callbacks immediately inline. The events that the FutureNCCL was using hadn't been recorded yet, so blocking on them was a no-op. Moreover, the function being called was installed by the generic ProcessGroup superclass, which is not CUDA-aware and thus probably didn't make any use of the CUDA events or streams.

383abf1f0c/torch/lib/c10d/ProcessGroup.cpp (L66)

In short: I believe that creating a FutureNCCL and attaching a callback was equivalent to just invoking that function directly, without any CUDA-specific thing. I'm thus converting the code to do just that, in order to simplify it.

Note that, given the comment, I don't think this was the original intention of that code. It seems that the function was intended to be run once the work finished. However, I am not familiar with this code, and I don't want to introduce any functional changes.
ghstack-source-id: 118180037

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D25210337

fbshipit-source-id: 54033c814ac77641cbbe79b4d01686dfc2b45495
2020-12-10 03:48:43 -08:00
eb9516eaa4 [numpy] torch.exp{2, m1}: promote integer inputs to float (#48926)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515
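
A quick sketch of the behavior after this change (assuming the default dtype is float32):

```
import torch

x = torch.arange(4)          # integer (int64) input
print(torch.exp2(x))         # promoted to float: tensor([1., 2., 4., 8.])
print(torch.expm1(x).dtype)  # torch.float32
```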

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48926

Reviewed By: zhangguanheng66

Differential Revision: D25392344

Pulled By: mruberry

fbshipit-source-id: ddbabcfd58cc4c944153b1a224cc232efa022104
2020-12-10 00:14:22 -08:00
27f7d1c286 Port eig CPU from TH to ATen (#43215)
Summary:
Also consolidates shared logic between `eig` CPU and CUDA implementations

Fixes https://github.com/pytorch/pytorch/issues/24693

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43215

Reviewed By: VitalyFedyunin, zhangguanheng66

Differential Revision: D23862622

Pulled By: ngimel

fbshipit-source-id: ca1002428850520cd74cd5b7ed8cb4d12dbd9c52
2020-12-09 23:27:35 -08:00
95233870f2 [PyTorch Mobile] Preserve bundled input related methods when calling optimize_for_mobile
Summary:
Added an extra step to **always** preserve the bundled inputs methods if they are present in the input module.

Also added a check to see if all the methods in `preserved_methods` exist. If not, we will now throw an exception. This can hopefully stop hard-to-debug inputs from getting into downstream functions.

~~Add an optional argument `preserve_bundled_inputs_methods=False` to the `optimize_for_mobile` function. If set to be True, the function will now add three additional functions related with bundled inputs to be preserved: `get_all_bundled_inputs`, `get_num_bundled_inputs` and `run_on_bundled_input`.~~

Test Plan:
`buck test mode/dev //caffe2/test:mobile -- 'test_preserve_bundled_inputs_methods \(test_mobile_optimizer\.TestOptimizer\)'`

or

`buck test caffe2/test:mobile` to run some other related tests as well.

Reviewed By: dhruvbird

Differential Revision: D25433268

fbshipit-source-id: 0bf9b4afe64b79ed1684a3db4c0baea40ed3cdd5
2020-12-09 22:53:56 -08:00
9417e92722 op to gen quant params from min-max thresholds
Summary: Adds support for generating the qparams needed to quantize a tensor from its min and max thresholds.
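
A generic sketch of the underlying math (a hypothetical helper, not the operator's actual implementation), deriving affine qparams for a uint8 target:

```
def qparams_from_min_max(t_min, t_max, qmin=0, qmax=255):
    # The representable range must contain zero.
    t_min, t_max = min(t_min, 0.0), max(t_max, 0.0)
    scale = (t_max - t_min) / (qmax - qmin) or 1.0  # guard against a zero scale
    zero_point = int(round(qmin - t_min / scale))
    return scale, max(qmin, min(qmax, zero_point))  # clamp into [qmin, qmax]

print(qparams_from_min_max(-1.0, 1.0))  # roughly (0.00784, 128)
```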

Test Plan:
```
buck test mode/opt caffe2/caffe2/quantization/server:int8_gen_quant_params_min_max_test
```
```
Started reporting to test run: https://our.intern.facebook.com/intern/testinfra/testrun/5629499573509506
    ✓ ListingSuccess: caffe2/caffe2/quantization/server:int8_gen_quant_params_min_max_test - main (2.522)
    ✓ Pass: caffe2/caffe2/quantization/server:int8_gen_quant_params_min_max_test - test_int8_gen_quant_params_min_max_op (caffe2.caffe2.quantization.server.int8_gen_quant_params_min_max_test.TestInt8GenQuantParamsMinMaxOperator) (1.977)
Summary
  Pass: 1
  ListingSuccess: 1
```

Reviewed By: hx89

Differential Revision: D24485985

fbshipit-source-id: 18dee193f7895295d85d31dc013570e5d5d97357
2020-12-09 19:13:53 -08:00
c5bc6b40ab [NNC] Dead Store Elimination (#49030)
Summary:
Adds a new optimization method to LoopNest which eliminates stores that do not contribute to any output. It's unlikely that any of the lowerings of aten operators produce such stores yet, but this creates some wiggle room for future transformations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49030

Reviewed By: tugsbayasgalan

Differential Revision: D25434538

Pulled By: nickgg

fbshipit-source-id: fa1ead82e6f7440cc783c6116b23d0b7a5b5db4b
2020-12-09 18:49:53 -08:00
7a2abbd8fd Revert D25416620: [pytorch][PR] Add version_info tuple
Test Plan: revert-hammer

Differential Revision:
D25416620 (e69c2f85f6)

Original commit changeset: 20b561a0c76a

fbshipit-source-id: 4d73c7ed9191137d5be92236c18c312ce25a1471
2020-12-09 18:41:24 -08:00
3123f878dd [PyTorch] Avoid storage refcount bump in copy_tensor_metadata (#48877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48877

Setting `Storage` in the TensorImpl ctor only to set it again in
`copy_tensor_metadata` wastes one refcount bump.
ghstack-source-id: 117937872

Test Plan:
internal benchmark. compared results with perf, saw 0.15%
reduction in percent of total time spent in
`TensorImpl::shallow_copy_and_detach`.

Reviewed By: bhosmer

Differential Revision: D25353529

fbshipit-source-id: e85d3a139ccd44cbd059c14edb19b22b962881a9
2020-12-09 17:51:07 -08:00
e69c2f85f6 Add version_info tuple (#48414)
Summary:
Add a `version_info` tuple, similar to `sys.version_info`, to enable version tests. Example generated `version.py`:

```
__version__ = '1.8.0a0'
version_info = (1, 8, 0, 'a0')
# or version_info = (1, 8, 0, 'a0', 'deadbeef') if you're in a Git checkout
debug = False
cuda = None
git_version = '671ee71ad4b6f507218d1cad278a8e743780b716'
hip = None
```
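
A hypothetical usage sketch, assuming the generated file is importable as `torch.version`:

```
from torch.version import version_info

if version_info >= (1, 8):
    print("running 1.8 or newer")
```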

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48414

Reviewed By: zhangguanheng66

Differential Revision: D25416620

Pulled By: malfet

fbshipit-source-id: 20b561a0c76ac0b16ff92f4bd43f8b724971e444
2020-12-09 17:44:35 -08:00
5375a479aa Add type annotations to conv-relu (#47680)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47679

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47680

Reviewed By: zhangguanheng66

Differential Revision: D25416628

Pulled By: malfet

fbshipit-source-id: 103bea1e8c300990f74689787a71b1cfe916cfef
2020-12-09 17:12:26 -08:00
e9ef1fe309 [PyTorch Mobile] Add continuous build config for xplat/caffe2
Summary:
Currently this folder isn't covered by continuous build, and ideally it should be.

I've made everything that is actually used build, but there are test failures (commented out). Specifically:

### Build Failures

1. [Resolved] Vulkan stuff doesn't build because codegen doesn't generate files that Vulkan expects.
2. [Resolved] Vulkan relies on an Android dev environment being set up, which doesn't exist on sandcastle machines. I think the resolution should be to restrict Vulkan stuff to the ANDROID platform, but I will let AshkanAliabadi (who is the expert on all things Vulkan) provide the appropriate resolution.
3. [Resolved] Older caffe2 stuff didn't have the deps set up correctly for zlib.
4. [Resolved] Some Papaya stuff didn't have the QPL deps set up correctly.
5. [Resolved] Some tests include cuda, which isn't available on xplat PyTorch Mobile.
6. [Resolved] Missing NNPACK dep on platforms other than ANDROID and MACOS.
7. [Resolved] Maskrcnn binary missing header includes.
8. [Resolved] Braces around scalar initializers in Vulkan Tests.
9. [Resolved] Incorrect header `<vulkan/vulkan.h>` and incorrect BUCK glob path to include it - seems like some completely different header was being included by libvulkan-stub.

### Test Failures

1. [Resolved] Memory Leak on exit in multiple (all?) QNNPACK tests.
2. [Unresolved] The Lite Trainer test doesn't explicitly specify a dep on its input `.ptl` file, resulting in the file not being found when the test attempts to open it.
3. [Resolved] Heap Use after free errors in old caffe2 tests.
4. [Resolved] Heap buffer overflow errors in old caffe2 tests.
5. [Unresolved] Something related to an overload of `at::Tensor` accepting C2 Tensor not being found (new PyTorch test I think).

Everything marked `[Unresolved]` above results in stuff that is commented out so that it isn't triggered. It is already broken, so this doesn't represent a regression - merely an explicit indication of the fact that it's broken.

Everything marked `[Resolved]` above means that it was fixed to function as intended based on my understanding of the intent.

Test Plan: Sandcastle.

Reviewed By: iseeyuan

Differential Revision: D25093853

fbshipit-source-id: e0dda4f3d852ef158cd088ae2cfd44019ade1573
2020-12-09 16:58:20 -08:00
16b8e6ab01 Class-based structured kernels, with migration of add to framework (#48718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48718

This PR rewrites structured kernels to do the class-based mechanism (instead of defining a meta and impl function, they are methods on a class), and adds enough customizability on the class to support TensorIterator. To show it works, add is made a structured kernel. Don't forget to check https://github.com/pytorch/rfcs/pull/9 for a mostly up-to-date high level description of what's going on here.

High level structure of this PR (the order you should review files):
* TensorMeta.h - TensorMeta is deleted entirely; instead, meta functions will call `set_output` to allocate/resize their outputs. MetaBase gets a new `maybe_get_output` virtual method for retrieving the (possibly non-existent) output tensor in a meta function; this makes it easier to do special promotion behavior, e.g., as in TensorIterator.
* TensorIterator.cpp - Two major changes: first, we add TensorIteratorBase::set_output, which is a "light" version of TensorIterator::set_output; it sets up the internal data structures in TensorIterator, but it doesn't do allocation (that is assumed to have been handled by the structured kernels framework). The control flow here is someone will call the subclassed set_output, which will allocate output, and then we will call the parent class (TensorIteratorBase) to populate the fields in TensorIterator so that other TensorIterator phases can keep track of it. Second, we add some tests for meta tensors, and skip parts of TensorIterator which are not necessary when data is not available.
* tools/codegen/model.py - One new field in native_functions.yaml, structured_inherits. This lets you override the parent class of a structured meta class; normally it's MetaBase, but you can make it point at TensorIteratorBase instead for TensorIterator based kernels
* tools/codegen/gen.py - Now generates all of the classes we promised. It's kind of hairy because this is the first draft. Check the RFC for what the output looks like, and then follow the logic here. There are some complications: I need to continue to generate old-style wrapper functions even if an operator is structured, because SparseCPU/SparseCUDA/etc won't actually use structured kernels to start. The most complicated code generation is the instantiation of `set_output`, which by and large replicates the logic in `TensorIterator::set_output`. This will continue to live in codegen for the foreseeable future, as we would like to specialize this logic per device.
* aten/src/ATen/native/UpSampleNearest1d.cpp - The previous structured kernel is ported to the new format. The changes are very modest.
* aten/src/ATen/native/BinaryOps.cpp - Add is ported to structured.

TODO:
* Work out an appropriate entry point for static runtime, since native:: function stubs are no longer generated
* Refactor TensorIteratorConfig construction into helper functions, like before
* Make Tensor-Scalar addition structured to fix perf regression
* Fix `verify_api_visibility.cpp`
* Refactor tools/codegen/gen.py for clarity
* Figure out why header changes resulted in undefined reference to `at::Tensor::operator[](long) const`

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D25278031

Pulled By: ezyang

fbshipit-source-id: 57c43a6e5df21929b68964d485995fbbae4d1f7b
2020-12-09 15:39:12 -08:00
a6fa3b2682 adding profile_ivalue (#47666)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47666

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25255573

Pulled By: Krovatkin

fbshipit-source-id: 5d8753e4040a3d96105d28d26728125947c7a638
2020-12-09 15:29:15 -08:00
f431e47a2e [collect_env] Acquire windows encoding using OEMCP (#49020)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49010.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49020

Reviewed By: zhangguanheng66

Differential Revision: D25398064

Pulled By: janeyx99

fbshipit-source-id: c7fd1e7d1f3dd82613d7f2031439503188b144fd
2020-12-09 15:22:18 -08:00
5765bbd78c Review memory overlap checks for advanced indexing operations (#48651)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45964

Indexing operators, e.g. `scatter`/`gather`, use tensor restriding, so TensorIterator's built-in overlap checking needs to be disabled. This adds the missing overlap checks for these operators.

In addition, some indexing operators don't work well with `MemOverlapStatus::FULL`, which is explicitly allowed by `assert_no_partial_overlap`. So I've introduced `assert_no_overlap`, which raises an error on partial _or_ full overlap.
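
A hedged illustration of the new check (the exact error text may differ):

```
import torch

x = torch.zeros(4)
idx = torch.tensor([0, 1])
try:
    # out= partially overlaps the input's storage; this is now rejected
    # instead of silently reading and writing the same memory:
    torch.gather(x, 0, idx, out=x[2:4])
except RuntimeError as e:
    print('refused:', e)
```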

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48651

Reviewed By: zhangguanheng66

Differential Revision: D25401047

Pulled By: ngimel

fbshipit-source-id: 53abb41ac63c4283f3f1b10a0abb037169f20b89
2020-12-09 15:10:52 -08:00
dfa3808704 [PyTorch] Remove aten::native::empty usage in TensorIndexing (#49074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49074

Try to resolve part of the github issue https://github.com/pytorch/pytorch/issues/48684. ```aten::native::empty()``` is referenced in TensorIndexing.h; however, its definition is nothing but checks that eventually call ```at::empty()```.

In this diff, ```at::empty()``` is used directly to avoid referencing native symbols.
ghstack-source-id: 118165999

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D25417854

fbshipit-source-id: 7e4af411ae63642c8470e78cf8553400dc9a16c9
2020-12-09 14:50:19 -08:00
c7cc8a48c0 migrating some straggler pytorch ops in fbcode to the new registration API (#48954)
Summary:
I already migrated the majority of fbcode ops to the new registration API, but there are a few stragglers (mostly new files that were created in the last two weeks).

The goal is mostly to stamp out as much of the legacy registration API usage as possible, so that people only see the new API when they look around the code for examples of how to register their own ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48954

ghstack-source-id: 118140663

Test Plan: Ran buck targets for each file that I migrated

Reviewed By: ezyang

Differential Revision: D25380422

fbshipit-source-id: 268139a1d7b9ef14c07befdf9e5a31f15b96a48c
2020-12-09 14:42:29 -08:00
67d12c9582 Pass shape hints for AOT case (#48989)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48989

1. Pass shape hints at model export time.
2. A bit of logging to show if passed shape hints are loaded by OnnxifiOp.

From jfix71:
> for AOT we skip onnxifi on the predictor side. We do onnxifi at model export time

Test Plan:
Temporarily added extra logging to verify that we use the passed shape hints for the AOT scenario. Here are the test results:
1. AOT model generation https://fburl.com/paste/1dtxrdsr shows that pybind_state.cc is called.
2. Running predictor service https://fburl.com/paste/d4qcizya with more logging in onnxifi_op.cc D25344546 shows that we use provided shape hints instead of doing shape inference every time.

Reviewed By: jfix71

Differential Revision: D25344546

fbshipit-source-id: 799ca4baea23ed4d81d89d00cb3a52a1cbf69a44
2020-12-09 14:15:57 -08:00
bfa95f90a0 Revert D25325039: Check CUDA kernel launches (/fbcode/caffe2/)
Test Plan: revert-hammer

Differential Revision:
D25325039 (f5e9ffbc27)

Original commit changeset: 2043d6e63c7d

fbshipit-source-id: 5377dd2aa7c6f58c8641c956b7642c7c559bbc40
2020-12-09 14:07:16 -08:00
7a4a2df225 Revert D25003113: make validate debug-only in Device copy ctr
Test Plan: revert-hammer

Differential Revision:
D25003113 (4b26cafb8f)

Original commit changeset: e17e6495db65

fbshipit-source-id: fd636c954a97bd80892464feb974a11b9dd96899
2020-12-09 13:58:11 -08:00
fc0a3a1787 Improve torch.fft n-dimensional transforms (#46911)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46911

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25420647

Pulled By: mruberry

fbshipit-source-id: bf7e6a2ec41f9f95ffb05c128ee0f3297e34aae2
2020-12-09 12:40:06 -08:00
f5e9ffbc27 Check CUDA kernel launches (/fbcode/caffe2/) (#49105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49105

(1) Add a safety check `C10_CUDA_KERNEL_LAUNCH_CHECK()` after each kernel launch. This diff only changes files inside the directories /fbsource/fbcode/caffe2/modules/, /fbsource/fbcode/caffe2/fb/, and /fbsource/fbcode/caffe2/test/.

(2) Get rid of old check `AT_CUDA_CHECK(cudaGetLastError())` when necessary.

Test Plan:
Test build:
```
buck build //caffe2/modules/detectron:
buck build //caffe2/torch/fb/:
```

To check for launches without checks:
```
python3 caffe2/torch/testing/check_kernel_launches.py
```
Make sure none of the updated files are in the returned list.

Reviewed By: r-barnes

Differential Revision: D25325039

fbshipit-source-id: 2043d6e63c7d029c35576d3101c18247ffe92f01
2020-12-09 12:34:55 -08:00
7584161dfa Enhance new_group doc to mention using NCCL concurrently. (#48872)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48872

Using NCCL communicators concurrently is not safe and this is
documented in NCCL docs.

However, this is not documented in PyTorch and we should add documentation for
ProcessGroupNCCL so that users are aware of this limitation.
ghstack-source-id: 118148014

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D25351778

fbshipit-source-id: f7f448dc834c47cc1244f821362f5437dd17ce77
2020-12-09 12:29:15 -08:00
c62f3fc40b fix clang-tidy warning - make global TorchLibraryInit objects const (#48956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48956

ghstack-source-id: 118140666

Test Plan: GitHub CI

Reviewed By: ezyang

Differential Revision: D25381418

fbshipit-source-id: 1726ed233b809054cb9e5ba89e02c84fb868c1eb
2020-12-09 12:22:17 -08:00
b98e62f8eb [te] Add gflag for fast intrinsic expansion (#49060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49060

TE contains a fast tanh/sigmoid implementation that may be slightly less precise than the eager implementation (I measured 1 ulp in some test cases).  We disabled it by default using an #ifdef but that may be too conservative.  Adding a gflag allows more testing without recompilation.
ghstack-source-id: 118140487

Test Plan: `buck test //caffe2/test:jit`

Reviewed By: eellison

Differential Revision: D25406421

fbshipit-source-id: 252b64091edfff878d2585e77b0a6896aa096ea5
2020-12-09 12:15:47 -08:00
44f33596d3 [pe] Add gflags for num_profiled_runs and bailout_depth, laint (#49059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49059

We'd like to be able to change these defaults without rebuilding the library.
ghstack-source-id: 118140486

Test Plan: `buck build //caffe2/test:jit`

Reviewed By: eellison

Differential Revision: D25405568

fbshipit-source-id: 5d0561a64127adc44753e48d3b6c7f560c8b5820
2020-12-09 12:14:00 -08:00
e5a98c5ab0 [ONNX] Remove usage of isCompleteTensor() in symbolic functions (#48162)
Summary:
`isCompleteTensor()` only returns true when both the scalar type and shape are present, and all dimensions in the shape must be static. This strict requirement is unnecessary for many use cases, such as when only the rank or the scalar type needs to be known.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48162

Reviewed By: malfet

Differential Revision: D25340823

Pulled By: bzinodev

fbshipit-source-id: 1fef61f44918f4339dd6654fb725b18cd58d99cf
2020-12-09 11:37:19 -08:00
41fd51d7d8 [PyTorch] Reference to c10::GetCPUAllocator() directly (#49068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49068

The TH folder has some kernel implementations referenced by ATen/native; it goes with ATen/native in the follow-up diff for per-app selective build. ATen/Context.cpp stays at the lib level and should not reference symbols in TH directly.

It's a simple change in this diff, as ```getTHDefaultAllocator()``` did nothing but return ```c10::GetCPUAllocator()```. Use ```c10::GetCPUAllocator()``` directly instead of taking the extra route through ```getTHDefaultAllocator()```.
ghstack-source-id: 118151905

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D24147914

fbshipit-source-id: 37efb43adc9b491c365df0910234fa6a8a34ec25
2020-12-09 10:37:39 -08:00
b3ab25aefa [numpy] torch.cosh: promote integer inputs to float (#48923)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48923

Reviewed By: zhangguanheng66

Differential Revision: D25393679

Pulled By: mruberry

fbshipit-source-id: 2151ee0467b50175f84ac492c219a46ef6bd66c3
2020-12-09 10:15:58 -08:00
492580b855 [te] Remove vestigial __init__.py from test/cpp/tensorexpr (#49061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49061

We don't use the python harness for cpp tests anymore.
ghstack-source-id: 118140485

Test Plan: Careful thinking.

Reviewed By: navahgar

Differential Revision: D25410290

fbshipit-source-id: 879e3c6fb296298d567e1d70b18bde96b5cac90d
2020-12-09 10:09:46 -08:00
9f7fb54693 Revert D25111515: Extra sampling of record function events
Test Plan: revert-hammer

Differential Revision:
D25111515 (09b974c2d5)

Original commit changeset: 0d572a3636fe

fbshipit-source-id: d558d8052924d937d86db7dd40dc6388e6d28823
2020-12-09 08:37:17 -08:00
73f7178445 remove redundant sccache wrappers from build.sh scripts (#47944)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47944

Reviewed By: zhangguanheng66

Differential Revision: D25406873

Pulled By: walterddr

fbshipit-source-id: 5441b0a304e0be1213b4e14adf26118b3e7e330b
2020-12-09 08:20:44 -08:00
4b26cafb8f make validate debug-only in Device copy ctr (#47854)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47854

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25003113

Pulled By: bdhirsh

fbshipit-source-id: e17e6495db65c48c7daf3429acbd86742286a1f3
2020-12-09 08:11:24 -08:00
71cfb73755 Add complex support to broadcast_coalesced (#48686)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47330

Add support for DataParallel complex tensors by handling them as `torch.view_as_real` for `broadcast_coalesced`, `scatter` and `gather`
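
A small sketch of the trick using the public APIs (not the internal code path itself):

```
import torch

z = torch.randn(2, 3, dtype=torch.complex64)
r = torch.view_as_real(z)                 # real view with a trailing dim of 2
print(r.shape)                            # torch.Size([2, 3, 2])
print(torch.view_as_complex(r).equal(z))  # True: the view round-trips losslessly
```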

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48686

Reviewed By: osalpekar

Differential Revision: D25261533

Pulled By: sidneyfletcher

fbshipit-source-id: 3a25e05deee43e053f40d1068fc5c7867cfa9686
2020-12-09 05:11:40 -08:00
09b974c2d5 Extra sampling of record function events (#48289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48289

Adding extra sampling step when dispatching RecordFunction.

(Note: this ignores all push blocking failures!)

Reviewed By: swolchok

Differential Revision: D25111515

Pulled By: ilia-cher

fbshipit-source-id: 0d572a3636fe649a47ec47901826bbfc08368937
2020-12-09 02:29:13 -08:00
a20d4511e4 [PyTorch] TensorImpl::is_non_overlapping_and_dense_ should default to true (#48625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48625

The default TensorImpl is contiguous. Therefore, it is non-overlapping and dense per refresh_contiguous().
ghstack-source-id: 118035410

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25232196

fbshipit-source-id: 1968d9ed444f2ad5414a78d0b11e5d3030e3109d
2020-12-09 00:49:31 -08:00
a849f38222 skip cuda test_cholesky_solve_batched_many_batches due to illegal memory access (#48999)
Summary:
See https://github.com/pytorch/pytorch/issues/48996

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48999

Reviewed By: zhangguanheng66

Differential Revision: D25390070

Pulled By: mruberry

fbshipit-source-id: cf59130f6189ab8c2dade6a6a4de2f69753a5e36
2020-12-09 00:47:55 -08:00
e8b00023b2 [ROCm] restore autograd tests (#48431)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/30845.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48431

Reviewed By: zhangguanheng66

Differential Revision: D25393323

Pulled By: mruberry

fbshipit-source-id: 339644abf4ad52be306007f4040c692a45998052
2020-12-09 00:40:40 -08:00
1c31f76297 Add high level profiling trace for dataloading and optimizer (#47655)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47441

To give users more information about Python-level functions in profiler traces, we propose to instrument the following functions:

```
_BaseDataLoaderIter.__next__
Optimizer.step
Optimizer.zero_grad
```

Because record_function already uses `if (!active)` to check whether the profiler is enabled, we don't explicitly call torch.autograd._profiler_enabled() before each instrumented call.
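
A toy sketch of where the new events surface (the model, data, and optimizer here are illustrative only):

```
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loader = DataLoader(TensorDataset(torch.randn(8, 4), torch.randn(8, 2)), batch_size=4)

with torch.autograd.profiler.profile() as prof:
    for x, y in loader:  # _BaseDataLoaderIter.__next__ is now traced
        opt.zero_grad()  # traced
        torch.nn.functional.mse_loss(model(x), y).backward()
        opt.step()       # traced
print(prof.key_averages().table(sort_by='cpu_time_total', row_limit=5))
```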

Acknowledgement: nbcsm, guotuofeng, gunandrose4u , guyang3532 , mszhanyi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47655

Reviewed By: smessmer

Differential Revision: D24960386

Pulled By: ilia-cher

fbshipit-source-id: 2eb655789e2e2f506e1b8f95ad3d470c83281102
2020-12-09 00:13:56 -08:00
2d9585a6a1 [quant][graphmode][fx] Add test for ResnetBase (#48939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48939

Add numerical test for fx graph mode for resnet base, comparing with eager mode

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D25375342

fbshipit-source-id: 08f49b88daede47d44ee2ea96a02999fea246cb2
2020-12-08 22:27:03 -08:00
59a3e76641 [pt][quant] Remove contiguous calls in qembeddingbag (#48993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48993

I don't see any reason that we need to call contiguous on the embedding tables; non-contiguous tables should not exist in the first place. The indices and lengths/offsets are actually generated in the model, most likely by SigridTransform -> ClipRanges -> GatherRanges -> SigridHash (sometimes), and none of these ops produce non-contiguous tensors. It should be fine to enforce tensor.is_contiguous().

Reviewed By: radkris-git

Differential Revision: D25266756

fbshipit-source-id: f15ecb67281c9ef0c7ac6637f439e538e77e30a2
2020-12-08 20:14:20 -08:00
7c0a3e3a06 Annotate torch._tensor_str (#48584)
Summary:
This is a follow-up PR to https://github.com/pytorch/pytorch/issues/48463

> Rather than requiring that users write import numbers and then use numbers.Float etc., this PEP proposes a straightforward shortcut that is almost as effective: when an argument is annotated as having type float, an argument of type int is acceptable; similarly, for an argument annotated as having type complex, arguments of type float or int are acceptable.
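
A minimal illustration of the shortcut the quote describes:

```
def scale(x: float) -> float:
    return x * 2.0

scale(3)  # an int argument is acceptable where float is annotated
```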

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48584

Reviewed By: zhangguanheng66

Differential Revision: D25411080

Pulled By: malfet

fbshipit-source-id: e00dc1e9e6e46a8cfae77da4f2cf159c0c2b9bcc
2020-12-08 20:06:40 -08:00
34cc77a811 Torch onnx (#48980)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45215

This is a follow-up PR to https://github.com/pytorch/pytorch/issues/45258 and https://github.com/pytorch/pytorch/issues/48782

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48980

Reviewed By: zhangguanheng66

Differential Revision: D25399823

Pulled By: ezyang

fbshipit-source-id: 798055f4abbbffecdfab0325884193c81addecec
2020-12-08 19:41:44 -08:00
5450614cf6 Correctly apply WIN32_LEAN_AND_MEAN to the whole repo (#49025)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48895

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49025

Reviewed By: zhangguanheng66

Differential Revision: D25399912

Pulled By: ezyang

fbshipit-source-id: 9b7225b0e43511e0b8981c39035d814a4406c523
2020-12-08 19:38:23 -08:00
4434c07a2c [quant][fix] Support quantization of ops where input is quantizable (#49027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49027

For cat followed by linear, since the output of cat is not quantized, we didn't quantize the linear.
This change checks the uses of the cat op to insert observers.

Test Plan:
python test/test_quantization.py TestQuantizeJitOps.test_cat_linear

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25403412

fbshipit-source-id: 5875db259bf75f08ce672ce341a67005ed2f8a04
2020-12-08 19:21:41 -08:00
993ce4b206 [quant][graphmode][fx] Add MatchAllNode in pattern matching (#48979)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48979

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D25385459

fbshipit-source-id: 43adffc9e2242d099cecd38d1902f9900158f51e
2020-12-08 18:53:55 -08:00
107c31f2f5 Add a pass to fetch attributes of nn.Module to fx.node (#47935)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47935

Fetch the parameters that are needed for lowering from nn.Module to fx.node for leaf_modules.

Test Plan: A test `test_fetch` is added to test_fx_experimental.py.

Reviewed By: jfix71

Differential Revision: D24957142

fbshipit-source-id: a349bb718bbcb7f543a49f235e071a079da638b7
2020-12-08 18:06:37 -08:00
3f9ff48ebb [JIT] Allow del statements with multiple targets (#48876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48876

**Summary**
This commit adds support for `del` statements with multiple targets.
Targets are deleted left-to-right just like Python.

**Test Plan**
This commit updates the `TestBuiltins.test_del_multiple_operands` unit
test to actually test that multiple deletion works instead of asserting
that an error is thrown.

**Fixes**
This commit fixes #48635.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25386285

Pulled By: SplitInfinity

fbshipit-source-id: c0fbd8206cf98b2bd1b695d0b778589d58965a74
2020-12-08 15:39:42 -08:00
d033e185ed fx quant: move more functions to utils (#48908)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48908

No logic change, improving readability

Test Plan:
CI

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25363080

fbshipit-source-id: 1d73a875bd7abf671b544ebc835432fea5306dc3
2020-12-08 15:37:04 -08:00
2668ea8087 fx quant: move qconfig utils to utils file (#48907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48907

Improving readability

Test Plan:
CI

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25363078

fbshipit-source-id: 6b0161db14ccf8c3b47edf4fc760ca9a399254b2
2020-12-08 15:37:00 -08:00
17e71509a6 fx quant: quick cleanup for model_device (#48906)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48906

As titled, removing some code which is no longer
needed after refactors.

Test Plan:
CI

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25363079

fbshipit-source-id: 9e4bcf63f4f1c2a2d3fb734688ba593d72495349
2020-12-08 15:35:18 -08:00
e538bd6695 [collect_env] Add candidate paths for nvidia-smi on Windows (#49021)
Summary:
Recently, Nvidia has started putting nvidia-smi under SystemRoot.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49021

Reviewed By: zhangguanheng66

Differential Revision: D25399831

Pulled By: ezyang

fbshipit-source-id: b1ea12452012e0a3fb4703996b6104e7115a8a7f
2020-12-08 15:02:15 -08:00
02b63858f2 [CUDAExtension] support all visible cards when building a cudaextension (#48891)
Summary:
Currently CUDAExtension assumes that all cards on a machine are of the same type and builds the extension with the compute capability of the 0th card. This breaks later at runtime if the machine has cards of different types.

Specifically resulting in:
```
RuntimeError: CUDA error: no kernel image is available for execution on the device
```
when cards of types that weren't compiled for are used (and the error is far from telling the uninitiated what the problem is).

My current setup is:
```
$ CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.get_device_capability())"
(8, 6)
$ CUDA_VISIBLE_DEVICES=1 python -c "import torch; print(torch.cuda.get_device_capability())"
(6, 1)
```
but the extension was getting built with `-gencode=arch=compute_80,code=sm_80`.

This PR:
* [x] introduces a loop over all devices visible at build time to ensure the extension will run on all of them (it sorts the new list generated by the loop, so that the output is easier to debug should a card with a lower capability come last)
* [x] adds `+PTX` to the last entry of the compute capabilities derived from local cards (`if not _arch_list:`) to support other archs
* [x] adds a digest of my conversation with ptrblck on slack in the form of docs, which can hopefully help others know which archs to support, how to override the defaults, and when and how to add PTX (see the sketch below)

Please kindly review that my prose is clear and easy to understand.

ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48891

Reviewed By: ngimel

Differential Revision: D25358285

Pulled By: ezyang

fbshipit-source-id: 8160f3adebffbc8e592ddfcc3adf153a9dc91557
2020-12-08 14:57:10 -08:00
6000481473 add a unit test for large node error (#48938)
Summary:
add a unit test to test the situation where a node is too large to fit into any device

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48938

Reviewed By: zhangguanheng66

Differential Revision: D25402967

Pulled By: scottxu0730

fbshipit-source-id: a2e2a3dc70d139fa678865ef03e67fa57eff4a1d
2020-12-08 14:45:44 -08:00
5960581148 CUDA BFloat16 batchnorm (non-cuDNN) (#44994)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44994

Reviewed By: ailzhang

Differential Revision: D25377525

Pulled By: ngimel

fbshipit-source-id: 42d583bbc364532264a4d3ebaa6b4ae02a0413de
2020-12-08 14:25:42 -08:00
e8ec84864f [StaticRuntime] Add aten::narrow (#48991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48991

Add a native impl of `aten::narrow` to skip the dispatcher. Because `aten::narrow` calls `aten::slice` in its implementation, we cut the dispatcher overhead twice over by calling the native impl of `aten::slice` as well.

Reviewed By: bwasti

Differential Revision: D25387119

fbshipit-source-id: c020da2556a35bc57a8a2e21fa45dd491ea516a0
2020-12-08 13:48:21 -08:00
d1fb4b4ffc Put Flake8 requirements into their own file (#49032)
Summary:
This PR moves the list of Flake8 requirements/versions out of `.github/workflows/lint.yml` and into its own file `requirements-flake8.txt`. After (if) this PR is merged, I'll modify the Flake8 installation instructions on [the "Lint as you type" wiki page](https://github.com/pytorch/pytorch/wiki/Lint-as-you-type) (and its internal counterpart) to just say to install from that new file, rather than linking to the GitHub Actions YAML file and/or giving a command with a set of packages to install that keeps becoming out-of-date.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49032

Test Plan:
Either look at CI, or run locally using [act](https://github.com/nektos/act):
```sh
act -P ubuntu-latest=nektos/act-environments-ubuntu:18.04 -j flake8-py3
```

Reviewed By: janeyx99

Differential Revision: D25404037

Pulled By: samestep

fbshipit-source-id: ba4d1e17172a7808435df06cba8298b2b91bb27c
2020-12-08 13:29:10 -08:00
2b70bcd014 [TensorExpr] Enable inlining for output tensors too. (#48967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48967

We previously didn't inline output tensors, which resulted in correctness
issues like #48533. This PR allows inlining for output tensors too -
this could result in duplicated computations, but we can address that
later once correctness is ensured.

Performance results on FastRNNS:
Before the fix:
```
Benchmarking LSTMs...
            name          avg_fwd          std_fwd          avg_bwd          std_bwd
           cudnn            10.09          0.05431            17.55           0.2108
            aten            21.52           0.1276             26.7            1.471
             jit            13.25           0.8748            22.47             1.73
      jit_premul            11.43           0.3226            19.43            2.231
 jit_premul_bias            11.84           0.2245            20.33            2.205
      jit_simple            13.27           0.9906            22.15           0.9724
  jit_multilayer            13.38           0.8748            22.82             1.01
              py            33.55            4.837            46.41            6.333
```
After the fix:
```
Benchmarking LSTMs...
            name          avg_fwd          std_fwd          avg_bwd          std_bwd
           cudnn            10.09          0.05979            17.45           0.1987
            aten            21.21            0.144            26.43           0.7356
             jit            13.01           0.2925            23.21           0.8454
      jit_premul             11.4           0.3905            19.62            2.448
 jit_premul_bias            11.85           0.2461            20.29           0.6592
      jit_simple            13.08           0.8533            22.81            1.315
  jit_multilayer            12.93           0.1095            23.57            1.459
              py            31.21            2.783            44.63            6.073
```

Differential Revision: D25383949

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Pulled By: ZolotukhinM

fbshipit-source-id: 16f5727475109a278499bef7905f6aad18c8527a
2020-12-08 13:24:40 -08:00
0fb9d36660 Delete ATen mirror stuff (#49028)
Summary:
These files refer to https://travis-ci.org/github/zdevito/ATen and https://github.com/zdevito/ATen which were last updated in 2018 and 2019 respectively. According to zdevito:

> yeah, all of that stuff can be deleted
> was from a time when ATen was a separate repo from pytorch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49028

Reviewed By: zdevito

Differential Revision: D25401810

Pulled By: samestep

fbshipit-source-id: a8eea7382f91e1aee6f45552645e6d53825fe5a7
2020-12-08 13:19:30 -08:00
dee82ef3ea Add LKJCholesky distribution (#48798)
Summary:
As a follow-up to https://github.com/pytorch/pytorch/issues/48041, this adds the `LKJCholesky` distribution, which samples the Cholesky factor of positive definite correlation matrices.

This also relaxes the check on `tril_matrix_to_vec` so that it works for 2x2 matrices with `diag=-2`.
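
A minimal usage sketch, assuming the class is exposed as `torch.distributions.LKJCholesky`:

```python
import torch
from torch.distributions import LKJCholesky

d = LKJCholesky(dim=3, concentration=1.0)
L = d.sample()   # lower-triangular Cholesky factor, shape (3, 3)
corr = L @ L.T   # positive definite correlation matrix with unit diagonal
```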

cc. fehiepsi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48798

Reviewed By: zhangguanheng66

Differential Revision: D25364635

Pulled By: neerajprad

fbshipit-source-id: 4abf8d83086b0ad45c5096760114a2c57e555602
2020-12-08 11:27:48 -08:00
c92c8598a3 [FX][2/2] Make docstrings pretty when rendered (#48871)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48871

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D25351588

Pulled By: jamesr66a

fbshipit-source-id: 4c6fd341100594c204a35d6a3aab756e3e22297b
2020-12-08 11:14:43 -08:00
b89c328493 Add fftw3 cmake as alternative for FFT/DFT (#48808)
Summary:
Added CMake discovery for FFTW3 in Dependencies.cmake.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48808

Reviewed By: janeyx99

Differential Revision: D25375320

Pulled By: walterddr

fbshipit-source-id: cde3afc51eef9c621c7d19be7ad7573fc8b838c2
2020-12-08 10:35:33 -08:00
b0e919cf60 Avoid initializing gradInput twice in the backward phase of replication (#48890)
Summary:
https://github.com/pytorch/pytorch/issues/48889

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48890

Reviewed By: zhangguanheng66

Differential Revision: D25375697

Pulled By: ezyang

fbshipit-source-id: fd6f6089be44e68c4557b923550c7cadb90d739a
2020-12-08 10:15:24 -08:00
274ce26fd8 [static runtime] Add Internal Ops to the registry (#48616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48616

This adds a couple of _out variants and registers them in the registry.

I also added the concept of "canReuse{Input,Output}" so that we can annotate tensors that are not optimizable (specifically, non-float tensors).

In the future we can change this (with D25062301).

After removing `RecordFunction`, we see these results:

```
BS=20
 ---
caffe2:           0.651617 ~ 0.666354
static runtime:   0.753481
pytorch:          0.866658

BS=1
 ---
caffe2:           0.0858684 ~ 0.08633
static runtime:   0.209897
pytorch:          0.232694
```

Test Plan: standard internal test of ads model against caffe2 reference (see the scripts in this quip: https://fb.quip.com/ztERAYjuzdlr)

Reviewed By: hlu1

Differential Revision: D25066823

fbshipit-source-id: 25ca181c62209a4c4304f7fe73832b13e314df80
2020-12-08 09:32:38 -08:00
ad3fed8b90 [BE] Fix signed-unsigned warnings (#48848)
Summary:
Switch to range loops where possible.
Replace `ptrdiff_t` (signed type) with `size_t` (unsigned type).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48848

Reviewed By: walterddr

Differential Revision: D25376591

Pulled By: malfet

fbshipit-source-id: 9835f83b7a17b6acc20731cc89c1c11c2aa01a78
2020-12-08 08:58:11 -08:00
c29f51642e Modify NEON check for ARM64 on OS X (#48982)
Summary:
Use CMAKE_SYSTEM_PROCESSOR rather than running sysctl

Fixes https://github.com/pytorch/pytorch/issues/48874

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48982

Reviewed By: walterddr

Differential Revision: D25385883

Pulled By: malfet

fbshipit-source-id: 47b6dc5be8d75f6d4a66a11c564abdfe31ac90b4
2020-12-08 07:58:22 -08:00
58c13cf685 Back out "Revert D25375885: [pytorch][PR] Reenable some BF16 tests on CUDA"
Summary: Revert D25397144 69829f3fff4d4a2d1a71bb52e90d3c7f16b27fa3

Test Plan: Revert Hammer

Reviewed By: janeyx99

Differential Revision: D25397572

fbshipit-source-id: 625ca2a32e4558ae4582a15697b6e1cc57cc1573
2020-12-08 07:52:59 -08:00
e2befb84bc minor README change to fix #25464 (#48970)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25464

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48970

Reviewed By: walterddr

Differential Revision: D25396284

Pulled By: janeyx99

fbshipit-source-id: 8355c417b5c8b8865f208d7d8e8154048423afd9
2020-12-08 07:48:52 -08:00
39445f718c Revert D25375885: [pytorch][PR] Reenable some BF16 tests on CUDA
Test Plan: revert-hammer

Differential Revision:
D25375885 (e3893b867f)

Original commit changeset: 2e19fe725ae9

fbshipit-source-id: 69829f3fff4d4a2d1a71bb52e90d3c7f16b27fa3
2020-12-08 07:05:33 -08:00
07978bd62e [static runtime] fuse inference ops (1) (#48948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48948

Fuse inference ops for the following inside static runtime:
ConcatAddMulReplaceNaNClip
CastedBatchOneHotLengths
ConcatBatchMatMulBatchGather

TODO:
1. add unit tests
2. add more restrictions on the graph transform (e.g. check inputs, check outputs not used elsewhere)

Test Plan:
Run adindexer model with static runtime and fusion; check ops
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adindexer/traced_precomputation2.pt --pt_inputs=/data/users/ansha/tmp/adindexer/merge/container_precomputation_bs1.pt --iters=3000 --warmup_iters=10000  --num_threads=1 --pred_net=/data/users/ansha/tmp/adindexer/precomputation_merge_net.pb --c2_inputs=/data/users/ansha/tmp/adindexer/merge/c2_inputs_precomputation_bs1.pb --c2_sigrid_transforms_opt=1 --c2_use_memonger=1 --c2_weights=/data/users/ansha/tmp/adindexer/merge/c2_weights_precomputation.pb --pt_enable_static_runtime
```
transformed model graph contains the fused ops: P151559641

Results before fusion: P151567611
Results after fusion: P151566783 (8% speedup for bs=20, 14% speedup for bs=1)

Reviewed By: hlu1

Differential Revision: D25224107

fbshipit-source-id: c8442e8ceb018879c61ce564367b1c1b9412601b
2020-12-08 05:54:49 -08:00
b643dbb8a4 VariableType calls faithful C++ API for c10-full out ops (#47792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47792

For operators with out arguments, VariableType previously called the out overload of the C++ API because that's all we had.
We introduced a faithful C++ API that takes out arguments in schema-order in D24835252 and this PR changes VariableType to use that API instead.

Note that this only applies to c10-full ops. Non-c10-full ops still call the unfaithful API. There aren't any c10-full out ops at the moment.
So this PR can only be tested and evaluated together with PRs on top that make ops with out arguments c10-full.
ghstack-source-id: 118068088

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D24901945

fbshipit-source-id: a99db7e4d96fcc421f9664504f87df68fe1c482f
2020-12-08 03:48:45 -08:00
3ef36dca8e Faithful out arguments (#47712)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47712

This adds a faithful API for ops with out arguments, as described in https://docs.google.com/document/d/1h7nBibRwkRLQ8rsPhfALlwWR0QbkdQm30u4ZBwmaps8/edit# .

After this, an op will generate the following overloads for the C++ API:

```cpp
// Generated from the aten::abs operator (NOT from aten::abs.out)
Tensor at::abs(const Tensor& self)

// Generated from the aten::abs.out operator
Tensor& at::abs(const Tensor& self, Tensor& out)
Tensor& at::abs_out(Tensor& out, const Tensor& self)

```

This is an important step towards making those ops c10-full (it allows VariableType, XLA and other backends to ignore reordering and just call through with the same argument order), but this does not make any of those ops c10-full yet.
It enables the faithful API independent from c10-fullness. That means the API is more consistent with the same API for all ops and making an op c10-full in the future will not trigger future C++ API changes.
ghstack-source-id: 118068091

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D24835252

fbshipit-source-id: dedfabd07140fc8347bbf16ff219aad3b20f2870
2020-12-08 03:48:42 -08:00
046ea6696d Enable faithful API for all ops (#47711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47711

It seems we generated the declaration for all ops but the definition only for c10-full ops. We should also generate the definition for non-c10-full ops.
This makes future migrations of ops from non-c10-full to c10-full have a lower impact on the C++ API.
ghstack-source-id: 118064755

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D24835006

fbshipit-source-id: 8f5c3c0ffcdc9b479ca3785d57da16db508795f5
2020-12-08 03:43:48 -08:00
32b098baf9 Add and adjust kernel launch checks (#46727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46727

This adds kernel launch safety checks to a number of kernels. See D24309971 (353e7f940f) for context.

Test Plan: The existing pre-commit test rigs are used.

Reviewed By: ngimel

Differential Revision: D24334303

fbshipit-source-id: b6433f6be109fc8dbe789e91f3cbfbc31fd15951
2020-12-08 00:36:56 -08:00
cb6233aa53 Fix some convoluted(?) code (#48893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48893

This simplifies some code which is written in an interesting way. It may be that this was intentional, but I don't recognize the pattern being used.

Test Plan: N/A - Sandcastle

Reviewed By: igorsugak

Differential Revision: D25358283

fbshipit-source-id: 19bcf01cbb117843e08df0237e6a03ea77958078
2020-12-07 22:48:23 -08:00
c3a90bedd4 Move aten::__contains__.int_list for lite jit (#48950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48950

Needed by noise suppression model

Test Plan: build

Reviewed By: linbinyu

Differential Revision: D25321582

fbshipit-source-id: fbc67fc35087c5f44b7ab68d1485b2b916747723
2020-12-07 21:27:34 -08:00
881e9583b2 docker: Add make variable to add docker build args (#48942)
Summary:
Adds an extra make variable 'EXTRA_DOCKER_BUILD_FLAGS' that allows us to
add extra docker build flags to the docker build command.

Example:

    make -f docker.Makefile EXTRA_DOCKER_BUILD_FLAGS=--no-cache devel-image

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48942

Reviewed By: walterddr

Differential Revision: D25376288

Pulled By: seemethere

fbshipit-source-id: 9cf2c2a5e01d505fa54447604ecd653dcbdd42e1
2020-12-07 20:15:24 -08:00
5533be5170 CUDA BF16 backwards (#48809)
Summary:
Looks like there's no test?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48809

Reviewed By: mruberry

Differential Revision: D25378998

Pulled By: ngimel

fbshipit-source-id: d16789892902b5a20828e8c7b414b478de33c4a5
2020-12-07 19:48:53 -08:00
3aeb9cc85d [DOCS]Correct docs for torch.lu_solve (#47762)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43498 by correcting the function signature of `torch.lu_solve`
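
For reference, a sketch of the call pattern the corrected docs describe:

```python
import torch

A = torch.randn(3, 3)
b = torch.randn(3, 2)
LU, pivots = torch.lu(A)           # LU factorization of A
x = torch.lu_solve(b, LU, pivots)  # solves A @ x = b
```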

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47762

Reviewed By: ljk53

Differential Revision: D24900259

Pulled By: ailzhang

fbshipit-source-id: 2a43170bde57e03d44025b23e3abcda169cfc9e2
2020-12-07 19:35:23 -08:00
bea88ee1d0 Added entry for torch.linalg.cond to linalg.rst (#48941)
Summary:
This PR makes documentation for `cond` available at https://pytorch.org/docs/master/linalg.html
I forgot to include this change in https://github.com/pytorch/pytorch/issues/45832.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48941

Reviewed By: ngimel

Differential Revision: D25379244

Pulled By: mruberry

fbshipit-source-id: c8c0a0b8a05c17025d6c3cea405b2add369e2019
2020-12-07 19:01:05 -08:00
c876d4f477 [Gradient Compression] Let the dtype of created low-rank tensors P and Q be the same type as the input tensor (#48902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48902

Previously, if the dtype of the input gradients was FP16, matrix multiplications would fail, because the created low-rank tensors P and Q used the FP32 dtype.

Now the dtype of P and Q is the same as that of the input tensor.
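
A sketch of the effect (variable names are illustrative, not the hook's actual code; assumes a CUDA device since CPU FP16 matmul support is limited):

```python
import torch

grad = torch.randn(1024, 128, dtype=torch.float16, device="cuda")  # FP16 gradient
rank = 4
# P and Q now inherit the gradient's dtype instead of defaulting to FP32.
p = torch.empty(grad.shape[0], rank, dtype=grad.dtype, device=grad.device)
q = torch.randn(grad.shape[1], rank, dtype=grad.dtype, device=grad.device)
torch.matmul(grad, q, out=p)  # would raise a dtype mismatch if P/Q were FP32
```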

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 117962078

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

Reviewed By: rohan-varma

Differential Revision: D25362071

fbshipit-source-id: e68753ff23bb480605b02891e128202ed0f8a587
2020-12-07 17:40:06 -08:00
533c837833 Register OpInfos for torch.fft transforms (#48427)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48427

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25266218

Pulled By: mruberry

fbshipit-source-id: 406e7ed5956bc7445daf8c027c9b4d2c8ff88fa1
2020-12-07 17:19:29 -08:00
adbb74ded9 [package] pre-emptively install submodules (#48799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48799

Python's IMPORT_FROM bytecode bypasses the import infrastructure
when a package being loaded as part of a circular dependency is
accessed from its parent module _before_ that package has finished loading
and been installed on the module. Since we cannot override the lookup
on sys.modules, this PR pre-emptively does the module assignment before
running the submodule's initialization code.

Note: this appears to work, but it is not clear to me why Python doesn't
do this by default. It is possible that the logic for creating modules
is flexible enough in generic Python that this interception between creating
the module and running its code is not always possible.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D25312467

Pulled By: zdevito

fbshipit-source-id: 6fe3132af29364ccb2b3cabdd2b847d0a09eb515
2020-12-07 17:12:04 -08:00
e3893b867f Reenable some BF16 tests on CUDA (#48805)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48805

Reviewed By: agolynski

Differential Revision: D25375885

Pulled By: ailzhang

fbshipit-source-id: 2e19fe725ae9450bd1a2bc4e2d308c59b9f94fac
2020-12-07 16:16:07 -08:00
7629612f9f Update torch.randint documentation to include missing note (#48787)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46497

Includes a note that the returned dtype defaults to torch.int64.
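
In brief:

```python
import torch

torch.randint(0, 10, (3,)).dtype                     # torch.int64 by default
torch.randint(0, 10, (3,), dtype=torch.int32).dtype  # torch.int32 when overridden
```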

Current documentation: https://pytorch.org/docs/stable/generated/torch.randint.html?highlight=randint#torch.randint
New documentation:
![image](https://user-images.githubusercontent.com/14858254/101196939-48977d00-3616-11eb-90a5-a7b706e8505f.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48787

Test Plan: Built documentation and checked generated docs

Reviewed By: ailzhang

Differential Revision: D25339421

Pulled By: H-Huang

fbshipit-source-id: c2ecaacaeb57971fe7fba0d9d54f3c61b0fd04ce
2020-12-07 16:11:28 -08:00
f67259fe89 Fix CI by removing gen_pyi from mypy-strict.ini (#48961)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48961

Reviewed By: janeyx99

Differential Revision: D25383152

Pulled By: malfet

fbshipit-source-id: ce0226398522342256d0d701edc13955d1095a0d
2020-12-07 15:26:27 -08:00
b77ca9e829 [Docs] Add examples for new object-based c10d APIs (#43932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43932

Adds some basic examples to the documentation for each of the newly added
object-based collectives.
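
For instance, an example of this flavor (a hedged sketch; assumes an already-initialized process group):

```python
import torch.distributed as dist

# Gather an arbitrary picklable object from every rank.
gathered = [None for _ in range(dist.get_world_size())]
dist.all_gather_object(gathered, {"rank": dist.get_rank()})
```
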
ghstack-source-id: 117965966

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D23441838

fbshipit-source-id: 91344612952cfcaa71f08ccf2a2c9ed162ca9c89
2020-12-07 14:35:14 -08:00
d6b5f3ad98 Add object-based collective APIs to public docs (#48909)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48909

Adds these new APIs to the documentation
ghstack-source-id: 117965961

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D25363279

fbshipit-source-id: af6889d377f7b5f50a1a77a36ab2f700e5040150
2020-12-07 14:30:25 -08:00
88ebf6f894 Revert D25304229: [pytorch][PR] Add type annotations to torch.onnx.* modules
Test Plan: revert-hammer

Differential Revision:
D25304229 (8bc6023d7a)

Original commit changeset: b01b21ddbf86

fbshipit-source-id: bc3308176e2c70423f29f694e9db94828213e7d6
2020-12-07 11:58:03 -08:00
d307601365 Revert D24923679: Fixed einsum compatibility/performance issues (#46398)
Test Plan: revert-hammer

Differential Revision:
D24923679 (ea2a568cca)

Original commit changeset: 47e48822cd67

fbshipit-source-id: 52f17b66a4aa075d0159bdf1c98616e6098091b8
2020-12-07 11:48:36 -08:00
924b001b71 #48733 added logging statements to LLVM codegen using JIT logging (#48758)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48758

Test Plan: PYTORCH_JIT_LOG_LEVEL=">>llvm_codegen" python test/test_jit_fuser_te.py -k test_lerp

Reviewed By: ZolotukhinM

Differential Revision: D25295995

Pulled By: huiguoo

fbshipit-source-id: 8927808932ef3657da26508d0f6574c9e5fbbb25
2020-12-07 11:14:53 -08:00
dad74e58fc [WIP] Added foreach_trunc, foreahc_reciprocal, foreach_sigmoid APIs (#47385)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47385

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D24737051

Pulled By: izdeby

fbshipit-source-id: ed259d9184b2b784d8cc1983a8b85cc6cbf930ba
2020-12-07 10:47:23 -08:00
ba6511b304 pyi codegen update - remove Declarations.yaml (#48754)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48754

The goal of this PR is to kill Declarations.yaml in the pyi codegen, in favor of native_functions + the existing python object model.

**High-level design**

Since the python signatures used by the `python_arg_parser` are “supposed” to resemble the corresponding pyi type hint signatures, I re-used the existing python object model that Jiakai defined in `tools/codegen/api/python.py`. This means that the pyi codegen now reads `native_functions.yaml`, parses it into a bunch of `PythonSignatureGroup` objects, and emits corresponding method + function variants of type-hint signatures for each one, respectively into `__init__.pyi` and `_VariableFunctions.pyi`.

What makes this uglier is that pyi and the python arg parser have a number of differences in how they’re emitted. I expressed that through a `pyi` flag on the `PythonSignature` dataclass, that tells it whether or not to print itself as a pyi vs. arg_parser signature.

One thing worth noting is how pyi generates signatures differently for native / deprecated op signatures.

For native ops:
- The pyi codegen fuses functional and out variants of each op into a single signature with an optional `out` argument. Ops without an `out` variant just get an ordinary functional signature.
- Some ops that fit certain criteria also get a second “varargs” signature - basically ops with a single positional argument of type List[int].

For deprecated signatures:
- Functional and out variants are not fused - they each get their own signature entry
- There are no varargs signatures

This is currently implemented through the `signature_str()` and `signature_str_vararg()` methods on the `PythonSignature`/`PythonSignatureDeprecated` classes.  `signature_str()` knows how to print itself with/without out arguments, differently for native/deprecated ops. `signature_str_vararg()` optionally returns a vararg variant of the signature if one exists.

**Calling out the gap between python_arg_parser vs. pyi**

The two formats are notably different, so I don’t think we can expect to unify them completely. That said, I encountered a number of differences in the pyi codegen that looked wrong; I tried to call them out in the PR, to be removed later. Just as an example, looking at the `svd` signature in the python_arg_parser vs. the pyi type hint:

python_arg_parser
```
static PythonArgParser parser({
  "svd(Tensor input, bool some=True, bool compute_uv=True, *, TensorList[3] out=None)",
}, /*traceable=*/true);
```

Pyi
```
def svd(input: Tensor, some: _bool=True, compute_uv: _bool=True, *, out: Optional[Tensor]=None) -> namedtuple_U_S_V: ...
```

The two have obvious syntactic differences that we probably don’t plan on changing: the python_arg_parser doesn’t include `def` or return types, and it includes the type hint before the variable name. But the type of `out` in pyi is probably wrong, since `svd` has multiple output params. I tried to clearly call out any instances of the pyi codegen diverging in a way that looks buggy, so we can clean it up in a later PR (see the comments for details).

Another particularly ugly “bug” that I kept in to maintain byte-for-byte compatibility is the fact that the pyi codegen groups operator overloads together. It turns out that the only reason it does this (as far as I can tell) is because it tacks on an out argument to signatures that don’t have one, if ANY overloads of that op have an out variant.

E.g. consider the pyi type hints generated for `nanmedian` in `_VF.pyi`:
```
@overload
def nanmedian(input: Tensor, *, out: Optional[Tensor]=None) -> Tensor: ...
@overload
def nanmedian(input: Tensor, dim: _int, keepdim: _bool=False, *, out: Optional[Tensor]=None) -> namedtuple_values_indices: ...
@overload
def nanmedian(input: Tensor, dim: Union[str, ellipsis, None], keepdim: _bool=False, *, out: Optional[Tensor]=None) -> namedtuple_values_indices: ...
```

And the corresponding native_functions.yaml entries:
```
- func: nanmedian(Tensor self) -> Tensor
- func: nanmedian.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
- func: nanmedian.dim_values(Tensor self, int dim, bool keepdim=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices)
- func: nanmedian.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices)
- func: nanmedian.names_dim_values(Tensor self, Dimname dim, bool keepdim=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices)
```

Signature 2 corresponds to entries 2 and 3 in native_functions, and Signature 3 corresponds to entries 4 and 5. But signature 1 has an optional out argument, even though entry 1 in native_functions.yaml has no out variant.

I’d like to delete that logic in a later PR; that will also have the added benefit of no longer requiring the pyi codegen to group overloads together. We can just operate independently on each PythonSignatureGroup.

**More detailed accounting of the changes**

Per file:

gen_python_functions.py
- `load_signatures()` can now skip deprecated signatures. Needed because pyi only includes deprecated functions, and skips their method variants (maybe we should add them in…?)
- Moved `namedtuple_fieldnames` into python.py
- `group_overloads()` can now opt to not sort the overloads (needed for byte-for-byte compatibility; pyi doesn’t sort for some reason)

python.py:
- Gave `PythonSignature`and `PythonSignatureDeprecated` a `pyi` flag that tells it whether or not to print itself in pyi vs. python_arg_parser format
- Added a `PythonReturns` dataclass, which is now a member of PythonSignature. It is only used by pyi. I found this useful because python returns need to know how to deal with named tuple returns properly. I also moved `namedtuple_fieldnames` into this file from gen_python_functions

gen_pyi.py
- Merged `get_py_torch_functions` and `get_py_variable_methods` into a single function, since they’re very similar
- Lifted out all of the pyi type hint type-mapping mess and dropped it into python.py. This required updating the mapping to deal with NativeFunction objects instead of the outputs of Declarations.yaml (this was most of the logic in `type_to_python`, `arg_to_type_hint`, and `generate_type_hints`).  `generate_type_hints` is now a small orchestration function that gathers the different signatures for each PythonSignatureGroup.
- NamedTuples are now generated by calling `PythonReturn.named_tuple()` (in `generate_named_tuples()`), rather than appending to a global list

A lot of hardcoded pyi signatures still live in `gen_pyi.py`. I didn’t look too closely into whether or not any of that can be removed as part of this PR.

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D25343802

Pulled By: bdhirsh

fbshipit-source-id: f73e99e1afef934ff41e4aca3dabf34273459a52
2020-12-07 10:39:38 -08:00
f2c3efd51f Fix generator exhaustion in SparseAdam (#47724)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47594

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47724

Reviewed By: heitorschueroff

Differential Revision: D25304131

Pulled By: albanD

fbshipit-source-id: 67c058b0836b9b4fba4f7b966396e4f3fa61f939
2020-12-07 09:38:07 -08:00
21ba48fe49 [vulkan] test_app for mobilenetV2 on vulkan api (#48924)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48924

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D25365000

Pulled By: IvanKobzarev

fbshipit-source-id: 79295b5781d2494681dbb4e4a741de49ff9c058c
2020-12-07 08:44:43 -08:00
36df25334f Fix incorrect usage of CUDACachingAllocator [v2] (#48817)
Summary:
This is similar to https://github.com/pytorch/pytorch/issues/46605, where the c10::complex part of the code had not yet been merged at the time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48817

Reviewed By: malfet

Differential Revision: D25333179

Pulled By: ezyang

fbshipit-source-id: a92bdad5ad4b36bef7f050b21a59676c38e7b1fc
2020-12-07 08:27:59 -08:00
8bc6023d7a Add type annotations to torch.onnx.* modules (#48782)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45215

This is a follow up PR of https://github.com/pytorch/pytorch/issues/45258

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48782

Reviewed By: heitorschueroff

Differential Revision: D25304229

Pulled By: ezyang

fbshipit-source-id: b01b21ddbf86f908ca08173e68b81fb25851bc81
2020-12-07 08:23:02 -08:00
1febd2225b Add explicit cast to cuda_atomic_ops_test.cu (#48886)
Summary:
Should fix linking error reported in https://github.com/pytorch/pytorch/issues/48870

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48886

Reviewed By: walterddr

Differential Revision: D25356601

Pulled By: malfet

fbshipit-source-id: 25282d4606251b27d047917f096868ddb662a723
2020-12-07 08:07:10 -08:00
00f01791a3 [Caffe2] Add more error messages in ComputeBinaryBroadcastForwardDims
Summary: Add more error messages in ComputeBinaryBroadcastForwardDims

Test Plan:
buck test mode/opt caffe2/caffe2/python/operator_test:gather_ranges_op_test

buck test mode/opt caffe2/caffe2/python/operator_test:reduce_ops_test

buck test mode/opt caffe2/caffe2/python/operator_test:elementwise_ops_test

Reviewed By: BIT-silence

Differential Revision: D24949525

fbshipit-source-id: 762d913a6615a6394072f5bebbcb5cc36f0b8603
2020-12-07 07:42:49 -08:00
a39398b9e5 CUDA BF16 norm (#48806)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48806

Reviewed By: mruberry

Differential Revision: D25358465

Pulled By: ngimel

fbshipit-source-id: 1a2afd86f39e96db0754d04bf81de045b1e1235c
2020-12-06 23:41:05 -08:00
19f4c5110e Add another torch::jit::load API to load PyTorch model with shared_ptr PyTorchStreamReader input (#48802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48802

The current torch::jit::load API only supports a unique_ptr ReadAdapterInterface input, but in some cases torch::jit::load may not be the only consumer of the reader adapter. This diff adds an overload of torch::jit::load that accepts a shared_ptr PyTorchStreamReader.

Reviewed By: malfet, houseroad

Differential Revision: D25241904

fbshipit-source-id: aa403bac9ed820cc0e94342aebfe524a1d5bf913
2020-12-06 18:09:25 -08:00
e429d05015 Fixing error: "member may not be initialized" due to constexpr at Windows (#48836)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48835
Fixes https://github.com/pytorch/pytorch/issues/48716

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48836

Reviewed By: malfet

Differential Revision: D25335829

Pulled By: datumbox

fbshipit-source-id: 807182e9afa3bb314dbb85bfcd9589a2c319a7db
2020-12-06 10:22:48 -08:00
ea2a568cca Fixed einsum compatibility/performance issues (#46398) (#47860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47860

This PR makes torch.einsum compatible with numpy.einsum except for the sublist input option, as requested here https://github.com/pytorch/pytorch/issues/21412. It also fixes two performance issues linked below and adds a check for reducing to torch.dot instead of torch.bmm, which is faster in some cases.

fixes #45854, #37628, #30194, #15671

fixes #41467 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer

a = torch.randn(10000, 100, 101, device='cuda')
b = torch.randn(10000, 101, 3, device='cuda')

c = torch.randn(10000, 100, 1, device='cuda')
d = torch.randn(10000, 100, 1, 3, device='cuda')

print(Timer(
    stmt='torch.einsum("bij,bjf->bif", a, b)',
    globals={'a': a, 'b': b}
).blocked_autorange())

print()

print(Timer(
    stmt='torch.einsum("bic,bicf->bif", c, d)',
    globals={'c': c, 'd': d}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413850>
torch.einsum("bij,bjf->bif", a, b)
  Median: 4.53 ms
  IQR:    0.00 ms (4.53 to 4.53)
  45 measurements, 1 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413700>
torch.einsum("bic,bicf->bif", c, d)
  Median: 63.86 us
  IQR:    1.52 us (63.22 to 64.73)
  4 measurements, 1000 runs per measurement, 1 thread
```

fixes #32591 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer

a = torch.rand(1, 1, 16, 2, 16, 2, 16, 2, 2, 2, 2, device="cuda")
b = torch.rand(729, 1, 1, 2, 1, 2, 1, 2, 2, 2, 2, device="cuda")

print(Timer(
    stmt='(a * b).sum(dim = (-3, -2, -1))',
    globals={'a': a, 'b': b}
).blocked_autorange())

print()

print(Timer(
    stmt='torch.einsum("...ijk, ...ijk -> ...", a, b)',
    globals={'a': a, 'b': b}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de28850>
(a * b).sum(dim = (-3, -2, -1))
  Median: 17.86 ms
  2 measurements, 10 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de286a0>
torch.einsum("...ijk, ...ijk -> ...", a, b)
  Median: 296.11 us
  IQR:    1.38 us (295.42 to 296.81)
  662 measurements, 1 runs per measurement, 1 thread
```

TODO

- [x] add support for ellipsis broadcasting
- [x] fix corner case issues with sumproduct_pair
- [x] update docs and add more comments
- [x] add tests for error cases

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24923679

Pulled By: heitorschueroff

fbshipit-source-id: 47e48822cd67bbcdadbdfc5ffa25ee8ba4c9620a
2020-12-06 08:02:37 -08:00
17f53bffef [Gradient Compression] Replace the key of error_dict in PowerSGD state with bucket index (#48867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48867

Previously the key of error_dict was the hashcode of the tensor; it is now replaced with the bucket index.

The bucket index has a few advantages over the tensor's hashcode:
1) The error dict in the state never removes any key, so if the bucket rebuild process occurred frequently, the size of the error dict could grow. For now, such rebuilds are infrequent, so this is probably fine.

2) An integer index is more readable than a hashcode, and it can facilitate debugging.
If the user wants to debug the tensor values, usually only a specific bucket needs to be targeted. It's easy to specify such a condition (e.g., bucket_index = 0), but it's hard to specify a hashcode in advance, as it can only be determined at runtime.

Note that sometimes the buckets can be rebuilt in the forward pass. In this case, the shape of the bucket with the same index will not be consistent with the one in the previous iteration, and hence the error tensor will be re-initialized as a zero tensor of the new shape. Therefore, `and state.error_dict[bucket_index].shape[0] == padded_total_length` is added to the condition for applying the local error from the previous iteration.

Deleted the arg type of `dist._GradBucket` in powerSGD_hook.py, because somehow test_run_mypy - TestTypeHints failed:
AssertionError: mypy failed: torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py:128: error: "_GradBucket" has no attribute "get_index"  [attr-defined]

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 117951402

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

Reviewed By: rohan-varma

Differential Revision: D25346347

fbshipit-source-id: 8348aa103002ec1c69e3ae759504b431140b3b0d
2020-12-05 23:53:27 -08:00
2e600feda9 [numpy] torch.sinh: promote integer inputs to float (#48644)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515
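
The change in brief:

```python
import torch

torch.sinh(torch.arange(3)).dtype  # integer input now promotes to torch.float32
```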

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48644

Reviewed By: heitorschueroff

Differential Revision: D25298436

Pulled By: mruberry

fbshipit-source-id: 675ad8e3c34e61fbbab77eca15048df09b09c1ed
2020-12-05 22:04:31 -08:00
195ab5e864 remove non-default settings in fuser.py (#48862)
Summary:
I've noticed we sometimes set `_jit_set_num_profiled_runs` to 2 (which isn't our default) and sometimes we don't. We also set `_jit_set_bailout_depth` to 20, which **is** our default. I suggest we remove this logic altogether.
I did a quick run to see if there's any impact and, thankfully, the numbers seem consistent, but we should avoid testing configurations that aren't the default or aren't expected to become the default.

 numactl -C 3 python -m fastrnns.bench --fuser=te --executor=profiling

non-defaults:

```
Namespace(cnns=None, cuda_pointwise_block_count=None, cuda_pointwise_block_size=None, cuda_pointwise_loop_level=None, device='cuda', executor='profiling', fuser='te', group=['cnns', 'rnns'], hiddenSize=512, inputSize=512, miniBatch=64, nloops=100, numLayers=1, print_json=None, rnns=None, sep=' ', seqLength=100, variable_lstms=False, warmup=10)
Benchmarking LSTMs...
            name          avg_fwd          std_fwd         info_fwd          avg_bwd          std_bwd         info_bwd
           cudnn            5.057          0.06287             None            7.322          0.07404             None
            aten            5.602          0.06303             None            13.64           0.4078             None
             jit            7.019          0.07995             None            13.77            0.554             None
      jit_premul            5.324          0.06203             None            12.01           0.2996             None
 jit_premul_bias            5.148          0.08061             None            11.62           0.4104             None
      jit_simple             6.69           0.2317             None            13.37           0.3791             None
  jit_multilayer            7.006            0.251             None            13.67           0.2239             None
              py            19.05           0.1119             None            28.28           0.6346             None

Benchmarking ResNets...
            name          avg_fwd          std_fwd         info_fwd          avg_bwd          std_bwd         info_bwd
        resnet18            8.712          0.01628             None            19.93          0.03512             None
    resnet18_jit            8.688          0.01374             None            19.79          0.07518             None
        resnet50            31.04          0.08049             None            66.44          0.08187             None
    resnet50_jit            31.11          0.07171             None            66.45          0.09157             None
```

defaults:
```
Namespace(cnns=None, cuda_pointwise_block_count=None, cuda_pointwise_block_size=None, cuda_pointwise_loop_level=None, device='cuda', executor='profiling', fuser='te', group=['cnns', 'rnns'], hiddenSize=512, inputSize=512, miniBatch=64, nloops=100, numLayers=1, print_json=None, rnns=None, sep=' ', seqLength=100, variable_lstms=False, warmup=10)
Benchmarking LSTMs...
            name          avg_fwd          std_fwd         info_fwd          avg_bwd          std_bwd         info_bwd
           cudnn            5.086            0.115             None            7.394           0.1743             None
            aten            5.611           0.2559             None            13.54            0.387             None
             jit            7.062           0.3358             None            13.24           0.3688             None
      jit_premul            5.379           0.2086             None            11.57           0.3987             None
 jit_premul_bias            5.202           0.2127             None            11.13          0.06748             None
      jit_simple            6.648          0.05794             None            12.84           0.3047             None
  jit_multilayer            6.964           0.1104             None            13.24           0.3283             None
              py            19.14          0.09959             None            28.17           0.4946             None

Benchmarking ResNets...
            name          avg_fwd          std_fwd         info_fwd          avg_bwd          std_bwd         info_bwd
        resnet18            8.713          0.01563             None            19.93          0.02759             None
    resnet18_jit            8.697          0.01792             None            19.78          0.06916             None
        resnet50            31.14          0.07431             None            66.57          0.07418             None
    resnet50_jit            31.21           0.0677             None            66.56          0.08655             None

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48862

Reviewed By: bertmaher

Differential Revision: D25342097

Pulled By: Krovatkin

fbshipit-source-id: 8d2f72c2770793ec8cecee9dfab9aaaf2e1ad2b1
2020-12-05 20:58:39 -08:00
85121a7a0f Added CUDA support for complex input for torch.cholesky_solve (#47047)
Summary:
`torch.cholesky_solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.
Differentiation also works correctly with complex inputs now.

Ref. https://github.com/pytorch/pytorch/issues/33152
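
A usage sketch on GPU (assumes complex Cholesky factorization is already available on CUDA, per the tracking issue above):

```python
import torch

A = torch.randn(3, 3, dtype=torch.complex64, device="cuda")
A = A @ A.conj().t() + 3 * torch.eye(3, dtype=torch.complex64, device="cuda")  # Hermitian PD
L = torch.cholesky(A)           # lower-triangular Cholesky factor
b = torch.randn(3, 2, dtype=torch.complex64, device="cuda")
x = torch.cholesky_solve(b, L)  # solves A @ x = b
```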

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47047

Reviewed By: ngimel

Differential Revision: D24730020

Pulled By: mruberry

fbshipit-source-id: 95402da5789c56e5a682019790985207fa28fa1f
2020-12-05 20:18:30 -08:00
5de22d3f69 Removes redundant method_test entries (#48828)
Summary:
Now that Lilyjjo's [stack of OpInfo updates](https://github.com/pytorch/pytorch/pull/48627) has landed, we can port method_test entries to OpInfos. This PR doesn't port any method_test entries, but it removes redundant entries. These entries previously tested both multi-dim and zero-dim tensors, so a new zero-dim tensor input is added to UnaryUfuncInfo's sample inputs.

To recap, this PR:

- removes method_test() entries that are redundant with OpInfo entries
- adds a new sample input to unary ufunc OpInfos that tests them on 0d tensors

cc kshitij12345 as an fyi. Going forward we should have a goal of not only porting all the MathTestMeta objects to use the OpInfo pattern but also all the current method_test entries. For each entry the function needs to be added as an OpInfo and the inputs need to be added as sample inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48828

Reviewed By: malfet

Differential Revision: D25336071

Pulled By: mruberry

fbshipit-source-id: 6b3f6c347195233d6b8ad57e2be68fd772663d9b
2020-12-05 19:25:29 -08:00
0185a05ceb Revert D25338250: [pytorch][PR] [BE] Fix signed-unsigned warnings
Test Plan: revert-hammer

Differential Revision:
D25338250 (6317e0b2f1)

Original commit changeset: e840618b113b

fbshipit-source-id: dbecb068892dc118f257fe5c50692ede2b2462ca
2020-12-05 18:08:22 -08:00
ae9f39eb58 [FX][1/2] Make docstrings pretty when rendered (#48738)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48738

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D25280867

Pulled By: jamesr66a

fbshipit-source-id: d08641c19a6c69b4042389c800a48e699f0be628
2020-12-05 17:23:40 -08:00
0fb58d76a1 Support ArgMin in c2_pt_converter
Summary:
+ Add ArgMin support to the Caffe2-to-PyTorch converter
+ Use hypothesis to parameterize different test conditions

Test Plan: buck test //caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test

Reviewed By: houseroad

Differential Revision: D25016203

fbshipit-source-id: 94489fcf1ed3183ec96f9796a5b4fb348fbde5bc
2020-12-05 16:35:34 -08:00
251398acca Force a sync on non-CPU tensors for the benchmark to reflect the timing accurately. (#48856)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48856

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D25339803

Pulled By: AshkanAliabadi

fbshipit-source-id: fdfd9a0e0cc37245d7671419f492e445396fbdb8
2020-12-05 10:47:44 -08:00
0923d19601 fx quant: add types to quantization_patterns (#48851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48851

Adding typing to improve readability.

Note: this uncovered a few missing return statements; we should
fix those before landing.

Test Plan:
```
mypy torch/quantization/
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25338644

fbshipit-source-id: 0ac4405db05fdd2737bc3415217bc1937c2db684
2020-12-05 08:47:18 -08:00
fa5f7d87bf fx quant: add typing for fuser (#48844)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48844

Add types to function I/O for `Fuser` to improve readability

Test Plan:
```
mypy torch/quantization/
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25337314

fbshipit-source-id: e5074d71c7834f24975169d36bf49357e53650ff
2020-12-05 08:44:32 -08:00
63a71a82cf [ROCm] add 3.10 to nightly builds (#48866)
Summary:
Depends on https://github.com/pytorch/builder/pull/603.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48866

Reviewed By: malfet, janeyx99

Differential Revision: D25345895

Pulled By: walterddr

fbshipit-source-id: 5d1c754b36fa7ebd60832af58cbcbed2bc0da3bd
2020-12-05 06:56:17 -08:00
799b700ada add a unit test for lack of devices (#48858)
Summary:
add a unit test for the situation where devices do not have enough memory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48858

Reviewed By: malfet, gcatron

Differential Revision: D25341254

Pulled By: scottxu0730

fbshipit-source-id: c0524c22717b6c8afd67f5b0ad0f1851b973e4b7
2020-12-05 06:09:04 -08:00
5180caeeb4 Remove deprecated spectral ops from torch namespace (#48594)
Summary:
Ref https://github.com/pytorch/pytorch/issues/42175

This removes the 4 deprecated spectral functions: `torch.{fft,rfft,ifft,irfft}`. The `torch.fft` module is also now imported by default.

The actual `at::native` functions are still used in `torch.stft`, so they can't be fully removed yet, but they will be once https://github.com/pytorch/pytorch/issues/47601 has been merged.
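
After this change the name resolves to the new module without an explicit import:

```python
import torch

t = torch.randn(8)
freq = torch.fft.rfft(t)            # torch.fft now refers to the module
recon = torch.fft.irfft(freq, n=8)  # round-trips back to the original length
```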

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48594

Reviewed By: heitorschueroff

Differential Revision: D25298929

Pulled By: mruberry

fbshipit-source-id: e36737fe8192fcd16f7e6310f8b49de478e63bf0
2020-12-05 04:12:32 -08:00
7439bc4dd6 [Gradient Compression] Add an index field to GradBucket for PowerSGD (#48757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48757

Add an index field to GradBucket, so error_dict is keyed by this index instead of the hashcode of the input tensor. The replacement will be done in a separate diff, as the definition of this new method somehow couldn't be recognized in the OSS version.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 117939208

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

Reviewed By: rohan-varma

Differential Revision: D25288496

fbshipit-source-id: 6f71977809690a0367e408bd59601ee62c9c03ea
2020-12-05 01:39:58 -08:00
6317e0b2f1 [BE] Fix signed-unsigned warnings (#48848)
Summary:
Switch to range loops where possible.
Replace `ptrdiff_t` (signed type) with `size_t` (unsigned type).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48848

Reviewed By: walterddr

Differential Revision: D25338250

Pulled By: malfet

fbshipit-source-id: e840618b113b8bc0d8bb067c2fdf06e3ec9233d4
2020-12-04 23:15:28 -08:00
55b93735ac [PyTorch] Save refcount decrements in StaticRuntime::deallocate_registers (#48859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48859

Code comment should explain what's going on. If not, please request changes.
ghstack-source-id: 117889942

Test Plan: Internal benchmarks

Reviewed By: hlu1

Differential Revision: D25288842

fbshipit-source-id: 6bddebb99c4744e2f7aceb279fdf995821404606
2020-12-04 21:47:00 -08:00
af30a89068 [caffe2][a10] Remove unreferenced local variable e (#48601)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48601

Fix this spurious warning:
```
caffe2\aten\src\aten\core\ivalue_inl.h(412): warning C4101: 'e': unreferenced local variable
```

Test Plan: Local build & continuous integration

Reviewed By: gmagogsfm

Differential Revision: D25194281

fbshipit-source-id: 3ba469d1cbff6f16394b95c4c33d95efcaea5e3e
2020-12-04 21:14:25 -08:00
f0f315c33b [PyTorch] Inline RecordFunctionCallback::shouldRun (#48286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48286

RecordFunction initialization is a hot path. shouldRun often does little enough work that the function prologue takes a significant proportion of its time. So, this diff forces it to be inline.
ghstack-source-id: 117892387

Test Plan: FB-internal benchmarks

Reviewed By: ezyang

Differential Revision: D25108879

fbshipit-source-id: 7121413e714c5ca22c8bf10c1d2535a878c15aec
2020-12-04 20:48:39 -08:00
02d89f9f1d scatter_object_list API for c10d (#43930)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43930

Closes #23232. As part of addressing #23232, this PR adds support for scatter_object_list which is an API to scatter arbitrary picklable objects to all the other ranks.

The implementation approach follows a similar approach as https://github.com/pytorch/pytorch/pull/42189. The result of the `scatter` is stored as the first element of `scatter_object_output_list`, and the src rank is expected to provide an input list `scatter_object_input_list` which contains the objects to scatter.

Note that this API requires 1 broadcast and 2 scatters. This is because we must communicate the maximum object size to be scattered, which only the src rank knows about. After that, we also need to communicate the objects themselves as well as their true sizes.

Note that the API is designed to match the tensor-based collectives, except that it does not support async_op. For now, it is a blocking call. If we see demand for async_op, we will have to make more progress on merging work/future to support this.

It only works for Gloo because NCCL doesn't support scatter.
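
A hedged usage sketch (assumes an already-initialized Gloo process group):

```python
import torch.distributed as dist

output = [None]  # the result lands in output[0]
if dist.get_rank() == 0:
    inputs = [{"payload": r} for r in range(dist.get_world_size())]
else:
    inputs = None  # non-src ranks pass None
dist.scatter_object_list(output, inputs, src=0)
print(output[0])  # this rank's scattered object
```
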
ghstack-source-id: 117904065

Reviewed By: mrshenli

Differential Revision: D23430686

fbshipit-source-id: f033b89cd82dadd194f2b036312a98423449c26b
2020-12-04 18:55:57 -08:00
a3298c2f64 Implement JIT serialization of ProcessGroup (#48544)
Summary:
This diff enables JIT serialization of `ProcessGroup`, including both base `ProcessGroup` class and derived classes like `ProcessGroupNCCL`.

If a `ProcessGroup` is created via high-level APIs like `dist_c10d.frontend().new_process_group_helper()`, they are automatically serializable. If a `ProcessGroup` is created via its derived class TorchBind APIs like `dist_c10d.ProcessGroupNCCL()`, then it has to be given a name and registered with `dist_c10d.frontend().register_process_group_name` to be uniquely identifiable and serializable.

* Fixed a minor bug in the new dist_c10d frontend, which failed to check whether a process group was already in use
* Fixed an issue where `test_jit_c10d.py` wasn't really run due to a configuration bug. The tests now run as slow tests (requires a ci-all/* branch)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48544

Reviewed By: wanchaol

Differential Revision: D25298309

Pulled By: gmagogsfm

fbshipit-source-id: ed27ce37373c88277dc0c78704c48d4c19d46d46
2020-12-04 18:44:38 -08:00
3f10518def [PyTorch] Add VariableVersion&& overload for TensorImpl::shallow_copy_and_detach (#48681)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48681

This should reduce reference counting traffic when creating views.

The code duplication here is unfortunate and I'm open to suggestions on how to reduce it. It's especially regrettable that we create a footgun for subclasses of TensorImpl: they can accidentally override only one of the two overloads and get confusing behavior.
ghstack-source-id: 117896685

Test Plan: internal benchmarks

Reviewed By: ezyang

Differential Revision: D25259741

fbshipit-source-id: 55f99b16b50f9791fdab85cbc81d7cd14e31c4cf
2020-12-04 18:41:43 -08:00
9e10e3b74f [PyTorch] Move TensorImpl::shallow_copy_and_detach to .cpp file (#48680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48680

It seems a bit long to put into the header (and is virtual anyway).
ghstack-source-id: 117894350

Test Plan: CI

Reviewed By: bhosmer

Differential Revision: D25259848

fbshipit-source-id: e3eed1f2483fc3c1ff51459159bf3bfed9d6f363
2020-12-04 18:36:56 -08:00
092e52a4da [fx]added prototype of to_folder (#47544)
Summary:
Given an `FxModule foo`, you can call `foo.to_folder('foo_folder', 'Foo')` to dump the current FX module into runnable Python code.

That is
```
foo = <fxModule>
foo.to_folder('bar', 'Foo')
from bar import Foo
foo2 = Foo()

# for all x: foo2(x) == foo(x)
```

This has several use cases, largely lifted from jamesr66a's doc here: https://fb.quip.com/U6KHAFaP2cWa (FB-internal).

1. As we apply more heavy-weight function transformations with FX, figuring out what's going on can be quite a difficult experience. In particular, things that can typically be used for debugging (like `print` or `import pdb; pdb.set_trace()`) no longer work. This is particularly necessary if you're using an FX transform like `grad` or `vmap`. With this, you simply open up the dumped file and add `print`/`pdb` statements wherever you'd like.

2. This also provides an immense amount of user control. Some potential use-cases:
-  Let's say an existing FX transform has some bug, or generates suboptimal code. Instead of needing to modify that FX transform, writing another FX pass that fixes the suboptimal code, or simply giving up on FX, users can work around it by simply modifying the resulting code themselves.
- This allows users to check in their FX modules into source control.
- You could even imagine using this as part of some code-gen type workflow, where you write a function, `vmap` it to get the function you actually want, and then simply copy the output of the `vmap` function without needing FX at all in the final code.

An example:
```python
class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        self.W = torch.nn.Parameter(torch.randn(2))
        self.linear = nn.Linear(2, 2)
        self.attr = torch.randn(2)
        self.attr2 = torch.randn(2)

    def forward(self, x):
        return self.linear(self.W + (self.attr + self.attr2) + x)

mod = fx.symbolic_trace(Test())
mod.to_folder('foo', 'Foo')
```
results in
```python
import torch
class Foo(torch.nn.Module):
    def __init__(self):
        super().__init__()
        state_dict = torch.load('foo/state_dict.pt')
        self.linear = torch.load('foo/linear.pt') # Linear(in_features=2, out_features=2, bias=True)
        self.__tensor_constant0 = state_dict['__tensor_constant0']
        self.W = torch.nn.Parameter(state_dict['W'])

    def forward(self, x):
        w = self.W
        tensor_constant0 = self.__tensor_constant0
        add_1 = w + tensor_constant0
        add_2 = add_1 + x
        linear_1 = self.linear(add_2)
        return linear_1
```
Some current issues:
1. How do you actually ... save things like modules or parameters? I don't think FX is in the business of tracking initializations and such. Thus, the only way I see to do it is to dump the parameters/modules as blobs, and then load them in the generated initialization. This is a somewhat subpar user experience, and perhaps rules it out for some use cases (i.e., you would need to check the blobs into source control to save the model).

2. Currently, the only "atomic" modules we have are those in `torch.nn`. However, if we want to allow flexibility in this, and for example, allow "atomic" modules that are user-defined, then it's not clear how to allow those to be dumped in a way that we can then load elsewhere.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47544

Reviewed By: jamesr66a

Differential Revision: D25232917

Pulled By: Chillee

fbshipit-source-id: fd2b61a5f40e614fc94256a2957ed1d57fcf5492
2020-12-04 18:33:27 -08:00
03abd81b8d [ROCm] Enable skipped distributed global tests (#48023)
Summary:
The PR https://github.com/pytorch/pytorch/issues/47898 fixes the global tests. Hence enabling the tests.

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48023

Reviewed By: malfet, H-Huang

Differential Revision: D25347289

Pulled By: rohan-varma

fbshipit-source-id: 2b519a3046eae1cf1bfba98a125c09b4a6b01fde
2020-12-04 18:16:02 -08:00
9bb87fa58b [te] Fix spacing in graph dump (#48829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48829

The first line was a run-on.
ghstack-source-id: 117845927

Test Plan: visual inspection

Reviewed By: ZolotukhinM

Differential Revision: D25326136

fbshipit-source-id: 3f46ad20aee5ed523b64d852d382eb06f4d60369
2020-12-04 18:10:44 -08:00
2d07d5b50a [te] Don't fuse integer fmod or remainder (#48700)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48700

fmod and remainder on int tensors will raise ZeroDivisionError if their divisors are 0.  I don't think we should try to generate code that raises exceptions.  If at some point we really wanted to fuse these, I might lean towards calling a C++ helper function from the generated code.
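For reference, the eager-mode behavior the generated code would otherwise have to mimic (a small illustration; the exact exception type may differ by device):

```python
import torch

a = torch.tensor([4, 7, 9])
b = torch.tensor([2, 0, 3])
torch.fmod(a, b)  # raises on integer tensors with a zero divisor,
                  # which is why the fuser now skips integer fmod/remainder
```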
ghstack-source-id: 117845642

Test Plan: `buck test //caffe2/test:jit -- test_binary_ops`

Reviewed By: eellison

Differential Revision: D25265792

fbshipit-source-id: 0be56ba3feafa1dbf3c37f6bb8c1550cb6891e6d
2020-12-04 18:02:29 -08:00
5654fc8edd Revert D25293474: [pytorch][PR] Server connects to its listen socket addr
Test Plan: revert-hammer

Differential Revision:
D25293474 (7c9ba62130)

Original commit changeset: 15f75dab48a4

fbshipit-source-id: 71ca136f2aa3204ad49f76c604f51c477cba270a
2020-12-04 17:08:03 -08:00
4b8d965f18 Revert D25292656: [pytorch][PR] Support torch.distributed.irecv(src=None, ...)
Test Plan: revert-hammer

Differential Revision:
D25292656 (4eb4db7c30)

Original commit changeset: beb018ba0b67

fbshipit-source-id: 5a13055e50ed90731fee431e81c09a1871f6cc03
2020-12-04 16:57:06 -08:00
212ec07cb7 Support torchbind as attribute in torch.fx symbolic tracing (#48732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48732

add support for ScriptObject as attributes in symbolic trace.

Test Plan: OSS CI

Reviewed By: jamesr66a

Differential Revision: D25116185

fbshipit-source-id: c61993c84279fcb3c91f1d44fb952a8d80d0e552
2020-12-04 16:21:44 -08:00
b9cd774e29 Get rid of printf in cuda fuser debugPrint() (#46994)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46994

Reviewed By: raghuramank100, mruberry

Differential Revision: D25342954

Pulled By: malfet

fbshipit-source-id: 549b5b072f7f70877261a155e989a21072ec49d8
2020-12-04 15:13:26 -08:00
ca3ae7dc73 [DI] create a new key for threadLocalDebugInfo (#48762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48762

In distributed inference, we want to use a new debug info type to pass some information to operators. This adds a new key to threadLocalDebugInfo to unblock that development.

Test Plan: Only adds a new key. Should have no effect on the current build.

Reviewed By: dzhulgakov

Differential Revision: D25291242

fbshipit-source-id: c71565ff7a38cc514d7cd65246c7d5f6b2ce3b8b
2020-12-04 15:05:45 -08:00
0f9823d888 [PyTorch] Save some space in ProcessedNode (#48861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48861

`std::function` already has an empty state; no need to wrap
it in `c10::Optional`.
ghstack-source-id: 117891382

Reviewed By: hlu1

Differential Revision: D25296912

fbshipit-source-id: 8291bcf11735d49db17415b5de915591ee65f781
2020-12-04 14:42:20 -08:00
142b21fd44 Add SparseLengthsSum4BitRowwiseSparse in c2_pt_converter (#48240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48240

Adds support for converting the SparseLengthsSum4BitRowwiseSparse operator from caffe2 to pytorch as a part of c2_pt_converter

Test Plan:
Added a unit test:

buck test //caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test

Tests passed:
https://our.intern.facebook.com/intern/testinfra/testrun/2251799856412296

Reviewed By: houseroad

Differential Revision: D25067833

fbshipit-source-id: 45cbc331ca35bee27e083714e65a1e87a2a2d2e0
2020-12-04 14:16:25 -08:00
4eb4db7c30 Support torch.distributed.irecv(src=None, ...) (#47137)
Summary:
Calling torch.distributed.irecv(src=None) fails with "The global rank None is not part of the group". This change calls recv_anysource if src is None. Tested locally with the MPI backend.
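A minimal usage sketch of the new behavior (assumes an initialized process group):

```python
import torch
import torch.distributed as dist

# assumes dist.init_process_group("mpi") has been called on every rank
buf = torch.zeros(4)
req = dist.irecv(buf, src=None)  # now dispatches to recv_anysource
req.wait()                       # buf holds data from whichever rank sent first
```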

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47137

Reviewed By: heitorschueroff

Differential Revision: D25292656

fbshipit-source-id: beb018ba0b676924aeaabeb4a4d6acf96e4a1926
2020-12-04 13:56:36 -08:00
e1f9542d00 Revert D23898398: [Mask R-CNN]Add Int8 AABB Generate proposals Op
Test Plan: revert-hammer

Differential Revision:
D23898398 (714c7020ee)

Original commit changeset: fb5f6d6ed8a5

fbshipit-source-id: 05284ff4db6c05fff3f4a6bb80f665e87c0bf085
2020-12-04 13:34:55 -08:00
7c9ba62130 Server connects to its listen socket addr (#46801)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46800

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46801

Reviewed By: heitorschueroff

Differential Revision: D25293474

fbshipit-source-id: 15f75dab48a4360645436360c216885cf3bd5667
2020-12-04 13:21:57 -08:00
42e6951e62 Remove save_state_warning in LambdaLR (#46813)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46405, https://github.com/pytorch/pytorch/issues/43352

I updated the docstring in the local file (function level comments). Do I also need to edit somewhere else or recompile docstrings?

Also, though I didn't change any types here, how is typing documentation (for IDE type checking) generated / used?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46813

Reviewed By: ezyang

Differential Revision: D24923112

Pulled By: vincentqb

fbshipit-source-id: be7818e0d4593bfc5d74023b9c361ac2a538589a
2020-12-04 13:19:59 -08:00
714c7020ee [Mask R-CNN]Add Int8 AABB Generate proposals Op
Summary: Adds support for additional Eigen Utils for custom type defs.

Reviewed By: vkuzo

Differential Revision: D23898398

fbshipit-source-id: fb5f6d6ed8a56e6244f4f0cb419140b365ff7a82
2020-12-04 13:00:34 -08:00
ba3962f5f0 [Onnxifi] Warmup cache of output shapes (#48346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48346

Onnxifi now accepts output shape info for all possible batch sizes. This is used to avoid doing shape inference inside `OnnxifiOp::extractOutputBatchSizes()`.

FB:
In this diff we try to pre-calculate output shapes for all possible batch sizes inside `PredictorContainer` where we supposedly have enough data to do so. This data is then passed down to OnnxifiOp.

Here is the dependency graph that I built manually trying to understand the entire flow.
https://pxl.cl/1rQRv

Test Plan:
Strobelight data https://fburl.com/strobelight/jlhhgt21 shows that `OnnxifiOp::RunOnDevice()` now takes only 2.17% of CPU instead of ~20% CPU with the current implementation.

Also, the current implementation takes dozens of milliseconds according to ipiszy:
> After adding more logs, I found each shapeinference call actually takes 40~50ms.

I also temporarily added time measurements for `OnnxifiOp::extractOutputBatchSizes()`. The new implementation typically consumes 1 to 4 microseconds, and, when data for the current batch size is not yet present in `output_reshape_info_`, it takes 20-40 microseconds, which is still much better than the current implementation.

AF canary https://www.internalfb.com/intern/ads/canary/431357944274985799
AI canary https://www.internalfb.com/intern/ads/canary/431365503038313840

Verifying using test tier https://pxl.cl/1sZ4S

Reviewed By: yinghai, ipiszy

Differential Revision: D25047110

fbshipit-source-id: 872dc1578a1e8e7c3ade5f5e2711e77ba290a671
2020-12-04 12:54:41 -08:00
0a42003f8f [TensorExpr Fuser] Handle fusing values with un-profiled uses (#48689)
Summary:
Copying myself from the code comments:

A value can be profiled with differently typed uses.
This can occur from:
- having a use which is not executed, so the type will be
TensorType::get()
- control-flow that depends on tensor type:
  if x.size() == 2 op(x) else op(x)
- mutation of the value on a field represented in the tensor type
  op(x); x.resize_([...]); op(x)

The most common case today with num_profiles = 1 is from the first case. Here we can just ignore non-profiled uses, and choose any of the profiled uses. Because we guard all tensor types in the runtime, even if we set a Value to have a profiled type from one use and then execute a use with a different profiled type, we will still be correct. In the future we could consider unifying the types of uses, or adding a type refinement node so uses can have the correct corresponding type.

Fix for https://github.com/pytorch/pytorch/issues/48043. I think there's probably too much context required for that to be a good bootcamp task...

There was an observed missed fusion opportunity in detectron2 because of this issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48689

Reviewed By: ngimel

Differential Revision: D25278791

Pulled By: eellison

fbshipit-source-id: 443e5e1254446a31cc895a275b5f1ac3798c327f
2020-12-04 12:48:10 -08:00
31808dcdd8 [RELAND] [CUDA graphs] Make CUDAGeneratorImpl capturable (ci-all edition) (#48694)
Summary:
Resubmission of https://github.com/pytorch/pytorch/pull/47989 with attempted fix for the unexpected context creation that caused revert (https://github.com/pytorch/pytorch/pull/47989#issuecomment-736689145).

Submitting from a ci-all branch because the failing test isn't public.

Diffs relative to master should be the same as https://github.com/pytorch/pytorch/pull/47989 's approved diffs, aside from the fix itself a5c80f63d3.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48694

Reviewed By: mruberry

Differential Revision: D25291431

Pulled By: ngimel

fbshipit-source-id: 8c27f85c64eecaf1f5cb925020fa6d38a07ff095
2020-12-04 12:35:46 -08:00
9af627fda1 fix some typos in the fx ir test_fx_experimental (#48847)
Summary:
fix some typos in test_fx_experimental.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48847

Reviewed By: malfet, gcatron

Differential Revision: D25339391

Pulled By: scottxu0730

fbshipit-source-id: 388d9da94259d2b306d59f3f4a167e486ac06d60
2020-12-04 12:18:36 -08:00
a5fb12d168 RRef proxy support for ScriptModule methods (#48339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48339

Closes https://github.com/pytorch/pytorch/issues/48294
https://github.com/pytorch/pytorch/pull/48293 added creation and transfer of ScriptModule over RPC in Python, but the RRef proxy functions did not work with ScriptModule methods.

This PR makes the above work with ScriptModule as per a discussion with mrshenli:
1) We remove the `hasattr()` check and just let Python throw the exception as it would when accessing the py function with `getattr`
2) We condition on `issubclass(type, ScriptModule)` when checking if it is wrapped with async_function, because `ScriptModule` does not have getattr implemented (its forward is not a Python function, but a TorchScript-specific function):
```
torch/jit/_script.py", line 229, in __get__
    return self.__getattr__("forward")  # type: ignore
AttributeError: '_CachedForward' object has no attribute '__getattr__'
```
ghstack-source-id: 117631795

Test Plan: Modified ut

Reviewed By: wanchaol

Differential Revision: D25134423

fbshipit-source-id: 918ca88891c7b0531325f046b61f28947575cff0
2020-12-04 11:33:16 -08:00
fadec77c30 [quant][fx][graphmode] Renable torchvision test (#48602)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48602

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D25224917

fbshipit-source-id: efc73f425253c4eb7ae51064b6760416097f0437
2020-12-04 10:13:38 -08:00
07d185ef05 [ROCm] add 3.10 docker image (#48791)
Summary:
Add a ROCm 3.10 docker image for CI.  Keep the 3.9 image and remove the 3.8 image.  Plan is to keep two ROCm versions at a time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48791

Reviewed By: janeyx99

Differential Revision: D25307102

Pulled By: walterddr

fbshipit-source-id: 88371aafd07db7c5d0dd210759bb7c3aac1f0187
2020-12-04 08:37:31 -08:00
bc2352e8c3 [NNC] Complete SimpleIREvaluator support for bitwise ops (#48053) (#48179)
Summary:
Add missing types for bitwise_ops in `SimpleIREvaluator`

This is the first part of fixes for issue https://github.com/pytorch/pytorch/issues/48053.
- The original implementation of bitwise_ops supported only int operands; the fix adds support for all integral types supported by the IR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48179

Test Plan: `python test/test_jit_fuser_te.py TestTEFuser.test_bitwise_ops`

Reviewed By: ZolotukhinM

Differential Revision: D25126944

Pulled By: penguinwu

fbshipit-source-id: 04dc7fc00c93b2bf1bd9f9cd09f7252357840b85
2020-12-04 08:10:18 -08:00
3a0d4240c3 Fix broadcast_all crashing on Tensor-likes (#48169)
Summary:
This ensures Tensor-likes that implement `__torch_function__` are properly handled by `torch.distributions.utils.broadcast_all`.  See Issue https://github.com/pytorch/pytorch/issues/37141 .

In this implementation, Numbers will not be cast to the dtype of Tensor-likes.
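For reference, the basic contract of `broadcast_all` with plain tensors; the fix extends the same behavior to Tensor-likes implementing `__torch_function__`:

```python
import torch
from torch.distributions.utils import broadcast_all

a, b = broadcast_all(torch.randn(3, 1), torch.randn(1, 4))
print(a.shape, b.shape)  # torch.Size([3, 4]) torch.Size([3, 4])
```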

Fixes https://github.com/pytorch/pytorch/issues/37141

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48169

Reviewed By: izdeby

Differential Revision: D25091414

Pulled By: walterddr

fbshipit-source-id: c5c99374b02409393a68dcb85e2f8feab154318f
2020-12-04 07:32:22 -08:00
eb43e12ee4 Revert D25277886: [pytorch][PR] Replace constexpr with CONSTEXPR_EXCEPT_WIN_CUDA
Test Plan: revert-hammer

Differential Revision:
D25277886 (0484b048d0)

Original commit changeset: eb845db35d31

fbshipit-source-id: 133b938ff8ae1aa54878a03ea5a7e732c6bd5901
2020-12-04 07:08:35 -08:00
6ab84ca0f3 Implement NumPy-like function torch.msort() (#48440)
Summary:
- Related with https://github.com/pytorch/pytorch/issues/38349
- Implementing the NumPy-like function `torch.msort()`.
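A short usage example; like np.msort, it sorts along the first dimension:

```python
import torch

x = torch.tensor([[3., 1.],
                  [2., 4.]])
torch.msort(x)  # tensor([[2., 1.], [3., 4.]])
# equivalent to torch.sort(x, dim=0).values
```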

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48440

Reviewed By: bdhirsh

Differential Revision: D25265753

Pulled By: mruberry

fbshipit-source-id: 7709ac5e5667e7541a3dc9048b9c9896b1a6dfa1
2020-12-04 04:32:09 -08:00
cb285080b0 Added computing matrix condition numbers (linalg.cond) (#45832)
Summary:
This PR adds `torch.linalg.cond` for NumPy compatibility.

Ref https://github.com/pytorch/pytorch/issues/42666.
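A short usage example:

```python
import torch

A = torch.tensor([[1., 0.],
                  [0., 10.]])
torch.linalg.cond(A)         # tensor(10.) -- 2-norm condition number by default
torch.linalg.cond(A, 'fro')  # Frobenius-norm condition number
```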

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45832

Reviewed By: ngimel

Differential Revision: D25183690

Pulled By: mruberry

fbshipit-source-id: a727959bfec2bc2dc36df59d9ef79c0534b68194
2020-12-04 02:23:57 -08:00
4cc163f8ec Add deadline to fakelowp tests (#48823)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48823

deadline=None is not good because:

Sandcastle returns success for tests that time out (the default flag behavior), so we cannot efficiently detect broken tests if there are any.

In addition, the return signal for a timeout is 64, which is the same as for a skipped test.
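A hedged sketch of the kind of change this implies for a hypothesis-based test; the concrete deadline value used in the diff is not shown here:

```python
from hypothesis import given, settings, strategies as st

@settings(deadline=1000)  # explicit per-example deadline in ms, not deadline=None
@given(st.floats(allow_nan=False))
def test_fakelowp_op(x):
    ...  # test body elided; the point is the explicit deadline
```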

Test Plan: Sandcastle, and run tests on card

Reviewed By: hyuen

Differential Revision: D25318184

fbshipit-source-id: de1b55a259edb2452fb51ba4c598ab8cca9e76b7
2020-12-04 00:45:33 -08:00
2181ff89bb [vulkan][test] Not use non 1 dilation for conv2d (#48800)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48800

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D25312276

Pulled By: IvanKobzarev

fbshipit-source-id: edb36c284ddb79969cbc4e774f11d85f14b39343
2020-12-03 23:45:01 -08:00
5fd61de99e [ONNX] Added hardswish symbolic in opset 9 (#48423)
Summary:
Adds support for the torch.nn.Hardswish operator in ONNX export.

Fixes https://github.com/pytorch/pytorch/issues/43665

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48423

Reviewed By: heitorschueroff

Differential Revision: D25309868

Pulled By: bzinodev

fbshipit-source-id: f5583eb01b1b0e8f0bc95d5054941dd29605d6a5
2020-12-03 23:22:21 -08:00
15bc21c280 [ONNX] Track and list model params for scripting (#47348)
Summary:
List model parameters as inputs after freezing the script module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47348

Reviewed By: heitorschueroff

Differential Revision: D25309756

Pulled By: bzinodev

fbshipit-source-id: cbe679ece934d5e6c418a22f08c1662256914c4c
2020-12-03 23:07:28 -08:00
f065087567 [ONNX] Handle dynamic input axes for prim_ConstantChunk (#48176)
Summary:
Converting a model that uses `torch.chunk` does not work with dynamic input axes, because the `Split` attribute is static in opset 11. Therefore, we convert it using `Slice` (supported in opset 11+). This PR also handles the case where the input axis cannot be divided evenly by the number of outputs: PyTorch gives the first (n-1) outputs the same size, and the remainder goes to the last one. Added a UT for it.

The existing `sequence` `split` code cannot be leveraged here, because the `start` and `end` of `Slice` are static there but dynamic here.
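To illustrate the uneven-split semantics described above:

```python
import torch

x = torch.arange(10)
torch.chunk(x, 3)
# (tensor([0, 1, 2, 3]), tensor([4, 5, 6, 7]), tensor([8, 9]))
# the first n-1 outputs share the same size; the remainder goes to the last one
```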

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48176

Reviewed By: bdhirsh

Differential Revision: D25274862

Pulled By: bzinodev

fbshipit-source-id: 7d213a7605ad128aca133c057d6dd86c65cc6de9
2020-12-03 21:59:26 -08:00
86540dbf41 Fix jit doc model loading example (#48104)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48104

Reviewed By: jamesr66a

Differential Revision: D25028353

Pulled By: suo

fbshipit-source-id: aaf74a40e7150a278d100e129740cfe1cef99af2
2020-12-03 20:47:20 -08:00
c55d45f04b [qnnpack] Fix unused var warning when building for different archs. (#48730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48730

.

Test Plan: CI

Reviewed By: kimishpatel

Differential Revision: D25273068

fbshipit-source-id: 3a0cea633bf1c02fa3176b3b3f43db46d2beb861
2020-12-03 19:46:06 -08:00
f5d94244b2 fx quant: more typehints, part 3 (#48794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48794

Adds typehints to function I/O in `torch/quantization/quantize_fx.py`,
for readability.

Test Plan:
```
mypy torch/quantization/
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25307084

fbshipit-source-id: 67bdf95b78836dcabc7d829e1854ca5b8ceb8346
2020-12-03 19:28:16 -08:00
54da2dadd8 fx quant: more typehints, part 2 (#48792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48792

Adds some more typehints throughout quantization/fx/quantize.py,
to help with readability.

Test Plan:
```
mypy torch/quantization/fx/quantize.py
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25306683

fbshipit-source-id: fc38b885a2cb5bf2c6d23b6305658704c6eb7811
2020-12-03 19:28:12 -08:00
f5bcf45e3b fx quant: add more typehints (#48774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48774

Adds some more typehints throughout `quantization/fx/quantize.py`.

More are needed, ran out of time for now, we can continue in
future PRs.

Test Plan:
```
mypy torch/quantization/fx/quantize.py
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25295836

fbshipit-source-id: 4029aa8ea5b07ce9a57e4be6a66314d7a4e19585
2020-12-03 19:28:09 -08:00
c98c617b44 fx quant: clean up functions in _prepare (#48773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48773

Makes util functions in `_prepare` have no side effects,
all dependencies are now in arguments.

Note: arg names are added in the order they appeared in the function
code. It's not the most readable, but it is the lowest risk. This can
be cleaned up in future PRs if needed.

```
python test/test_quantization.py TestQuantizeFx
```

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25295839

fbshipit-source-id: 60c687f6b64924473f969541c8703118e4f7d16e
2020-12-03 19:28:06 -08:00
536352e86f fx quant: clean up functions in _generate_qconfig_map (#48772)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48772

Makes util functions in `_generate_qconfig_map` have no side
effects, all dependencies are now in arguments.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25295837

fbshipit-source-id: 49399abef626234e34bb5ec8c6d870da3c1760e7
2020-12-03 19:25:38 -08:00
16fd1c32c5 [ONNX] Update batch_norm symbolic to handle track_running_stats=False (#47903)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45333

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47903

Reviewed By: ejguan

Differential Revision: D25097509

Pulled By: bzinodev

fbshipit-source-id: 5584dac1150b13d4e0a6e0c39ac2f2caf41d3b38
2020-12-03 17:31:03 -08:00
cf1e5d7d2b Ignore MSVC's pdb file (#47963)
Summary:
These files are generated by MSVC when building with debug symbols `REL_WITH_DEB_INFO=1`:
```
PS C:\Users\Xiang Gao\source\repos\pytorch> git status
On branch master
Your branch is up to date with 'origin/master'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        torch/lib/asmjit.pdb
        torch/lib/c10.pdb
        torch/lib/c10_cuda.pdb
        torch/lib/caffe2_detectron_ops_gpu.pdb
        torch/lib/caffe2_module_test_dynamic.pdb
        torch/lib/caffe2_observers.pdb
        torch/lib/fbgemm.pdb
        torch/lib/shm.pdb
        torch/lib/torch_cpu.pdb
        torch/lib/torch_cuda.pdb

nothing added to commit but untracked files present (use "git add" to track)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47963

Reviewed By: heitorschueroff

Differential Revision: D25311564

Pulled By: malfet

fbshipit-source-id: 1a7125f3c6ff296b4bb0975ee97b59c23586b1cb
2020-12-03 16:11:24 -08:00
cc1c3063c5 Add test binary to compare torch model outputs (#47933)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47933

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D25309199

Pulled By: SS-JIA

fbshipit-source-id: adc3fc7db33c251f6b661916265b86b7b8c68fc2
2020-12-03 15:29:56 -08:00
b3ac628081 [JIT] Fix bug in get_annotation_str for ast.Subscript (#48741)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48741

**Summary**
This commit fixes a bug in the handling of `ast.Subscript` inside
`get_annotation_str`. `annotation.value` (which contains the AST node
representing the container name) should also be processed using
`get_annotation_str`.

**Test Plan**
This commit adds a unit test to `TestClassType` based on the test case
from the issue that reported this bug.

**Fixes**
This commit fixes #47570.

Test Plan: Imported from OSS

Reviewed By: ppwwyyxx

Differential Revision: D25286013

Pulled By: SplitInfinity

fbshipit-source-id: 61a9e5dc16d9f87b80578f78d537f91332093e52
2020-12-03 14:41:02 -08:00
e7038a7725 Improve an autograd warning (#48765)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48764

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48765

Reviewed By: heitorschueroff

Differential Revision: D25304145

Pulled By: albanD

fbshipit-source-id: e818413bf92ad0aa382eda77448183b9fd7d5e77
2020-12-03 12:39:10 -08:00
1eed54d17a Upgrade oneDNN (mkl-dnn) to v1.7 (#47853)
Summary:
Bump oneDNN (mkl-dnn) to 1.7 for bug fixes and performance optimizations
- Fixes https://github.com/pytorch/pytorch/issues/42115. Fixed build issue on Windows for the case when oneDNN is built as submodule
- Fixes https://github.com/pytorch/pytorch/issues/45746. Fixed segmentation fault for convolution weight gradient on systems with Intel AVX512 support

This PR also contains a few changes in ideep for follow-up update (not enabled in current PR yet):
- Performance improvements for the CPU path of Convolution
- Channel-last support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47853

Reviewed By: bdhirsh

Differential Revision: D25275268

Pulled By: VitalyFedyunin

fbshipit-source-id: 75a589d57e3d19a7f23272a67045ad7494f1bdbe
2020-12-03 11:54:31 -08:00
47aa253632 [Feature] Allow user to specify a fraction of the GPU memory. (#48172)
Summary:
Add a new function, torch.cuda.set_per_process_memory_fraction(fraction, device), to torch.cuda. Related: https://github.com/pytorch/pytorch/issues/18626
The fraction (a float from 0 to 1) limits the memory available to the caching allocator on the given GPU device. One can set it on any visible GPU. The allowed memory equals total memory * fraction; an OOM error is raised when an allocation would exceed the allowed value. This function is similar to TensorFlow's per_process_gpu_memory_fraction.
Note that this setting only limits the caching allocator within one process. If you are using multiprocessing, you need to apply this setting in each subprocess to limit its GPU memory, because each subprocess can have its own allocator.

## usage
In some cases, one needs to split a GPU device into two parts; the limit can be set before any GPU memory is used.
E.g., on device 0, each part takes half the memory, as follows:
```
torch.cuda.set_per_process_memory_fraction(0.5, 0)
```
There is an example to show what it is.
```python
import torch
torch.cuda.set_per_process_memory_fraction(0.5, 0)
torch.cuda.empty_cache()
total_memory = torch.cuda.get_device_properties(0).total_memory
# less than 0.5 will be ok:
tmp_tensor = torch.empty(int(total_memory * 0.499), dtype=torch.int8, device='cuda')
del tmp_tensor
torch.cuda.empty_cache()
# this allocation will raise a OOM:
torch.empty(total_memory // 2, dtype=torch.int8, device='cuda')

"""
It raises an error as follows:
RuntimeError: CUDA out of memory. Tried to allocate 5.59 GiB (GPU 0; 11.17 GiB total capacity; 0 bytes already allocated; 10.91 GiB free; 5.59 GiB allowed; 0 bytes reserved in total by PyTorch)
"""
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48172

Reviewed By: bdhirsh

Differential Revision: D25275381

Pulled By: VitalyFedyunin

fbshipit-source-id: d8e7af31902c2eb795d416b57011cc8a22891b8f
2020-12-03 11:45:56 -08:00
c134f32835 Implemented torch.inner (#46716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46716

Implemented torch.inner similar to [numpy.inner](https://numpy.org/doc/stable/reference/generated/numpy.inner.html). For now it's implemented as a composite op.
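A short usage example:

```python
import torch

torch.inner(torch.tensor([1., 2.]), torch.tensor([3., 4.]))  # tensor(11.)

a = torch.randn(2, 3)
b = torch.randn(4, 3)
torch.inner(a, b).shape  # torch.Size([2, 4]); contracts over the last dimension
```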

TODO

- [x] Add documentation

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24860351

Pulled By: heitorschueroff

fbshipit-source-id: de5c82f285893495491fdba73b35634f4d00bac8
2020-12-03 11:37:55 -08:00
b726a1bbf8 quantize bias of the quantization parameters (#48749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48749

This change reverts D25179863 (55e225a2dc) because this behavior was reintroduced in 1.0.0.14.
We believe this was already working pre-1.0.0.9; then Intel regressed it, which is
why we had to remove this quantization section, and in 1.0.0.14 they fixed it.

Test Plan:
We tested ctr_instagram_5x, which now passes with bitwise matching.
hl475 will test the top 6 models, and if they match, we will use this point
to lock in any further changes in the future.

Reviewed By: venkatacrc

Differential Revision: D25283605

fbshipit-source-id: 33aa9af008c113d4d61e3461a44932b502bf42ea
2020-12-03 11:20:56 -08:00
dabc286ab3 Remove output used only by sizes (#448) (#47665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47665

Re-enabled the pass that removes fusion outputs that are only used by aten::size.
Added size computation for reduction ops via the new operator prim::ReductionSizes.

Test Plan: Imported from OSS

Reviewed By: navahgar, jamesr66a

Differential Revision: D25254675

Pulled By: Krovatkin

fbshipit-source-id: e9a057b0287ed0ac93b415647fd8e5e836ba9856
2020-12-03 11:14:30 -08:00
2cb9204159 Add nondeterministic alert to index_copy, median CUDA and kthvalue CUDA (#46942)
Summary:
Also fixes an issue where skipped tests did not properly restore the deterministic flag.

Fixes https://github.com/pytorch/pytorch/issues/46743

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46942

Reviewed By: heitorschueroff

Differential Revision: D25298020

Pulled By: mruberry

fbshipit-source-id: 14b1680e1fa536ec72018d0cdb0a3cf83b098767
2020-12-03 11:03:07 -08:00
c2ad3c4e6a Add scary comment in cpp_custom_type_hack.h (#48737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48737

Test Plan: Imported from OSS

Reviewed By: dzhulgakov

Differential Revision: D25280542

Pulled By: jamesr66a

fbshipit-source-id: 67c3b8c82def848ba3059dd6f6a23f9c5e329c0f
2020-12-03 10:58:12 -08:00
416dc68341 [Pytorch][Annotation] Update inlined callstack with module instance info (#47416)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47416

Test Plan: Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D24752846

Pulled By: cccclai

fbshipit-source-id: 94d3c18c56161d1de3a16bb7c93502fedf71644c
2020-12-03 10:44:46 -08:00
5c9cef9a6c [numpy] Add torch.moveaxis (#48581)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/38349 #36048 https://github.com/pytorch/pytorch/pull/41480#issuecomment-734398262
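A short usage example; `moveaxis` is the NumPy-compatible alias of `torch.movedim`:

```python
import torch

x = torch.randn(2, 3, 4)
torch.moveaxis(x, 0, -1).shape           # torch.Size([3, 4, 2])
torch.moveaxis(x, (0, 1), (1, 0)).shape  # torch.Size([3, 2, 4])
```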

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48581

Reviewed By: bdhirsh

Differential Revision: D25276307

Pulled By: mruberry

fbshipit-source-id: 3e3e4df1343c5ce5b71457badc43f08c419ec5c3
2020-12-03 10:34:33 -08:00
befab0d9d4 [ONNX] Cast Gather index to Long if needed (#47653)
Summary:
The ONNX Gather op's index needs to be int32 or int64. However, we don't insert this Cast in our converter, so the following UT fails (for opset 11+):
`seq_length.type().scalarType()` is None, so `_arange_cast_helper()` cannot treat it as all-integral and instead casts everything to float. This float value is then used as the Gather index, hence ORT throws an error about a float-typed index.
The fix is to cast the Gather index to Long if it is not already int/long.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47653

Reviewed By: heitorschueroff

Differential Revision: D25298056

Pulled By: mruberry

fbshipit-source-id: 05e3a70ccfd74612233c63ec5bb78e060b211909
2020-12-03 09:34:59 -08:00
92f376147c Enable TCPStore on Windows (#47749)
Summary:
Enable TCPStore for DDP on the Windows platform, in order to improve DDP performance when running across machines.

Related RFC is https://github.com/pytorch/pytorch/issues/47659

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47749

Reviewed By: bdhirsh

Differential Revision: D25220401

Pulled By: mrshenli

fbshipit-source-id: da4b46b42296e666fa7d8ec8040093de7443a529
2020-12-03 08:32:01 -08:00
93973ee699 Header cleanup (#48728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48728

Mostly removing unnecessary includes so that TensorIterator.h can be
included from NativeFunctions.h without causing cycles. There are some
cases where I moved code around so that I didn't have to pull in other
unnecessary stuff.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D25278030

Pulled By: ezyang

fbshipit-source-id: 5f6b95a6bc734e452e9bd7bee8fe5278f5e45be2
2020-12-03 08:26:20 -08:00
f9a0abfc43 Fix code review from #48659 and #48116 (#48731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48731

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D25278034

Pulled By: ezyang

fbshipit-source-id: 73652311b48d8d80c06e9385b7ff18ef3a158ae8
2020-12-03 08:26:17 -08:00
d6f9e8562b Generalize some TensorIterator consumers to take TensorIteratorBase (#48727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48727

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D25278033

Pulled By: ezyang

fbshipit-source-id: 77f125ddb8446edf467a22130227d90583884bca
2020-12-03 08:24:48 -08:00
c01e5b8827 Simplify CachingAllocator. (#48752)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48752

Reviewed By: linbinyu

Differential Revision: D25285292

fbshipit-source-id: 17679ccda5279ab426e50e4266c50aac74f92a13
2020-12-03 07:30:01 -08:00
ef50c94e7c reenabling MPI test (#48725)
Summary:
fixes https://github.com/pytorch/pytorch/issues/47443.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48725

Reviewed By: mrshenli

Differential Revision: D25278758

Pulled By: walterddr

fbshipit-source-id: a02d0fef99a7941c8e98da16a45d840e12b8b0c3
2020-12-03 06:50:36 -08:00
0484b048d0 Replace constexpr with CONSTEXPR_EXCEPT_WIN_CUDA (#48717)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48716

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48717

Reviewed By: ezyang

Differential Revision: D25277886

Pulled By: datumbox

fbshipit-source-id: eb845db35d31b64d3e4401ed56843814192ce5a6
2020-12-03 05:36:38 -08:00
5489a98cd3 Add support for CorrCholeskyTransform (#48041)
Summary:
This adds a transform to convert a real vector of dimension D*(D-1)/2 into the Cholesky factor of a D x D correlation matrix. This follows the implementation in [NumPyro](https://github.com/pyro-ppl/numpyro/blob/master/numpyro/distributions/transforms.py) by fehiepsi. It is needed for the LKJ distribution, which will be added in a subsequent PR.
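A minimal sketch of the resulting transform, using the shapes from the description above (constructor defaults assumed):

```python
import torch
from torch.distributions.transforms import CorrCholeskyTransform

t = CorrCholeskyTransform()
x = torch.randn(6)  # D * (D - 1) / 2 = 6  =>  D = 4
L = t(x)            # 4x4 lower-triangular Cholesky factor of a correlation matrix
print(torch.allclose((L @ L.T).diagonal(), torch.ones(4)))  # unit diagonal
```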

Also in line with the ongoing effort to refactor distributions test, this moves the transforms test into its own file that uses pytest with parametrized fixtures.

For review:
 fehiepsi - could you help review the math?
 fritzo - do you have any suggestions for what to do about the event dimension (more details are in the comment below)?
 ezyang - could you review the changes in `run_test.py`? Instead of a separate `PYTEST_TESTS`, I have clubbed these tests in `USE_PYTEST_LIST` to avoid duplicate logic. The only difference is that we do not anymore check if pytest is not installed and exclude the tests in the list. I figured that if existing tests are already using pytest, this should not matter.

TODOs (probably not all can be satisfied at the same time):
 - [x] Use operations that are JIT friendly, i.e. the transform works with different sized input under JIT.
 - [x] Resolve test failures - currently `arange(scalar_tensor)` fails on certain backends but this is needed for JIT. Maybe we should only support same sized tensor under JIT?
 - [x] Add tests to check that the transform gives correct gradients and is in agreement with the `log_det_jacobian`.
 - [x] Add `input_event_dim` and `output_event_dim` to `CorrCholeskyTransform`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48041

Reviewed By: zhangguanheng66

Differential Revision: D25262505

Pulled By: neerajprad

fbshipit-source-id: 5a57e1c19d8230b53592437590b9169bdf2f71e9
2020-12-03 03:21:08 -08:00
313e77fc06 Add broadcast_shapes() function and use it in MultivariateNormal (#43935)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43837

This adds a `torch.broadcast_shapes()` function similar to Pyro's [broadcast_shape()](7c2c22c10d/pyro/distributions/util.py (L151)) and JAX's [lax.broadcast_shapes()](https://jax.readthedocs.io/en/test-docs/_modules/jax/lax/lax.html). This helper is useful e.g. in multivariate distributions that are parameterized by multiple tensors, where we want to `torch.broadcast_tensors()` the parameters but they have different "event shapes" (e.g. mean vectors and covariance matrices). This helper is already heavily used in Pyro's distribution codebase, and we would like to start using it in `torch.distributions`.
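A short usage example:

```python
import torch

torch.broadcast_shapes((2, 1), (3, 1, 1), (4,))  # torch.Size([3, 2, 4])
```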

- [x] refactor `MultivariateNormal`'s expansion logic to use `torch.broadcast_shapes()`
- [x] add unit tests for `torch.broadcast_shapes()`
- [x] add docs

cc neerajprad

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43935

Reviewed By: bdhirsh

Differential Revision: D25275213

Pulled By: neerajprad

fbshipit-source-id: 1011fdd597d0a7a4ef744ebc359bbb3c3be2aadc
2020-12-03 02:42:04 -08:00
c7746adbc6 Revert D24874754: [pytorch][PR] Add test for empty tensors for batch matmuls
Test Plan: revert-hammer

Differential Revision:
D24874754 (5f105e2aa6)

Original commit changeset: 41ba837740ff

fbshipit-source-id: d6cb31cbc4a2a386aab0a5f24710f218f9a561ca
2020-12-03 00:29:07 -08:00
79b9c03465 Optimize torch zeros (#45636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45636

After creating an empty tensor, 'memset' is used to zero out the items of the tensor.

Test Plan:
pytorch benchmark tool results:

timer = benchmark_utils.Timer(stmt="torch.zeros((1024, 4096))")

Before: 1007 us
After:     841.26 us
1 measurement, 10000 runs , 1 thread

timer = benchmark_utils.Timer(stmt="torch.zeros((128))")

Before: 4 - 7.6 us
After:     2.4 - 2.8 us
1 measurement, 10000 runs , 1 thread

           torch.int8     |   1   |  4096  |  8192  |  16384  |  32768  |
1 threads: --------------------------------------------------------------
  (PR #45636)  x.zero_()  |  500  |   500  |   600  |    700  |   2000  |
  (Reference)  x.zero_()  |  800  |  1000  |  1000  |   2000  |   2000  |
2 threads: --------------------------------------------------------------
  (PR #45636)  x.zero_()  |  500  |   500  |   600  |    700  |   2000  |
  (Reference)  x.zero_()  |  800  |  1000  |  1000  |   2000  |   3000  |
4 threads: --------------------------------------------------------------
  (PR #45636)  x.zero_()  |  500  |   500  |   600  |    700  |   2000  |
  (Reference)  x.zero_()  |  800  |  1000  |  1000  |   2000  |   3000  |

           torch.int32    |   1   |  4096  |  8192  |  16384  |  32768  |
1 threads: --------------------------------------------------------------
  (PR #45636)  x.zero_()  |  400  |   700  |  2000  |   2900  |   5500  |
  (Reference)  x.zero_()  |  800  |  2000  |  3000  |   4400  |   7300  |
2 threads: --------------------------------------------------------------
  (PR #45636)  x.zero_()  |  500  |   700  |  2000  |   3000  |   5600  |
  (Reference)  x.zero_()  |  900  |  2000  |  2000  |   3600  |   7200  |
4 threads: --------------------------------------------------------------
  (PR #45636)  x.zero_()  |  400  |   700  |  2000  |   3000  |   5700  |
  (Reference)  x.zero_()  |  800  |  2000  |  3100  |   4300  |   9000  |

           torch.float16  |   1   |  4096  |  8192  |  16384  |  32768  |
1 threads: --------------------------------------------------------------
  (PR #45636)  x.zero_()  |  500  |   500  |   700  |   2000  |   3000  |
  (Reference)  x.zero_()  |  800  |  1000  |  2000  |   2000  |   3300  |
2 threads: --------------------------------------------------------------
  (PR #45636)  x.zero_()  |  500  |   600  |   700  |   2000  |   3000  |
  (Reference)  x.zero_()  |  800  |  1000  |  2000  |   2000  |   4300  |
4 threads: --------------------------------------------------------------
  (PR #45636)  x.zero_()  |  500  |   600  |   700  |   2000  |   3300  |
  (Reference)  x.zero_()  |  900  |  1000  |  2000  |   2000  |   4400  |

           torch.float32  |   1   |  4096  |  8192  |  16384  |  32768  |
1 threads: --------------------------------------------------------------
  (PR #45636)  x.zero_()  |  500  |   700  |  2000  |   3200  |   6100  |
  (Reference)  x.zero_()  |  800  |  2000  |  2000  |   3500  |   6100  |
2 threads: --------------------------------------------------------------
  (PR #45636)  x.zero_()  |  500  |   700  |  2000  |   3100  |   5600  |
  (Reference)  x.zero_()  |  800  |  2000  |  2000  |   3300  |   7000  |
4 threads: --------------------------------------------------------------
  (PR #45636)  x.zero_()  |  500  |   700  |  2000  |   3000  |   5600  |
  (Reference)  x.zero_()  |  900  |  2000  |  2000  |   3600  |   7500  |

Reviewed By: ngimel

Differential Revision: D23925113

fbshipit-source-id: 04e97ff6d67c52a8e7a21449113e1a0a7443098f
2020-12-02 23:25:30 -08:00
1112773cf5 Fix unintended error when worker force kill happens #43455 (#43462)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43455

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43462

Reviewed By: bdhirsh

Differential Revision: D25277759

Pulled By: VitalyFedyunin

fbshipit-source-id: 0bb0d87374c0403853d71aac2c242374bfc7acf2
2020-12-02 21:42:16 -08:00
85c1e8acdc Replace kernel resource strings with real .cu source files (#48283)
Summary:
Convert NVFuser's runtime CUDA sources (under `.../jit/codegen/cuda/runtime`) to string literals, then include the headers with the generated literals.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48283

Reviewed By: mrshenli

Differential Revision: D25163362

Pulled By: ngimel

fbshipit-source-id: 4e6c181688ddea78ce6f3c754fee62fa6df16641
2020-12-02 21:22:29 -08:00
5f105e2aa6 Add test for empty tensors for batch matmuls (#47700)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47700

Reviewed By: malfet

Differential Revision: D24874754

Pulled By: ngimel

fbshipit-source-id: 41ba837740ff7d5bd49d5f7277ad2064985aba2f
2020-12-02 20:45:59 -08:00
ea573ea944 [quant][graphmode][fx] Standalone module takes float as input and output (#48671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48671

A standalone module might be called separately, so it's better to use float
as the interface.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D25256184

fbshipit-source-id: e209492a180ce1f81f31c8d6057956a74bad20b1
2020-12-02 20:34:25 -08:00
22c3ae8b57 Disable autocast cache for tensor views as fix for #48049 (#48696)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48049

Root cause of the issue explained [here](https://github.com/pytorch/pytorch/issues/48049#issuecomment-736701769).

This PR implements albanD's suggestion to add the `!t.is_view()` check and disable autocast caching for views of tensors.

The added test checks for an increase in memory usage by comparing the initially allocated memory with the memory after 3 iterations using a single `nn.Linear` layer in a `no_grad` and `autocast` context.
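A hedged repro sketch of the pattern the test exercises; the layer and batch sizes here are illustrative, not taken from the test:

```python
import torch

linear = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(8, 1024, device="cuda")
start = torch.cuda.memory_allocated()
with torch.no_grad(), torch.cuda.amp.autocast():
    for _ in range(3):
        y = linear(x)
# before the fix, cached casts of tensor views accumulated across iterations;
# after the fix this delta should stay flat
print(torch.cuda.memory_allocated() - start)
```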

After this PR the memory usage in the original issue doesn't grow anymore and yields:
```python
autocast: True
0: 0MB (peak 1165MB)
1: 0MB (peak 1264MB)
2: 0MB (peak 1265MB)
3: 0MB (peak 1265MB)
4: 0MB (peak 1265MB)
5: 0MB (peak 1265MB)
6: 0MB (peak 1265MB)
7: 0MB (peak 1265MB)
8: 0MB (peak 1265MB)
9: 0MB (peak 1265MB)
```

CC ngimel mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48696

Reviewed By: bdhirsh

Differential Revision: D25276231

Pulled By: ngimel

fbshipit-source-id: e2571e9f166c0a6f6f569b0c28e8b9ca34132743
2020-12-02 20:25:13 -08:00
0e4f9a7872 Refactored OpInfo testing to support custom SampleInputs, added addmm to op_db to test (#48627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48627

Several changes to the OpInfo testing suite:
- Changed test_ops.py to support sample.inputs that are longer than a single element
- Changed OpInfo class to use custom sample_input generator functions, changed UnaryUfuncInfo to use new format
- Added mvp addmm op to operator database to test out sample.inputs with a length greater than a single element

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25234178

Pulled By: Lilyjjo

fbshipit-source-id: cca2c60af7e6deb849a1cc3770c04ed88865016c
2020-12-02 19:59:40 -08:00
90faf43151 Support for OpInfo-based testing for operators in JIT (#47696)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47696

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25212436

Pulled By: Lilyjjo

fbshipit-source-id: 1fd2884d86b2afd6321ae1599d755b4beae4670a
2020-12-02 19:59:37 -08:00
9c35a68094 Refactored assertAutodiff test to have better error message (#48567)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48567

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D25212435

Pulled By: Lilyjjo

fbshipit-source-id: eab3933bf4248dbfa20cd956d4a0106b10db5fc4
2020-12-02 19:59:33 -08:00
c465602d78 Refactor existing JIT testing utils to enable new OpInfo test suite to reuse existing logic (#47695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47695

The method_tests from common_methods_invoations.py are being migrated into a new OpInfo class-based testing framework. The work in this commit pulls out the functions embedded in the old method_tests logic and places them in a location that both the old method_tests and OpInfo tests can use

Specifically: created torch/testing/_internal/common_jit.py from functions and methods in torch/testing/_internal/jit_utils.py and test/test_jit.py. Also created new intermediate class JitCommonTestCase to house moved methods. Also slightly modified jit_metaprogramming_utils.py to work for OpInfo tests

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25212437

Pulled By: Lilyjjo

fbshipit-source-id: 97bc52c95d776d567750e7478fac722da30f4985
2020-12-02 19:54:30 -08:00
1195403915 [NNC] Add cpu fusion gflag (#48682)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48682

Reviewed By: Krovatkin, ngimel

Differential Revision: D25260205

Pulled By: eellison

fbshipit-source-id: df1655fd75f2a13bcf7c025b1f0a7becc2fd126a
2020-12-02 19:47:18 -08:00
0d39bd47cf only enable cudnn persistent RNN when batchsize % 8 == 0 (#48070)
Summary:
On A100, the cuDNN persistent RNN algorithm doesn't work well when the batch size is not a multiple of 8, so we need to disable it in that case.

Related: https://github.com/pytorch/pytorch/pull/43165

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48070

Reviewed By: bdhirsh

Differential Revision: D25283953

Pulled By: ngimel

fbshipit-source-id: d7f33b1f43e2e3c46dc89ae046779175f6992569
2020-12-02 18:40:18 -08:00
a1daf1e678 Use fastAtomicAdd in GPU upsampling trilinear (#48675)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44206

This PR basically follows the diff in https://github.com/pytorch/pytorch/pull/21879 for upsampling bilinear.

For the script provided in https://github.com/pytorch/pytorch/issues/44206 , on my 2070 super GPU, the total timing I got (time in second)

| | non-amp | amp |
|---|---|---|
| before PR | 2.88 | 9.6 |
| after PR | 1.5 | 1.6 |

kernel time after PR
| | time | kernel |
| --- | --- | --- |
| non-amp | 0.37 ms | `void at::native::(anonymous namespace)::upsample_trilinear3d_backward_out_frame<float, float>(unsigned long, int, int, int, int, int, int, float, float, float, bool, float*, float const*) ` |
| amp | 0.61 ms | `void at::native::(anonymous namespace)::upsample_trilinear3d_backward_out_frame<c10::Half, float>(unsigned long, int, int, int, int, int, int, float, float, float, bool, c10::Half*, c10::Half const*)` |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48675

Reviewed By: bdhirsh

Differential Revision: D25284853

Pulled By: ngimel

fbshipit-source-id: 30f0d92e73050edd36013ce528d2e131effa3542
2020-12-02 18:25:28 -08:00
5f62308739 Hipify revamp [REDUX] (#48715)
Summary:
[Refiled version of earlier PR https://github.com/pytorch/pytorch/issues/45451]

This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, not for PyTorch or Caffe2 itself.

Correspondingly, changes are made to cpp_extension.py to match these improvements.

The list of improvements to hipify is as follows:

1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether "" or <> is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from hipify function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update cuda_to_hip_mappings.py to account for the ROCm component subdirectories inside /opt/rocm/include. This also results in cleanup of the Caffe2_HIP_INCLUDE path to remove unnecessary additions to the include path.

The list of changes to cpp_extension.py is as follows:

1. Call hipify when building a CUDAExtension for ROCm.
2. Prune the list of source files passed to CUDAExtension to include only the hipified version of any source file (when both the original and hipified versions are in the list)
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically

cc jeffdaily sunway513 ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48715

Reviewed By: bdhirsh

Differential Revision: D25272824

Pulled By: ezyang

fbshipit-source-id: 8bba68b27e41ca742781e1c4d7b07c6f985f040e
2020-12-02 18:03:23 -08:00
780f2b9a9b torch: Stop using _nt_quote_args from distutils (#48618)
Summary:
This function was removed from distutils in Python 3.9, so we should just
remake the function here and use our own instead of relying on hidden
functions from the stdlib

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Fixes https://github.com/pytorch/pytorch/issues/48617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48618

Reviewed By: samestep

Differential Revision: D25230281

Pulled By: seemethere

fbshipit-source-id: 57216af40a4ae4dc8bafcf40d2eb3ba793b9b6e2
2020-12-02 16:53:25 -08:00
95311add49 Vulkan linear memory allocator. (#48569)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48569

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D25277091

Pulled By: AshkanAliabadi

fbshipit-source-id: 0530832ce61432237976088cb72a8b7c3aee949c
2020-12-02 16:18:22 -08:00
90a3049a9a [fix] repr(torch.device) (#48655)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48585

In commit 4c9eb57914, the type of `DeviceIndex` was changed from `uint16_t` to `uint8_t`.
`uint8_t` is treated as an ASCII char by std::cout and other stream operators, hence the broken `repr`.

Stackoverflow Reference: https://stackoverflow.com/questions/19562103/uint8-t-cant-be-printed-with-cout

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48655

Reviewed By: bdhirsh

Differential Revision: D25272289

Pulled By: ezyang

fbshipit-source-id: a1549f5f8d417138cf38795e4c373e3a487d3691
2020-12-02 15:48:17 -08:00
b006c7a132 Add reparameterization support to OneHotCategorical (#46610)
Summary:
Add reparameterization support to the `OneHotCategorical` distribution. Samples are reparameterized based on the straight-through gradient estimator, which is proposed in the paper [Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation](https://arxiv.org/abs/1308.3432).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46610

Reviewed By: neerajprad

Differential Revision: D25272883

Pulled By: ezyang

fbshipit-source-id: 8364408fe108a29620694caeac377a06f0dcdd84
2020-12-02 15:39:32 -08:00
de46369af7 [vulkan] Distribute weight prepacking along y dimension for conv2d (#48266)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48266

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D25222752

Pulled By: SS-JIA

fbshipit-source-id: 973e7956cd372c657dbbc6c7835e77b5f4e35f01
2020-12-02 14:54:36 -08:00
a4e13fcf3f add type annotations to common_nn.py (#48190)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48189

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48190

Reviewed By: walterddr, zhangguanheng66

Differential Revision: D25245261

Pulled By: malfet

fbshipit-source-id: 0eabaed54996be83ead0fd7668f4d2be20adfc17
2020-12-02 14:46:00 -08:00
a49e2c5ce6 Remove "-b" option from pip install command (#48742)
Summary:
It has been deprecated for a while and was finally removed in pip 20.3.
Follow-up to https://github.com/pytorch/pytorch/pull/48722.
Fixes ONNX build failures after the Docker image update.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48742

Reviewed By: walterddr

Differential Revision: D25282017

Pulled By: malfet

fbshipit-source-id: 1dfa4eb57398f979107ca1544aafbc6d7b5e68a4
2020-12-02 14:28:18 -08:00
fc1153a8be [JIT] Fix clang-tidy warnings in jit/runtime (#47992)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47992

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25258645

Pulled By: SplitInfinity

fbshipit-source-id: b3e4576400c101b247e80cb4044fc04471f39a47
2020-12-02 12:35:42 -08:00
a25d52f4e6 [JIT] Fix clang-tidy warnings in jit/serialization (#47991)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47991

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25258639

Pulled By: SplitInfinity

fbshipit-source-id: 2492c5e3bfbe87600512988b7f31f11b7b014f5a
2020-12-02 12:35:40 -08:00
34b2304e34 [JIT] Fix clang-tidy warnings in jit/testing (#47986)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47986

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25258642

Pulled By: SplitInfinity

fbshipit-source-id: 468b3751d6737c3262e72dfaa0cd7a1699e988a3
2020-12-02 12:35:38 -08:00
18eccfbe42 [JIT] Fix clang-tidy warnings in jit/python (#47985)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47985

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25258644

Pulled By: SplitInfinity

fbshipit-source-id: dfc15dc62c148f79f4e99fd058a6bf2d071ccbb5
2020-12-02 12:35:36 -08:00
8746e1a1cc [JIT] Fix clang-tidy warnings in jit/passes (#47984)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47984

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25258638

Pulled By: SplitInfinity

fbshipit-source-id: 0ed5ef6984ba988a2c67407efcc77355ca25bbee
2020-12-02 12:35:34 -08:00
9b973eb275 [JIT] Fix clang-tidy warnings jit/ir (#47983)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47983

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25258643

Pulled By: SplitInfinity

fbshipit-source-id: b8e0ecfb3cc9ed928c564fb198b32c615e30eb5a
2020-12-02 12:35:31 -08:00
3039d24f4a [JIT] Fix clang-tidy warnings for jit/frontend (#47982)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47982

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25258640

Pulled By: SplitInfinity

fbshipit-source-id: e2cf27130311904aa5b18e3232349604d01701a0
2020-12-02 12:35:28 -08:00
4aa5d68874 [JIT] Fix clang-tidy warnings for jit/api (#47981)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47981

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25258641

Pulled By: SplitInfinity

fbshipit-source-id: 2cf2c1f5f02b7a64104d736f582ff6a15ba9b876
2020-12-02 12:30:39 -08:00
83c76611d5 [package] Support glob matching (#48633)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48633

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D25236016

Pulled By: zdevito

fbshipit-source-id: 5eca7b7f344a6c2f6a047bfabdb4da8cdd0dc7ec
2020-12-02 12:24:46 -08:00
88735f2cc9 [package] move importer logic into import pickler (#48632)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48632

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D25236017

Pulled By: zdevito

fbshipit-source-id: 57fd80d36ddf390ae35c58adf6dddbf15a1347c1
2020-12-02 12:24:44 -08:00
ce3484595e [packaging] missing quotation in graphviz printout (#48344)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48344

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D25236018

Pulled By: zdevito

fbshipit-source-id: cb69ec35b86228dfcd1f2823db2b2150a9d3e8b9
2020-12-02 12:23:09 -08:00
15fc66d6c8 fix nvrtc PTX architecture cap for CUDA toolkit (#48455)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48200

CUDA 11.0 only supports < sm_80 (https://docs.nvidia.com/cuda/archive/11.0/nvrtc/#group__options)

Note: the NVRTC documentation is not a reliable source for querying supported architectures. The rule of thumb is that NVRTC supports the same set of archs as nvcc, so the best way to query it is something like `nvcc -h | grep -o "compute_[0-9][0-9]" | sort | uniq`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48455

Reviewed By: zhangguanheng66

Differential Revision: D25255529

Pulled By: ngimel

fbshipit-source-id: e84cf51ab50519b4c97dad063cc43c9194942bb2
2020-12-02 11:50:22 -08:00
bdb68d9b0b [reland] [ROCm] remove versions less than 3.8 (#48723)
Summary:
First attempt to land https://github.com/pytorch/pytorch/issues/48118 failed due to the problem fixed by https://github.com/pytorch/pytorch/issues/48722.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48723

Reviewed By: bdhirsh

Differential Revision: D25274287

Pulled By: malfet

fbshipit-source-id: 3ff0be3b522012c647448e5173b3ae38446d4120
2020-12-02 11:24:49 -08:00
4d26941a9b Fix lite interpreter record function issue. (#47457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47457

This fixes two issues.
1. The lite interpreter's record_function is intended to be used only for root-op
profiling. At the moment, if RECORD_FUNCTION is enabled via the Dispatcher, it
logs not just root ops but all ops.
2. Because the interpreter sets an op index that later gets picked up elsewhere
(decoupled design), the op index set in the lite interpreter ends up being
used by all record-function calls, not just the root op, so we don't really get
correct per-op profiling. This diff also fixes that issue.

Reviewed By: ilia-cher

Differential Revision: D24763689

fbshipit-source-id: 6c1f8bcaec9fb5ebacb2743a5dcf7090ceb176b9
2020-12-02 11:24:45 -08:00
4fcdbb824b Updating all call-sites of the legacy dispatcher registration API in fbcode to the new API. (#48178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48178

I migrated all call sites that used the legacy dispatcher registration API (RegisterOperators()) to use the new API (TORCH_LIBRARY...). I found all call-sites by running `fbgs RegisterOperators()`. This touched several places, including other OSS code (nestedtensor, torchtext, torchvision). A few things to call out:

For simple ops that only had one registered kernel without a dispatch key, I replaced them with:
```
TORCH_LIBRARY_FRAGMENT(ns, m) {
   m.def("opName", fn_name);
}
```

For ops that registered to a specific dispatch key / had multiple kernels registered, I registered the common kernel (math/cpu) directly inside a `TORCH_LIBRARY_FRAGMENT` block, and registered any additional kernels from other files (e.g. cuda) in a separate `TORCH_LIBRARY_IMPL` block.

```
// cpu file
TORCH_LIBRARY_FRAGMENT(ns, m) {
  m.def("opName(schema_inputs) -> schema_outputs");
  m.impl("opName", torch::dispatch(c10::DispatchKey::CPU, TORCH_FN(cpu_kernel)));
}

// cuda file
TORCH_LIBRARY_IMPL(ns, CUDA, m) {
  m.impl("opName", torch::dispatch(c10::DispatchKey::CUDA, TORCH_FN(cuda_kernel)));
}
```
Special cases:

I found a few ops that used a (legacy) `CPUTensorId`/`CUDATensorId` dispatch key. Updated those to use CPU/CUDA; this seems safe because the keys are aliased to one another in `DispatchKey.h`

There were a handful of ops that registered a functor (function class) to the legacy API. As far as I could tell we don't allow this case in the new API, mainly because you can accomplish the same thing more cleanly with lambdas. Rather than delete the class I wrote a wrapper function on top of the class, which I passed to the new API.
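
A hedged sketch of that wrapper-function workaround (the kernel class, op name, and namespace here are hypothetical):

```cpp
#include <torch/library.h>

// Hypothetical legacy functor that used to be registered directly.
struct MyKernel {
  at::Tensor operator()(const at::Tensor& x) {
    return x.clone();
  }
};

// Thin free function registered in its place; the class is kept as-is.
at::Tensor my_kernel_wrapper(const at::Tensor& x) {
  return MyKernel()(x);
}

TORCH_LIBRARY_FRAGMENT(ns, m) {
  m.def("opName", my_kernel_wrapper);
}
```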

There were a handful of ops that were registered only to a CUDA dispatch key. I put them inside a TORCH_LIBRARY_FRAGMENT block, and used a `def()` and `impl()` call like in case two above.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25056090

Pulled By: bdhirsh

fbshipit-source-id: 8f868b45f545e5da2f21924046e786850eba70d9
2020-12-02 11:19:31 -08:00
022c929145 Revert "Revert D25199264: Enable callgrind collection for C++ snippets" (#48720)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48720

This reverts commit 6646ff122d3215b77909f669fc26cf6a927030db.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D25273994

Pulled By: malfet

fbshipit-source-id: 61743176dc650136622e1b8f2384bbfbd7a46294
2020-12-02 11:10:11 -08:00
b2ec21a05a [ROCm] Enable deterministic rocBLAS mode (#48654)
Summary:
This PR adds a feature to disable atomics in rocBLAS calls, thereby making the output deterministic when PyTorch expects it. This mode of rocBLAS can be exercised via the global setting `torch.set_deterministic(True)`

cc: ezyang jeffdaily sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48654

Reviewed By: bdhirsh

Differential Revision: D25272296

Pulled By: ezyang

fbshipit-source-id: 70400572b0ab37c6db52636584de0ae61bb5270a
2020-12-02 10:23:32 -08:00
52f0af03f8 [reland][quant][fix] Add bias once in conv_fused (#48593) (#48661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48661

Previously, `_conv_forward` would add `self.bias` to the result, so the bias was added twice in the QAT ConvBn module. This PR adds a bias argument to `_conv_forward`, which the ConvBn module now calls with a zero bias.

fixes: https://github.com/pytorch/pytorch/issues/48514

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D25249175

fbshipit-source-id: 4536c7545d3dcd7e8ea254368ffb7cf15118d78c
2020-12-02 10:17:43 -08:00
0db73460db [quantization] fix run_arg tiny bug (#48537)
Summary:
This fix allows the calibration function to take in more than one positional argument.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48537

Reviewed By: zhangguanheng66

Differential Revision: D25255764

Pulled By: jerryzh168

fbshipit-source-id: 3ce20dbed95fd26664a186bd4a992ab406bba827
2020-12-02 10:07:33 -08:00
f61de25dfa Fix index_put doc. (#48673)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48673

fixes #48642

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D25257078

Pulled By: ailzhang

fbshipit-source-id: e5ebd6e07aafb262989fc12131546037fed8ebf6
2020-12-02 10:01:11 -08:00
071344debe Fix index parsing on Python-3.9 (#48676)
Summary:
In 3.9, `ast.Index` and `ast.ExtSlice` are deprecated, so:
- `ast.parse('img[3]', mode='eval')` evaluates to
`Expression(body=Subscript(value=Name(id='img'), slice=Constant(value=3)))` under 3.9,
but was previously evaluated to `Expression(body=Subscript(value=Name(id='img'), slice=Index(value=Num(n=3))))`
- and `ast.parse('img[..., 10:20]', mode='eval')` is now evaluated to
`Subscript(value=Name(id='img'), slice=Tuple(elts=[Constant(value=Ellipsis), Slice(lower=Constant(value=10), upper=Constant(value=20))]))`,
but was previously evaluated to
`Subscript(value=Name(id='img'), slice=ExtSlice(dims=[Index(value=Ellipsis()), Slice(lower=Num(n=10), upper=Num(n=20), step=None)]))`
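
A minimal sketch of the version check a consumer of these nodes needs (plain stdlib, nothing PyTorch-specific):

```python
import ast
import sys

tree = ast.parse('img[3]', mode='eval')
node = tree.body.slice
# Python >= 3.9 stores the index directly as ast.Constant; older versions
# wrap it in ast.Index, which must be unwrapped before inspection.
if sys.version_info >= (3, 9):
    const = node
else:
    const = node.value
print(type(const).__name__)  # Constant (Num on very old versions)
```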

Fixes https://github.com/pytorch/pytorch/issues/48674

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48676

Reviewed By: seemethere, gmagogsfm

Differential Revision: D25261323

Pulled By: malfet

fbshipit-source-id: cc818ecc596a062ed5f1a1d11d3fdf0f22bf7f4a
2020-12-02 09:56:20 -08:00
3c5db30eaa Update magma to 2.5.4 for Windows (#48656)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48527

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48656

Reviewed By: zhangguanheng66

Differential Revision: D25261601

Pulled By: malfet

fbshipit-source-id: 4ba0036ca882bccd1990108d13596455d179d06e
2020-12-02 09:45:21 -08:00
c98c98d77d Migrate fmod and fmod_ from TH to ATen (CUDA) (#47323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47323

Fixes #24565

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D24763086

Pulled By: ejguan

fbshipit-source-id: fa004baea19bbbdbeb44814903db29226805ef0e
2020-12-02 09:38:29 -08:00
dc367e7903 Delete "-b" flag from pip install command (#48722)
Summary:
"--build <dir>" flag has been deprecated for a while and finally removed in pip-20.3

Before this PR is applied, every change to docker images would result in ONNX failures

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48722

Reviewed By: janeyx99

Differential Revision: D25274020

Pulled By: malfet

fbshipit-source-id: 9e0f9daba58ceeec5474d649d1b22bfeca91d7bc
2020-12-02 09:16:20 -08:00
4abca9067b Fix dataloader hang with large sampler (#48669)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48669

Reviewed By: zhangguanheng66

Differential Revision: D25255763

Pulled By: VitalyFedyunin

fbshipit-source-id: d06421f52bb1d00cdf8025f1a2ba0d1f9284731a
2020-12-02 09:07:30 -08:00
3b25af02a4 matrix_exp + matrix_exp.backward complex support (#48363)
Summary:
As per title. Fixes https://github.com/pytorch/pytorch/issues/48299.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48363

Reviewed By: ejguan

Differential Revision: D25224498

Pulled By: albanD

fbshipit-source-id: 0c80ffb03ccfc46ab86398911edfba0b09049e55
2020-12-02 08:35:14 -08:00
e41e780f7a Added support for complex input for torch.lu_solve #2 (#48028)
Summary:
Relanding https://github.com/pytorch/pytorch/pull/46862
There was an issue with the simultaneous merge of two slightly conflicting PRs.

This PR adds `torch.lu_solve` for complex inputs both on CPU and GPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48028

Reviewed By: linbinyu

Differential Revision: D25003700

Pulled By: zou3519

fbshipit-source-id: 24cd1babe9ccdbaa4e2ed23f08a9153d40d0f0cd
2020-12-02 08:13:02 -08:00
6d6e9abe49 Delete NativeFunctions.h include from Functions.h (#48687)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48687

Only one header needed to be updated to now include NativeFunctions.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D25261845

Pulled By: ezyang

fbshipit-source-id: de778b5e014c812c52a307841827193ce823afcc
2020-12-02 07:57:25 -08:00
e097f8898c Move var and std overloads to Functions.cpp and remove native:: reference (#48683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48683

I want to delete the NativeFunctions.h include from the Functions.h header. To do this I must remove all references to native::. However, I also must avoid trampling over iseeyuan's work of making ATen compilable without reference to ATen_cpu. In this particular case, I fix the Functions.h problem by moving the code to a cpp file and removing the native:: short-circuit (ostensibly there for performance). This also fixes a hypothetical correctness bug where these functions would not dispatch properly if the underlying functions no longer uniformly used a single native:: implementation.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D25261843

Pulled By: ezyang

fbshipit-source-id: 05ca6555fbf1062f9b22d868c8cb88fdf8e4c24b
2020-12-02 07:57:20 -08:00
6ba7709415 Refactor TensorIterator to do allocations via MetaBase::set_output (#48659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48659

Detailed RFC at
https://github.com/pytorch/rfcs/blob/rfc-0005/RFC-0005-structured-kernel-definitions.md#handling-tensoriterator

What this diff does:
* Refactor allocation of outputs in TensorIterator into a call to a single function TensorIterator::set_output.  This nicely centralizes restriding logic and mostly eliminates the need for a separate named tensor propagation pass. The one exception is for inplace operations (`add_`), where previously we never actually call `set_output` when we determine resizing is not necessary; there's an extra propagate names in `allocate_or_resize_outputs` to handle this case (I audited all other `set_output` sites and found that we always hit this path in that situation). Although hypothetically this could cause problems for structured kernels (which require a `set_output` call in all cases), this codepath is irrelevant for structured kernels as a TensorIterator will never be constructed with an explicit out argument (remember, structured kernels handle out/functional/inplace variants). There's also a tricky case in `compute_types`; check the comments there for more details.
* Split TensorIterator into a TensorIteratorBase, which contains most of the logic but doesn't define `set_output`. A decent chunk of the diff is just the mechanical rename of TensorIterator to TensorIteratorBase. However, there are a few cases where we create fresh TensorIterator objects from another TensorIterator. In those cases, we always construct a fresh TensorIterator (rather than preserving the subclass of TensorIteratorBase that induced this construction). This makes sense, because a structured function class will contain metadata that isn't relevant for these downstream uses. This is done by *intentionally* permitting object slicing with the `TensorIterator(const TensorIteratorBase&)` constructor.
* Introduce a new `MetaBase` class which contains the canonical virtual method definition for `set_output`. This will allow structured classes to make use of it directly without going through TensorIterator (not in this PR).
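
A toy illustration of the intentional object slicing from the second bullet (stand-in names, not the real classes):

```cpp
struct IteratorBase {
  int shared_config = 0;        // state every consumer needs
  virtual void set_output() {}  // allocation policy supplied by subclasses
  virtual ~IteratorBase() = default;
};

struct Iterator : IteratorBase {
  Iterator() = default;
  // Intentional slicing: copy only the IteratorBase part, dropping any
  // subclass metadata that is irrelevant to downstream uses.
  explicit Iterator(const IteratorBase& other) : IteratorBase(other) {}
  void set_output() override {}
};

int main() {
  Iterator fresh;
  Iterator copy(static_cast<const IteratorBase&>(fresh));
  return copy.shared_config;
}
```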

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D25261844

Pulled By: ezyang

fbshipit-source-id: 34a9830cccbc07eaaf7c4f75114cd00953e3db7d
2020-12-02 07:57:15 -08:00
742903c0df Move argument grouping into FunctionSchema (#48195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48195

The general approach is to change Arguments, splitting `positional`, `kwarg_only` and `out`, into `pre_self_positional`, `self_arg`, `post_self_positional`, and `pre_tensor_options_kwarg_only`, `tensor_options` and `post_tensor_options_kwarg_only`. The splits are as you'd expect: we extract out the self argument and the tensor options arguments, and record the other arguments that came before and after. To do this, we move the logic in `group_arguments` to the parsing process.

Some fuzz in the process:
* I renamed `ThisArgument` to `SelfArgument`, since we don't actually use the terminology "this" outside of C++ (and the model is Python-biased)
* I kept the `group_arguments` function, which now just reads the arguments out of the structured model in the correct order. In the long term we should get rid of this function entirely, but for now I kept it as-is to reduce churn.
* I decided to arbitrarily say that when self is missing, everything goes in "post-self", but when tensor options are missing, everything goes in "pre-tensor-options". This was based on where you typically find the argument in question: self is usually at the front (so most args come after it), while tensor options are typically at the end (so most args go before it).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zhangguanheng66

Differential Revision: D25231166

Pulled By: ezyang

fbshipit-source-id: 25d77ad8319c4ce0bba4ad82e451bf536ef823ad
2020-12-02 07:57:11 -08:00
ba5686f8c5 Refactor argument fields in FunctionSchema to Arguments (#48182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48182

I'm planning to add a bunch more argument fields following
https://github.com/pytorch/pytorch/pull/45890#discussion_r503646917 and
it will be a lot more convenient if the arguments get to live
in their own dedicated struct.  Type checker will tell you if
I've done it wrong.  No change to output.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D25057897

Pulled By: ezyang

fbshipit-source-id: dd377181dad6ab0c894d19d83408b7812775a691
2020-12-02 07:57:06 -08:00
b4f5efa7b2 Structured kernels generate Meta registrations (#48116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48116

If you port kernels to be structured, you get Meta kernels automatically
generated for you.  This is one payoff of structured kernels.

Code generation was mercifully simple, although at risk of
"swiss cheese" syndrome: there are two new conditionals in the codegen
that tweak behavior when generating for meta keys. It's not too bad
right now, but there's a risk of things getting out of hand. One
way to rationalize the logic here would be to transmit "TensorMeta-ness"
inside the TensorOptions (so tensor_from_meta can deal with it); then
the "Meta" kernel magic would literally just be generating empty
out_impls to call after all the scaffolding is done. But I didn't
do this because it seemed like it would be more annoying in the short term.

Also had to teach resize_ to work on meta tensors, since we use them
to implement the out kernels.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer, ailzhang

Differential Revision: D25056640

Pulled By: ezyang

fbshipit-source-id: f8fcfa0dbb58a94d9b4196748f56e155f83b1521
2020-12-02 07:54:48 -08:00
47db191f0c Implement Kumaraswamy Distribution (#48285)
Summary:
This PR implements the Kumaraswamy distribution.

cc: fritzo alicanb sdaulton

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48285

Reviewed By: ejguan

Differential Revision: D25221015

Pulled By: ezyang

fbshipit-source-id: e621b25a9c75671bdfc94af145a4d9de2f07231e
2020-12-02 07:46:45 -08:00
9c6979a266 [Gradient Compression] Error feedback for PowerSGD (still need to fix the key in error_dict) (#48670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48670

Support an optional error feedback for PowerSGD -- storing the difference (i.e., the local error caused by compression) between the input gradient (adjusted by the existing error) and the gradient after decompression, and reinserting it at the next iteration.

Still need to add an index field to GradBucket as the key of error_dict. This is because the current key, the bucket's input tensor, can change across steps, as the buckets may be rebuilt in the forward pass to save peak memory usage.

This is the first half of error feedback; the new index field will be added in a separate PR.
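
A hedged sketch of the error-feedback loop itself (`compress_decompress` is a hypothetical stand-in, not the hook's API):

```python
import torch

def compress_decompress(grad, rank=1):
    # Rank-`rank` approximation standing in for PowerSGD compression.
    u, s, v = torch.svd_lowrank(grad, q=rank)
    return u @ torch.diag(s) @ v.t()

error = torch.zeros(4, 4)        # persisted per bucket across iterations
for _ in range(3):
    grad = torch.randn(4, 4)
    adjusted = grad + error      # reinsert last iteration's local error
    approx = compress_decompress(adjusted)
    error = adjusted - approx    # store the compression error for next time
```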

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 117636492

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

Reviewed By: rohan-varma

Differential Revision: D25240290

fbshipit-source-id: 5b6e11e711caccfb8984ac2767dd107dbf4c9b3b
2020-12-02 06:39:30 -08:00
463e5d2f12 Disable pruning on embedding look up operators when compressed_indices_mapping = {0} (#48672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48672

When a user specifies `pruned_weights = True`, compressed_indices_mapping = {0}, it means they have not pruned the weights. In this case, we need to go through non-sparse kernels for embedding bag lookup.

Test Plan:
buck test //caffe2/test:quantization
https://www.internalfb.com/intern/testinfra/testconsole/testrun/3377699760676256/

Reviewed By: radkris-git

Differential Revision: D25252904

fbshipit-source-id: 3a97dfd41ec8113d61135f02d9f534df3419e81f
2020-12-02 06:28:03 -08:00
74330e0497 Added linalg.matrix_rank (#48206)
Summary:
This PR adds `torch.linalg.matrix_rank`.

Changes compared to the original `torch.matrix_rank`:
- input with the complex dtype is supported
- batched input is supported
- "symmetric" kwarg renamed to "hermitian"

Should I update the documentation for `torch.matrix_rank`?

For input with no elements (for example, a 0×0 matrix), the current implementation diverges from NumPy. NumPy stumbles on the undefined max for such input; here I chose to return an appropriately sized tensor of zeros, which I think is mathematically the correct thing to do.
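
A quick illustration of the empty-input behavior described above (expected output per this description, not verified against NumPy here):

```python
import torch

a = torch.empty(2, 0, 0)             # a batch of two 0x0 matrices
print(torch.linalg.matrix_rank(a))   # expected: tensor([0, 0])
```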

Ref https://github.com/pytorch/pytorch/issues/42666.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48206

Reviewed By: albanD

Differential Revision: D25211965

Pulled By: mruberry

fbshipit-source-id: ae87227150ab2cffa07f37b4a3ab228788701837
2020-12-02 03:29:25 -08:00
6646ff122d Revert D25199264: Enable callgrind collection for C++ snippets
Test Plan: revert-hammer

Differential Revision:
D25199264 (ff097299ae)

Original commit changeset: 529244054e4c

fbshipit-source-id: 7429d7154f92e097089bf51dc81042b766de9cc3
2020-12-02 02:26:58 -08:00
6299c870ee Revert D25254920: [pytorch][PR] Add type annotations to torch.onnx.* modules
Test Plan: revert-hammer

Differential Revision:
D25254920 (40a2dd7e1e)

Original commit changeset: dc9dc036da43

fbshipit-source-id: c17cb282ebf90ecbae4023aa63ecbb443a87037d
2020-12-02 02:25:31 -08:00
bcc85a363e [numpy] torch.sigmoid : promote integer inputs to float (#47551)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47551

Reviewed By: ngimel

Differential Revision: D25211953

Pulled By: mruberry

fbshipit-source-id: 9174cda401aeba0fd585a4c9bda166dbcf64f42f
2020-12-01 23:28:57 -08:00
44016e66c4 Revert D25097324: [pytorch][PR] [ONNX] Cast Gather index to Long if needed
Test Plan: revert-hammer

Differential Revision:
D25097324 (55fc0e9e53)

Original commit changeset: 42da1412d1b9

fbshipit-source-id: 491994a35a8aaf207dd5905191847171586aa4b7
2020-12-01 20:59:28 -08:00
15abf18b67 [MaskR-CNN] Add int8 aabb bbox_transform op
Summary: Adds support for Eigen Utils for custom type defs.

Reviewed By: vkuzo

Differential Revision: D23753697

fbshipit-source-id: de1cfb1c8176a08dd418364f2fce003344fe25bb
2020-12-01 20:51:50 -08:00
40a2dd7e1e Add type annotations to torch.onnx.* modules (#45258)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45215

Still need to resolve a few mypy issues before a review. In particular, there is an error which I don't know how to solve; see:
```python
torch/onnx/utils.py:437: error: Name 'is_originally_training' is not defined  [name-defined]
        if training is None or training == TrainingMode.EVAL or (training == TrainingMode.PRESERVE and not is_originally_training):
```

`is_originally_training` is used but never defined/imported in [`torch/onnx/utils.py`](ab5cc97fb0/torch/onnx/utils.py (L437)).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45258

Reviewed By: zhangguanheng66

Differential Revision: D25254920

Pulled By: ezyang

fbshipit-source-id: dc9dc036da43dd56b23bd6141e3ab92e1a16e3b8
2020-12-01 20:41:39 -08:00
ff097299ae Enable callgrind collection for C++ snippets (#47865)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47865

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25199264

Pulled By: robieta

fbshipit-source-id: 529244054e4cc01e4703b7b9720833d991452943
2020-12-01 20:03:17 -08:00
0225d3dc9d Add support for timing C++ snippets. (#47864)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47864

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25199262

Pulled By: robieta

fbshipit-source-id: 1c2114628ed543fba4f403bf49c065f4d71388e2
2020-12-01 20:03:14 -08:00
17ea11259a Rework compat bindings. (#47863)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47863

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25199261

Pulled By: robieta

fbshipit-source-id: 0a4a0409ddb75c1bf66cd31d67b55080227b1679
2020-12-01 20:03:11 -08:00
07f038aa9d Add option for cpp_extensions to compile standalone executable (#47862)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47862

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25199265

Pulled By: robieta

fbshipit-source-id: eceb04dea60b82eb10434099639fa3afa61000ca
2020-12-01 20:03:08 -08:00
27905dfe9c Expose CXX_FLAGS through __config__ (#47861)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47861

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25199263

Pulled By: robieta

fbshipit-source-id: 3cfdb0485d686a03a68dd0907d1733634857963f
2020-12-01 19:58:29 -08:00
b824fc4de2 [pytorch] [PR] Rename cuda kernel checks to C10 (#48615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48615

Convert the macro from `TORCH_CUDA_KERNEL_LAUNCH_CHECK` to `C10_CUDA_KERNEL_LAUNCH_CHECK`, since it is now accessible through c10, not just torch.

Test Plan:
```
buck build //caffe2/caffe2:caffe2_cu
buck build //caffe2/aten:ATen-cu
buck test //caffe2/test:kernel_launch_checks -- --print-passing-details
```

Reviewed By: jianyuh

Differential Revision: D25228727

fbshipit-source-id: 9c65feb3d0ea3fbd31f1dcaecdb88ef0534f9121
2020-12-01 18:19:07 -08:00
25e367ec48 Revert D25246563: [pytorch][PR] [ROCm] remove builds for versions less than 3.8
Test Plan: revert-hammer

Differential Revision:
D25246563 (c5f1117be2)

Original commit changeset: cd6142286813

fbshipit-source-id: fec302da9802736cb88ae25c3b58705d93cd9920
2020-12-01 17:50:13 -08:00
8b2ca28c1d Add an option to run RPC tests with TCP init (#48248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48248

We have found a few bugs where initializing/de-initializing/re-initializing RPC, and using RPC along with process groups, does not work as expected, usually under TCP/env initialization (which multi-machine scenarios use instead of the `file` init that our tests use).

Due to this motivation, this PR adds an environment variable `RPC_INIT_WITH_TCP` that allows us to run any RPC test with TCP initialization.

To avoid port collisions, we use `common.find_free_port()`.
ghstack-source-id: 117553039

Test Plan: CI

Reviewed By: lw

Differential Revision: D25085458

fbshipit-source-id: b5dbef2ff8ae88fa5bc1bb85a9e0fe077dbb552c
2020-12-01 17:42:32 -08:00
d0e9523c4f [TensorExpr] Add more operator tests. (#48677)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48677

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D25258656

Pulled By: ZolotukhinM

fbshipit-source-id: 173b87568f3f29f04d06b8621cbfbd53c38e4771
2020-12-01 17:34:09 -08:00
f7986969af [FX] Delete values after their last use (#48631)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48631

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D25235981

Pulled By: jamesr66a

fbshipit-source-id: f79d8873d3ad1ad90b5bd6367fc6119925f116e9
2020-12-01 17:20:49 -08:00
cff1ff7fb6 Suppress unsigned warning (#48272)
Summary:
Fixes a pointless comparison against zero warning that arises for some scalar types

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48272

Test Plan:
Arises with
```
xbuck test mode/dev-nosan //caffe2/torch/fb/sparsenn:gpu_test -- test_prior_correction_calibration_prediction_binary
```

Fixes issues raised by https://github.com/pytorch/pytorch/issues/47876: `std::is_signed` was a poor choice; `std::is_unsigned` is a better choice. Surprisingly, the two are not simply complements of each other.
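
A small illustration of that non-reciprocity (both traits are false for non-arithmetic types):

```cpp
#include <type_traits>

static_assert(std::is_signed<float>::value, "arithmetic: signed");
static_assert(!std::is_unsigned<float>::value, "arithmetic: not unsigned");

struct NotANumber {};
// For a non-arithmetic type both traits are false, so !is_signed<T>
// does not imply is_unsigned<T>.
static_assert(!std::is_signed<NotANumber>::value, "");
static_assert(!std::is_unsigned<NotANumber>::value, "");

int main() { return 0; }
```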

Reviewed By: zhangguanheng66

Differential Revision: D25256251

Pulled By: r-barnes

fbshipit-source-id: 31665f5b0bc7eebee7456b85c37c5bce3f738bea
2020-12-01 17:09:09 -08:00
18f1cb14d5 Avoid resizing ones array when bias is not used (#48540)
Summary:
https://github.com/pytorch/pytorch/issues/48539

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48540

Reviewed By: zhangguanheng66

Differential Revision: D25255175

Pulled By: ezyang

fbshipit-source-id: 755435a0adf9129a2edbffbad252e95a05e84a5f
2020-12-01 16:21:56 -08:00
f5788898a9 TensorIteratorConfig is not used by reorder_dimensions (#48613)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48613

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zhangguanheng66

Differential Revision: D25228679

Pulled By: ezyang

fbshipit-source-id: 06d57e89e7c9cfa84e2b0886c6e1f3a9fa06978a
2020-12-01 16:13:08 -08:00
75f38c2fa9 ret is never reassigned, return 0 directly (#48609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48609

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zhangguanheng66

Differential Revision: D25228678

Pulled By: ezyang

fbshipit-source-id: b0b501866c9beb509b0c8c37d074e2d276085a56
2020-12-01 16:08:11 -08:00
30324d1e71 fix INTERNAL ASSERT FAILED for maximum (#48446)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48393

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48446

Reviewed By: zhangguanheng66

Differential Revision: D25240270

Pulled By: ngimel

fbshipit-source-id: 57fc223b98f2b6f96f2f24e1d9041644e3187262
2020-12-01 15:29:48 -08:00
1c02be1b6a Fix AttributeError in _get_device_attr (#48406)
Summary:
In PyTorch 1.5, when running `torch.cuda.reset_peak_memory_stats()` on a machine where `torch.cuda.is_available() is False`, I would get:
```
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
```

In PyTorch 1.7, the same gets me a worse error (and a user warning about missing NVIDIA drivers if you look for it):
```
...
  File "/opt/conda/lib/python3.7/site-packages/torch/_utils.py", line 440, in _get_device_attr
    if device_type.lower() == "cuda":
AttributeError: 'NoneType' object has no attribute 'lower'
```

The formerly raised AssertionError is depended on by libraries like pytorch_memlab: ec9a72fc30/pytorch_memlab/line_profiler/line_profiler.py (L90)
It would be pretty gross if pytorch_memlab had to change that to catch an AttributeError.

With this patch, we get a more sensible:
```
...
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/memory.py", line 209, in reset_peak_memory_stats
    return torch._C._cuda_resetPeakMemoryStats(device)
RuntimeError: invalid argument to reset_peak_memory_stats
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48406

Reviewed By: mrshenli

Differential Revision: D25205630

Pulled By: ngimel

fbshipit-source-id: 7c505a6500d730f3a2da348020e2a7a5e1306dcb
2020-12-01 14:55:18 -08:00
4fe583e248 fix move default not compile correctly on cuda92 (#48257)
Summary:
Explicitly define the move constructor when using CUDA version <= 9.2 (build 9200).

This fixes https://github.com/pytorch/csprng/issues/84

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48257

Reviewed By: malfet, mrshenli

Differential Revision: D25123467

Pulled By: walterddr

fbshipit-source-id: 72deff82c421fbaada6f38b2b6288f7f2f833062
2020-12-01 14:23:20 -08:00
54022e4f9b add new build settings to torch.__config__ (#48380)
Summary:
Many newly added build settings are not saved in torch.__config__; this adds them to the mix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48380

Reviewed By: samestep

Differential Revision: D25161951

Pulled By: walterddr

fbshipit-source-id: 1d3dee033c93f2d1a7e2a6bcaf88aedafeac8d31
2020-12-01 14:16:36 -08:00
d9c76360b2 Add cuda_ipc channel to TensorPipe (#46791)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46791

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D25237121

Pulled By: mrshenli

fbshipit-source-id: f1428175b260fb23c4e0e6f92651426f38beaca9
2020-12-01 14:12:00 -08:00
e3713ad706 Let JIT unpickler to accept CUDA DataPtr from read_record_ (#46827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46827

TensorPipe RPC agent uses JIT pickler/unpickler to serialize/deserialize
tensors. Instead of saving tensors to a file, the agent can directly
invoke `cudaMemcpy` to copy tensors from the sender to the receiver
before calling into JIT unpickler. As a result, before unpickling,
the agent might already have allocated tensors and need to pass
them to the JIT unpickler. Currently, this is done by providing a
`read_record` lambda to unpickler for CPU tensors, but this is
no longer sufficient for zero-copy CUDA tensors, as the unpickler
always allocates the tensor on CPU.

To address the above problem, this commit introduces a `use_storage_device`
flag to unpickler ctor. When this flag is set, the unpickler will
use the device from the `DataPtr` returned by the `read_record`
lambda to override the pickled device information and therefore
achieves zero-copy.

Test Plan: Imported from OSS

Reviewed By: wanchaol

Differential Revision: D24533218

Pulled By: mrshenli

fbshipit-source-id: 35acd33fcfb11b1c724f855048cfd7b2991f8903
2020-12-01 14:09:09 -08:00
5f181e2e6e centos now installs cmake from conda (#48035)
Summary:
For the same reason that Ubuntu builds need conda's CMake to find MKL.

CC jaglinux

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48035

Reviewed By: zhangguanheng66

Differential Revision: D25246723

Pulled By: malfet

fbshipit-source-id: 9dac130eecd2f76764d8027b888404c87e7a954a
2020-12-01 13:07:20 -08:00
3ceec73db9 [PyTorch] Lazily construct guts of RecordFunction (#47550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47550

I saw over 5% time spent in RecordFunction's ctor during one
of our framework overhead benchmarks in `perf`. Inspecting assembly,
it looks like we just create a lot of RecordFunctions and the
constructor has to initialize a relatively large number of member
variables.

This diff takes advantage of the observation that RecordFunction does
nothing most of the time by moving its state onto the heap and only
allocating it if needed. It does add the requirement that profiling is
actually active to use RecordFunction accessors, which I hope won't be
a problem.
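
A minimal sketch of the lazy-state pattern described here (names are illustrative, not the actual RecordFunction internals):

```cpp
#include <cstdint>
#include <memory>

// All the rarely-needed members move into a separate struct...
struct ProfilingState {
  int64_t handle = 0;
  bool needs_inputs = false;
  // ...many more fields in the real class...
};

class RecordFunctionSketch {
 public:
  explicit RecordFunctionSketch(bool profiling_active) {
    // ...which is only allocated when profiling is actually on, so the
    // common (inactive) case pays for a single null pointer.
    if (profiling_active) {
      state_ = std::make_unique<ProfilingState>();
    }
  }
  // Accessors now require an active state, per the description above.
  bool active() const { return state_ != nullptr; }

 private:
  std::unique_ptr<ProfilingState> state_;
};

int main() {
  RecordFunctionSketch rf(/*profiling_active=*/false);
  return rf.active() ? 1 : 0;
}
```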
ghstack-source-id: 117498489

Test Plan: Run framework overhead benchmarks. Savings ranging from 3% (InPlace_ndim_1) to 7.5% (empty_ndim_3) wall time.

Reviewed By: ilia-cher

Differential Revision: D24812213

fbshipit-source-id: 823a1e2ca573d9a8d7c5b7bb3972987faaacd11a
2020-12-01 13:07:17 -08:00
d1df4038ff [PyTorch] Make RecordFunctionCallback::should_run_ a function pointer (#48274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48274

The std::function-ness of it was used only for tests. (std::function is huge at 32 bytes, and not particularly efficient.)
ghstack-source-id: 117498491

Test Plan: CI

Reviewed By: dzhulgakov

Differential Revision: D25102077

fbshipit-source-id: fd941ddf32235a9659a1a17609c27cc5cb446a54
2020-12-01 13:02:25 -08:00
9342b97363 change global_fp16_constants for test_fc_nnpi_fp16 (#48663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48663

Enable the flag inside the test.

Test Plan:
GLOW_NNPI=1 USE_INF_API=1 buck-out/opt/gen/caffe2/caffe2/contrib/fakelowp/test/test_fc_nnpi_fp16nnpi#binary.par
buck test -c glow.nnpi_use_inf_api=true mode/opt //caffe2/caffe2/contrib/fakelowp/test:test_fc_nnpi_fp16nnpi

Reviewed By: hl475

Differential Revision: D25249575

fbshipit-source-id: bb0a64859fa8e70eeea458376998142f37361525
2020-12-01 12:54:43 -08:00
c5f1117be2 [ROCm] remove builds for versions less than 3.8 (#48118)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48118

Reviewed By: zhangguanheng66

Differential Revision: D25246563

Pulled By: malfet

fbshipit-source-id: cd6142286813411d542926284fbf65206bc371ae
2020-12-01 12:08:23 -08:00
aaf6582d02 fix issue by which pytorch_jni is not bundled in libtorch (#46466)
Summary:
Fixes issue with pytorch_jni.dll not being installed correctly in libtorch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46466

Reviewed By: zhangguanheng66

Differential Revision: D25247564

Pulled By: ezyang

fbshipit-source-id: a509476ec4a0863fd67da3258e9300a9527d4f3b
2020-12-01 11:38:56 -08:00
7c73fda501 Remove balance and devices parameter from Pipe. (#48432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48432

As per our design in https://github.com/pytorch/pytorch/issues/44827,
this changes the API such that the user places modules on the appropriate devices,
instead of having `balance` and `devices` parameters decide this.

This design allows us to use RemoteModule in the future.
ghstack-source-id: 117491992

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D25172970

fbshipit-source-id: 61ea37720b92021596f69788e45265ac9cd41746
2020-12-01 11:21:59 -08:00
74d6a6106c Fuzzing benchmark for FFT operators (#47872)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47872

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25237499

Pulled By: robieta

fbshipit-source-id: 44eb68c5989508f072b75526ae5dcef30898e4bd
2020-12-01 10:58:53 -08:00
df6fc3d83a Fix complex tensors and missing data in benchmark utility (#47871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47871

- FuzzedTensor now supports complex data types
- Compare no longer calls min on empty ranges when a table has empty cells


Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25237500

Pulled By: robieta

fbshipit-source-id: 76248647313d4d81590a68297a5f6768fa7d3d82
2020-12-01 10:54:19 -08:00
f80aaadbae fx quantization: add option to leave graph inputs and/or outputs quantized (#48624)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48624

Before this PR, there was an assumption that all graph inputs
and outputs are in floating point, with some exceptions for
`standalone_module`.

This PR adds an option to specify either inputs or outputs
as being quantized.

This is useful for incremental migrations of models using Eager mode.

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25231833

fbshipit-source-id: 9f9da17be72b614c4c334f5c588458b3e726ed17
2020-12-01 10:39:51 -08:00
98fddc1f06 Revert D25172740: [pytorch][PR] [CUDA graphs] Make CUDAGeneratorImpl capturable
Test Plan: revert-hammer

Differential Revision:
D25172740 (2200e72293)

Original commit changeset: c4568605755c

fbshipit-source-id: 3ebc845856096f5707897bfabaf718c8e13e86f0
2020-12-01 09:10:14 -08:00
0066b941f1 Add CUDA kernel checks to fbcode/caffe2/caffe2/sgd (#48347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48347

Add a safety check `TORCH_CUDA_KERNEL_LAUNCH_CHECK()` after each kernel launch. This only includes changes to `//caffe2/caffe2/sgd`; specifically, these files did not have any kernel launch checks before.
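
A minimal, self-contained sketch of the pattern (assuming the macro boils down to checking `cudaGetLastError()` after the launch; `TORCH_CUDA_KERNEL_LAUNCH_CHECK` itself lives in the PyTorch tree):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Stand-in for TORCH_CUDA_KERNEL_LAUNCH_CHECK(): surface async launch errors
// (bad configuration, missing kernel image, etc.) right at the launch site.
#define KERNEL_LAUNCH_CHECK()                                       \
  do {                                                              \
    cudaError_t err = cudaGetLastError();                           \
    if (err != cudaSuccess) {                                       \
      std::printf("launch failed: %s\n", cudaGetErrorString(err));  \
    }                                                               \
  } while (0)

__global__ void scale(float* x, float a, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= a;
}

int main() {
  float* d = nullptr;
  cudaMalloc(&d, 256 * sizeof(float));
  scale<<<1, 256>>>(d, 2.0f, 256);
  KERNEL_LAUNCH_CHECK();  // the line this diff adds after every launch
  cudaFree(d);
  return 0;
}
```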

Files changed were determined by running `python3 caffe2/torch/testing/check_kernel_launches.py`.

Other directories will be done in separate diffs.

Test Plan:
Check build status
```
buck build //caffe2/caffe2:caffe2_cu
```
- https://www.internalfb.com/intern/buck/build/5cb3185e-b481-4f83-9c3a-260827dde5ef

Running
```
python3 caffe2/torch/testing/check_kernel_launches.py
```
 results in no files within this subdirectory (as of now)
- P150434759

Reviewed By: jianyuh

Differential Revision: D24868557

fbshipit-source-id: 1ad02260bcc9d13710bfd577c8d93be52595845c
2020-12-01 08:55:36 -08:00
736e8965e5 Change the type hints of "pooling.py". (#48412)
Summary:
Change the type hints of "AvgPool2d" and "AvgPool3d".

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48412

Reviewed By: ejguan

Differential Revision: D25221087

Pulled By: ezyang

fbshipit-source-id: 5fba2a8051a7b3d5508e97763bacfd2140a777bf
2020-12-01 07:27:37 -08:00
c81f2d9a2f Revert D25222215: [quant][fix] Add bias once in conv_fused
Test Plan: revert-hammer

Differential Revision:
D25222215 (d2e429864c)

Original commit changeset: 90c0ab79835b

fbshipit-source-id: 5c8eee107309cfa99cefdf439a62de0b388f9cfb
2020-12-01 07:17:45 -08:00
dc7ab46dcc Fix incorrect warnings in ParameterList/Dict (#48315)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46983.

The solution is based on two components:

1. The introduction of the `_initialized` attribute. This will be used during ParameterList/Dict creation methods `__init__` (introduced in https://github.com/pytorch/pytorch/issues/47772) and  `__setstate__` to not trigger warnings when setting general `Module` attributes.
2. The introduction of the `not hasattr(self, key)` check to avoid triggering warnings when changing general `Module` attributes such as `.training` during the `train()` and `eval()` methods.

Tests related to the fix are added.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48315

Reviewed By: mrshenli

Differential Revision: D25130217

Pulled By: albanD

fbshipit-source-id: 79e2abf1eab616f5de74f75f370c2fe149bed4cb
2020-12-01 07:08:33 -08:00
492683bd42 Add LazyConvXd and LazyConvTransposeXd (#47350)
Summary:
This PR implements LazyConvXd and LazyConvTransposeXd based on https://github.com/pytorch/pytorch/issues/44538. (cc. emcastillo and albanD)
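
A hedged usage sketch (assuming the modules surface as `nn.LazyConv2d` etc., as in later releases):

```python
import torch
import torch.nn as nn

conv = nn.LazyConv2d(out_channels=8, kernel_size=3)  # in_channels not given
x = torch.randn(1, 5, 16, 16)
y = conv(x)                  # first forward materializes the parameters
print(conv.in_channels)      # 5, inferred from the input
print(y.shape)               # torch.Size([1, 8, 14, 14])
```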

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47350

Reviewed By: ejguan

Differential Revision: D25220645

Pulled By: albanD

fbshipit-source-id: b5e2e866d53761a3415fd762d05a81920f8b16c3
2020-12-01 07:00:28 -08:00
ccd20e995f [vulkan] convolution old prepacking via cpu-shader (#48330)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48330

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D25131500

Pulled By: IvanKobzarev

fbshipit-source-id: b11edb94a78f5d6283c7be1887d72a4ca624a9ab
2020-11-30 22:52:43 -08:00
55fc0e9e53 [ONNX] Cast Gather index to Long if needed (#47653)
Summary:
The ONNX Gather op requires its index input to be int32 or int64; however, we don't insert this Cast in our converter.
Therefore, the following UT fails (for opset 11+):
`seq_length.type().scalarType()` is None, so `_arange_cast_helper()` cannot treat everything as integral and instead casts all inputs to float. That float value is then used as a Gather index, so ORT throws an error about a float-typed index.
The fix is to cast the Gather index type to Long if it is not already int/long.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47653

Reviewed By: ejguan

Differential Revision: D25097324

Pulled By: bzinodev

fbshipit-source-id: 42da1412d1b972d4d82c17fb525879c2575820c9
2020-11-30 21:36:17 -08:00
02e58aabe1 [ONNX] Support nonzero(*, as_tuple=True) export (#47421)
Summary:
Support exporting with `as_tuple = true`

Example:
`torch.nonzero(x, as_tuple=True)`

This is the same as

`torch.unbind(torch.nonzero(x), 1)`
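
A quick sanity check of that equivalence:

```python
import torch

x = torch.tensor([[0., 1.], [2., 0.]])
a = torch.nonzero(x, as_tuple=True)
b = torch.unbind(torch.nonzero(x), 1)
assert all(torch.equal(i, j) for i, j in zip(a, b))
print(a)  # (tensor([0, 1]), tensor([1, 0]))
```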

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47421

Reviewed By: malfet

Differential Revision: D24870760

Pulled By: bzinodev

fbshipit-source-id: 06ca1e7ecf95fbf7c28eebce800df958c83264c8
2020-11-30 21:27:43 -08:00
acd4fca376 [caffe2][torch] Clean up unused variable 'device' (#48600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48600

Fix this warning that pops up with clang and `-Wunused-variable`:
```
caffe2\torch\csrc\jit\frontend\schema_type_parser.cpp(153,30): warning: unused variable 'device' [-Wunused-variable]
```

Test Plan: Locally built & continuous integration

Reviewed By: eellison

Differential Revision: D25194298

fbshipit-source-id: 3af2895fcc96807a9df0ced60ec0af6b14dc0817
2020-11-30 20:56:35 -08:00
9500e8a081 Testing: Improve interaction between dtypes and ops decorators (#48426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48426

Tests are run on the intersection of the dtypes requested and the types that are
supported by the operator (or are _not_ if `unsupported_dtypes_only` is used).

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25205835

Pulled By: mruberry

fbshipit-source-id: 2c6318a1a3dc9836af7361f32caf9df28d8a792b
2020-11-30 20:46:22 -08:00
d2e429864c [quant][fix] Add bias once in conv_fused (#48593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48593

Previously, `_conv_forward` would add `self.bias` to the result, so the bias was added twice in the QAT ConvBn module. This PR adds a bias argument to `_conv_forward`, which the ConvBn module now calls with a zero bias.

fixes: https://github.com/pytorch/pytorch/issues/48514

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D25222215

fbshipit-source-id: 90c0ab79835b6d09622dcfec9de4139881a60746
2020-11-30 19:26:17 -08:00
7a59a1b574 add aot_based_partition (#48336)
Summary:
This PR adds support for AOT-based partitioning. Given each node and its corresponding partition ID, it generates the partitions, submodules, and DAG.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48336

Reviewed By: gcatron

Differential Revision: D25226899

Pulled By: scottxu0730

fbshipit-source-id: 8afab234afae67c6fd48e958a42b614f730a61d9
2020-11-30 19:11:02 -08:00
ddb6594971 [Gradient Compression] Add a random generator to PowerSGD state for initializing low-rank matrix Q (#48507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48507

Previously the random seed was the length of the input tensor, which is not guaranteed to differ across batches. Now a random generator is initialized in the PowerSGD state, and this generator is used to create a random seed that randomizes the low-rank tensor Q at every step.

Therefore, the initial tensor Q is the same across all replicas at a given step, but different at different steps.

`torch.manual_seed` is used in the same way as in https://github.com/epfml/powersgd/blob/master/gradient_reducers.py#L675
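
A sketch of the seeding scheme described above (illustrative, not the hook's exact code):

```python
import torch

rng = torch.Generator()
rng.manual_seed(0)  # every replica starts from the same generator state

def make_q(shape):
    # Same derived seed on all replicas at a given step, fresh every step.
    seed = int(torch.randint(2**31, (1,), generator=rng))
    torch.manual_seed(seed)
    return torch.randn(*shape)

q_step1 = make_q((16, 4))   # identical across replicas
q_step2 = make_q((16, 4))   # identical across replicas, != q_step1
```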

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 117483639

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

buck test mode/dev-nosan caffe2/test/distributed:c10d --
test_powerSGD_ddp_comm_hook_nccl_grad_is_view

Also checked the initial Qs and input random seeds of torch.manual_seed() of different ranks for a few steps in real runs.

Example logs:
Exactly the same random seed across different ranks at the same step on two nodes, and the random seed varies at each step.

{F346971916}

Reviewed By: rohan-varma

Differential Revision: D25191589

fbshipit-source-id: f7f17df3ad2075ecae1a2a56ca082160f7c5fcfc
2020-11-30 18:46:45 -08:00
61936cb11e [PyTorch][JIT] Parameter passing & std::map API usage pass on ProfilingRecord::instrumentGraph (#47960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47960

Audited this code path after seeing it in profiling. Found some issues:
- Multiple lookups in std::map can be avoided by using `std::map::insert`. It's really a find-or-insert, which is what this code wanted anyway (see the sketch after this list).
- Some unnecessary copying of arguments that could be moved from
- We can move from shared_ptrs that are going out of scope anyway
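
A sketch of the find-or-insert point from the first bullet:

```cpp
#include <map>
#include <string>

int main() {
  std::map<std::string, int> counts;

  // Two lookups: find() walks the tree, then operator[] walks it again.
  if (counts.find("key") == counts.end()) {
    counts["key"] = 0;
  }

  // One lookup: insert() is a find-or-insert; it returns the existing
  // element when the key is already present.
  auto result = counts.insert({"key", 0});
  result.first->second += 1;
  return 0;
}
```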
ghstack-source-id: 116914902

Test Plan: Please advise, as I'm new to this code. Does it have test coverage? Is there a way I can easily measure the performance impact of this change?

Reviewed By: Krovatkin

Differential Revision: D24971041

fbshipit-source-id: 881a45f8958854be0e95fba659e0b64bd341501e
2020-11-30 18:39:28 -08:00
dc7d8a889e caffe2: refactor context to allow being typed (#48340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48340

This changes the context-managed classes from being defined via a decorator to using inheritance. Inheritance allows Python static type checking to work correctly.

```
context.define_context()
class Bar(object): ...

context.define_context(allow_default=True)
class Foo(object): ...
```

becomes
```
class Bar(context.Managed): ...

class Foo(context.DefaultManaged): ...
```

Behavior differences:
* arg_name has been removed since it's not used anywhere
* classes need to call `super()` in `__enter__/__exit__` methods if they override (none do)

This also defines a context.pyi file to add types for python3. python2 support should not be affected

Test Plan:
ci

  buck test //caffe2/caffe2/python:context_test //caffe2/caffe2/python:checkpoint_test

Reviewed By: dongyuzheng

Differential Revision: D25133469

fbshipit-source-id: 16368bf723eeb6ce3308d6827f5ac5e955b4e29a
2020-11-30 18:31:14 -08:00
adb4fd3f2f [te] Fix comparison ops on booleans (#48384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48384

As title

Test Plan: buck test //caffe2/test:jit -- test_binary_ops

Reviewed By: asuhan

Differential Revision: D25115773

fbshipit-source-id: c5f8ee21692bcf0d78f099789c0fc7c457a1e4a2
2020-11-30 18:21:35 -08:00
d9f5ac0805 [TensorExpr] Add a envvar to disable LLVM backend and use IR Eval instead. (#48355)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48355

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D25139668

Pulled By: ZolotukhinM

fbshipit-source-id: 34dfcceadb24446d103710f00526693a53f3750f
2020-11-30 18:16:28 -08:00
a6f0c3c4f0 [TensorExpr] IREval: fix div for Half dtype. (#48354)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48354

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D25139669

Pulled By: ZolotukhinM

fbshipit-source-id: a7eccad883d8b175d7d73db48bd366382eabea53
2020-11-30 18:14:08 -08:00
671a959233 Disable fast sigmoid since it causes divergence (#48623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48623

The error introduced by fast sigmoid/tanh seems to accumulate in a way that's detectable in a macro-benchmark (unfortunately I don't have the model demonstrating it in a format that can be publicly committed).
ghstack-source-id: 117496822

Test Plan: Tbh not sure how to test this since I'm not super well-versed in numerics.  I can verify it fixes a model divergence locally.

Reviewed By: navahgar

Differential Revision: D25230376

fbshipit-source-id: c404a0439f190359b72ad65b3f42369c53cae340
2020-11-30 17:46:07 -08:00
29f0e1e2ce Fused8BitRowwiseQuantizedToFloat operator support (#48407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48407

T79817692: Fused8BitRowwiseQuantizedToFloat operator support for c2_pt_converter.

Also refactored some repeated code from the existing test functions. (Initial commit only has refactoring.)

Test Plan: buck test //caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test

Reviewed By: bugra

Differential Revision: D25069936

fbshipit-source-id: 72f6a845a1b4639b9542c6b230c8cd74b06bc5a0
2020-11-30 17:11:39 -08:00
c3bb3827f9 remove unused params in scalar_tensor_static (#48550)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48550

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D25229765

Pulled By: bhosmer

fbshipit-source-id: 220b0b4a85a3d83d947960851a7369f654b8b455
2020-11-30 17:01:22 -08:00
ea0ffbb6e6 [vulkan] Fix Addmm prepacking to persist after GPU flush (#48313)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48313

The previous prepacking method would cause stored data to be lost whenever data was flushed during a `.cpu()` call. I updated the weight/bias prepacking to use the same method as `conv2d` to avoid this.

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D25125405

Pulled By: SS-JIA

fbshipit-source-id: 2533994d522d90824fc25ee78c54016cfd0f3253
2020-11-30 16:09:46 -08:00
5b6b1495b9 Update Windows CI to CUDA 11.1, cuDNN 8.0.5 (#48469)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48469

Reviewed By: walterddr

Differential Revision: D25187095

Pulled By: malfet

fbshipit-source-id: 47e29a172ebe71e60447a5483e63ac59818a0474
2020-11-30 15:48:30 -08:00
7f869dca70 [ROCm] update debug flags (#46717)
Summary:
Improves support for rocgdb when setting DEBUG=1 and building for ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46717

Reviewed By: mrshenli

Differential Revision: D25171544

Pulled By: malfet

fbshipit-source-id: b4699ba2277dcb89f07efb86f7153fae82a80dc3
2020-11-30 15:27:30 -08:00
d6ddd78eb0 Fix multiple spelling and grammar mistakes (#48592)
Summary:
I found a number of spelling & grammatical mistakes in the repository. Previously I had these fixes submitted individually, but I saw that a single word change was apparently too small for a PR to be merged. Hopefully this new PR has a sufficient number of changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48592

Reviewed By: ejguan

Differential Revision: D25224216

Pulled By: mrshenli

fbshipit-source-id: 2af3db2aee486563efd0dffc4e8f777306a73e44
2020-11-30 15:18:44 -08:00
2200e72293 [CUDA graphs] Make CUDAGeneratorImpl capturable (#47989)
Summary:
Part 1 of https://github.com/pytorch/pytorch/pull/46148 refactor:  CUDAGeneratorImpl and eager mode kernel diffs.  See [Note [CUDA Graph-safe RNG states]](https://github.com/pytorch/pytorch/compare/master...mcarilli:cudagraphs_generator_diffs?expand=1#diff-0b7fb41bc872bb4d1b6480d4fbbb70e6871c16b8c439a97d9d8ecc6c8b893bc2R13) for the strategy, based on https://github.com/pytorch/pytorch/pull/46148#issuecomment-724414794.

By itself, this PR is a "no-op":  it's unusable without cooperation from CUDA graph capture and replay bindings.  Part 2 will add those bindings and tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47989

Reviewed By: mrshenli

Differential Revision: D25172740

Pulled By: ngimel

fbshipit-source-id: c4568605755c7b2d28d09d0fbb96837b494e6443
2020-11-30 15:11:23 -08:00
4976208e73 [caffe2] Register BlackBoxPredictor AllocationArenaPool as CPUCachingAllocator (#48161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48161

- Register BlackBoxPredictor AllocationArenaPool as CPUCachingAllocator
- Use the AllocationArenaPool in both BlackBoxPredictor and StaticRuntime

Test Plan:
```
buck run //caffe2/caffe2/fb/predictor:black_box_predictor_test
buck run //caffe2/caffe2/fb/predictor:pytorch_predictor_test
```
AF canary:
https://www.internalfb.com/intern/ads/canary/431021257540238874/

Reviewed By: dzhulgakov

Differential Revision: D24977611

fbshipit-source-id: 33ba596b43c1e558c3ab237a0feeae93565b2d35
2020-11-30 15:03:34 -08:00
d386d3323f [dper] suppress excessive msg (#48404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48404

On Bento this prints a lot of messages like the following (see N408483 if you're an internal user):
```
W1123 120952.322 schema.py:811] Scalar should be considered immutable. Only call Scalar.set() on newly created Scalar with unsafe=True. This will become an error soon.
```
It also ignores the log level I set at the global level. Removing this line unless it is truly important.

Test Plan: build a local dper package and verify

Differential Revision: D25163808

fbshipit-source-id: 338d01c82b4e67269328bbeafc088987c4cbac75
2020-11-30 14:55:52 -08:00
d74f2d28a1 Fix bazel build after sleef update (#48614)
Summary:
In https://github.com/shibatch/sleef/pull/361, `src/libm/sleeflibm_header.h.org` was renamed to `src/libm/sleeflibm_header.h.org.in`.
Updating the bazel build rule for sleef accordingly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48614

Reviewed By: ezyang

Differential Revision: D25228160

Pulled By: malfet

fbshipit-source-id: dc0e56c2eb8990a19b5de14318e41ea9661c63f8
2020-11-30 14:47:11 -08:00
66440d1b29 Tweak Vulkan memory use. (#47728)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47728

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D25032740

Pulled By: AshkanAliabadi

fbshipit-source-id: 7eb72538dc1aa3feb4e2f8c4ff9c675eb8e97057
2020-11-30 14:28:09 -08:00
8f8738ce5c [vmap] implement batching rules for clamp, clamp_min and clamp_max (#48449)
Summary:
Fix https://github.com/pytorch/pytorch/issues/47754

- This PR implements batching rules for the `clamp`, `clamp_min`, and `clamp_max` operators (see the sketch below).
- Test cases are added to `test/test_vmap.py`.
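
A minimal sketch of what these batching rules enable, assuming the experimental `torch.vmap` API of that era:

```
import torch

# vmap-ing clamp over the leading batch dimension should match calling
# clamp directly on the batched tensor.
x = torch.randn(8, 5)
out = torch.vmap(lambda t: t.clamp(min=-1.0, max=1.0))(x)
assert torch.allclose(out, x.clamp(min=-1.0, max=1.0))
```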

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48449

Reviewed By: ejguan

Differential Revision: D25219360

Pulled By: zou3519

fbshipit-source-id: 0b7e1b00f5553b4578f15a6cc440640e506b4918
2020-11-30 14:22:43 -08:00
b5149513ec migrate export_caffe2_op_to_c10.h macros to the new dispatcher registration API, update code_analyzer regex (#48308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48308

The original regex that I added didn't correctly match namespaces that started with an underscore (e.g. `_test`), which caused a master-only test to fail.

The only change from the previous commit is that I updated the regex like so:

before: `^.*TORCH_LIBRARY_IMPL_init_([^_]+)_([^_]+)_[0-9]+(\(.*)?$`
after: `^.*TORCH_LIBRARY_IMPL_init_([_]*[^_]+)_([^_]+)_[0-9]+(\(.*)?$`

I added in a `[_]*` to the beginning of the namespace capture. I did the same for the `_FRAGMENT` regex.

Verified that running `ANALYZE_TEST=1 tools/code_analyzer/build.sh` (as the master-only test does) produces no diff in the output.
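
A quick standalone sanity check of the fix, using the two patterns quoted above:

```
import re

OLD = re.compile(r'^.*TORCH_LIBRARY_IMPL_init_([^_]+)_([^_]+)_[0-9]+(\(.*)?$')
NEW = re.compile(r'^.*TORCH_LIBRARY_IMPL_init_([_]*[^_]+)_([^_]+)_[0-9]+(\(.*)?$')

line = "TORCH_LIBRARY_IMPL_init__test_CPU_0"
print(OLD.match(line))           # None: `[^_]+` rejects the leading underscore
print(NEW.match(line).groups())  # ('_test', 'CPU', None)
```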

Fixing the regex pattern to allow for underscores at the beginning of the namespace.

This reverts commit 3c936ecd3c68f395dad01f42935f20ed8068da02.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25123295

Pulled By: bdhirsh

fbshipit-source-id: 54bd1e3f0c8e28145e736142ad62a18806bb9672
2020-11-30 13:05:33 -08:00
032e4f81a8 Fix test comparison ops check for scalar overflow (#48597)
Summary:
The test should verify that all listed conditions throw, not just the first one.
Refactor duplicated constants.
Use `self.assertTrue()` instead of suppressing the flake8 `B015: Pointless Comparison` warning.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48597

Reviewed By: mruberry

Differential Revision: D25222734

Pulled By: malfet

fbshipit-source-id: 7854f755a84f23a1a52dc74402582e34d69ff984
2020-11-30 12:39:28 -08:00
b84d9b48d8 Fix the typo error on line #953 of the docs of 'torch/nn/modules/activation.py' (#48577)
Summary:
The title says it all.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48577

Reviewed By: ejguan

Differential Revision: D25224315

Pulled By: mrshenli

fbshipit-source-id: 8e34e9ec29b28768834972bfcdb443efd184f9ca
2020-11-30 12:02:40 -08:00
eba96b91cc Back out "[pytorch][PR] [JIT] Add __prepare_scriptable__ duck typing to allow replacing nn.modules with scriptable preparations"
Summary: Original commit changeset: 4ddff2d35312

Test Plan: sandcastle

Reviewed By: zhangguanheng66

Differential Revision: D25061862

fbshipit-source-id: 1d0cc5a34b8131ac88304f24394b677131d28e39
2020-11-30 11:49:36 -08:00
fe80638212 added docs to nn.rst (#48374)
Summary:
Fixes  https://github.com/pytorch/pytorch/issues/48198
Added the following functions to a subsection "Global Hooks For Module" in the containers section of nn.rst:
- register_module_forward_pre_hook
- register_module_forward_hook
- register_module_backward_hook

screenshots:
![image](https://user-images.githubusercontent.com/30429206/99903019-9ee7f000-2ce7-11eb-95dd-1092d5e57ce7.png)
![image](https://user-images.githubusercontent.com/30429206/99903027-ac04df00-2ce7-11eb-9983-42ce67de75ba.png)
![image](https://user-images.githubusercontent.com/30429206/99903039-c3dc6300-2ce7-11eb-81c4-a0240067fe23.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48374

Reviewed By: ejguan

Differential Revision: D25219507

Pulled By: albanD

fbshipit-source-id: 0dd9d65f562c001c993ebcb51465e8ddcf631231
2020-11-30 11:34:49 -08:00
4e15877d5c Add documentation for torch.overrides submodule. (#48170)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48087

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48170

Reviewed By: ejguan

Differential Revision: D25220942

Pulled By: ezyang

fbshipit-source-id: a2b7f7b565f5e77173d8ce2fe9676a8131f929b6
2020-11-30 11:25:31 -08:00
42e7cdc50a Improve libuv detection on Windows (#48571)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48304

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48571

Reviewed By: ejguan

Differential Revision: D25220903

Pulled By: mrshenli

fbshipit-source-id: a485568621c4e289c5439474c2651186bc63c2f0
2020-11-30 11:16:13 -08:00
0213a3858a .circleci: Add python 3.9 builds for windows (#48138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48138

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D25039140

Pulled By: seemethere

fbshipit-source-id: d39885562bdd8078a9735f1bc20f9d81cb024edc
2020-11-30 10:44:25 -08:00
af520d9d04 [cmake] clean up blas discovery (#47940)
Summary:
remove useless variable changes in blas discovery

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47940

Reviewed By: malfet

Differential Revision: D25122228

Pulled By: walterddr

fbshipit-source-id: 12bc3ce9e4f89a72b6a92c10d14024e5941f4b96
2020-11-30 10:29:50 -08:00
0b66cdadb6 Pin the rest of flake8 dependencies. (#48590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48590

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D25220976

Pulled By: ezyang

fbshipit-source-id: 15817f8c5db7fea6efe9b70a1d1e46b8ca36d12b
2020-11-30 10:00:17 -08:00
e41d8b3d3d [JIT] adding missing test cases for test_isinstance.py (#47396)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47396

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D24739765

Pulled By: Lilyjjo

fbshipit-source-id: 881521175c9a4cdcda4555431fdf6861317f2f40
2020-11-30 09:10:15 -08:00
3c9e71c9ad fix BUILD_MOBILE_BENCHMARK typo (#48515)
Summary:
BUILD_MOBILE_BENCHMARKS in CMakeLists.txt should be BUILD_MOBILE_BENCHMARK.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48515

Reviewed By: albanD

Differential Revision: D25198724

Pulled By: mrshenli

fbshipit-source-id: 12765d10c272da04cb104202fcbabc6a0b007c5e
2020-11-30 08:38:43 -08:00
5bb2a87a94 Update sleef to fix build issues (#48529)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48532

After PR https://github.com/pytorch/pytorch/issues/48275 updated the sleef submodule, pytorch incremental builds started failing due to shibatch/sleef#349. This updates the submodule to include the CMake fix in shibatch/sleef#361.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48529

Reviewed By: mrshenli

Differential Revision: D25210746

Pulled By: malfet

fbshipit-source-id: b41ac8de94848413397a19259c6affed5b2cb25b
2020-11-30 06:48:32 -08:00
5cb688b714 Merge all vec256 tests into one framework (#47294)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47294

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D24707378

Pulled By: glaringlee

fbshipit-source-id: cc47ddb49bc2a3ecff9359e9623b0d7774743398
2020-11-30 05:13:09 -08:00
bdf360f9f2 [ONNX] Update onnx submodule (#47366)
Summary:
Update onnx submodule to 1.8 release

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47366

Reviewed By: hl475

Differential Revision: D24968733

Pulled By: houseroad

fbshipit-source-id: 2f0a3436ab3c9380ed8ff0887a483743c1209721
2020-11-30 00:05:46 -08:00
755b8158e2 Fix __config__ docs (#48557)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48287

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48557

Reviewed By: ngimel

Differential Revision: D25211872

Pulled By: mruberry

fbshipit-source-id: ac916e16722809e747bd8960675c1477e3a1084d
2020-11-29 23:57:06 -08:00
0e5682d26b Pruning codeowners who don't actual do code review. (#48109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48109

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D25026754

Pulled By: ezyang

fbshipit-source-id: c8f77a05fad867427789f376ef9da3a697e25353
2020-11-29 19:46:32 -08:00
2fe382e931 annotate torch._tensor_str (#48463)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48462

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48463

Reviewed By: mrshenli

Differential Revision: D25187168

Pulled By: malfet

fbshipit-source-id: bb4ad1c6d376ad37995638615080452c71e36959
2020-11-29 10:09:19 -08:00
36c87f1243 Refactors test_torch.py to be fewer than 10k lines (#47356)
Summary:
Creates multiple new test suites to have fewer tests in test_torch.py, consistent with previous test suite creation like test_unary_ufuncs.py and test_linalg.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47356

Reviewed By: ngimel

Differential Revision: D25202268

Pulled By: mruberry

fbshipit-source-id: 75fde3ca76545d1b32b86d432a5cb7a5ba8f5bb6
2020-11-28 20:11:40 -08:00
272f4db043 Implement NumPy-like function torch.float_power() (#44937)
Summary:
- Related to https://github.com/pytorch/pytorch/issues/38349
- Implements the NumPy-like function `torch.float_power()` (see the example below).
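
A quick illustrative example (the output dtype is stated from my understanding of the NumPy-compatible semantics, not taken from the PR):

```
import torch

# float_power always computes in double (or complex double) precision,
# even for integer inputs, matching numpy.float_power.
print(torch.float_power(torch.tensor([2, 3]), 2))
# tensor([4., 9.], dtype=torch.float64)
```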

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44937

Reviewed By: ngimel

Differential Revision: D25192119

Pulled By: mruberry

fbshipit-source-id: 2e446b8e0c2825f045fe057e30c9419335557a05
2020-11-27 18:01:42 -08:00
25ab39acd0 [numpy] torch.asin : promote integer inputs to float (#48461)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48461

Reviewed By: ngimel

Differential Revision: D25192319

Pulled By: mruberry

fbshipit-source-id: fd5dffeca9cd98b86782bfa6a9ab367e425ee934
2020-11-27 15:26:58 -08:00
344918576c Migrate eig from the TH to Aten (CUDA) (#44105)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24553

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44105

Reviewed By: ngimel

Differential Revision: D25192116

Pulled By: mruberry

fbshipit-source-id: 87f1ba4924b9174bfe0d9e2ab14bbe1c6bae879c
2020-11-27 15:15:48 -08:00
f95af7a79a [numpy] torch.erf{c} : promote integer inputs to float (#48472)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48472

Reviewed By: ngimel

Differential Revision: D25192324

Pulled By: mruberry

fbshipit-source-id: 6ef2fec8a27425f9c4c917fc3ae25ac1e1f5f454
2020-11-27 15:08:40 -08:00
7df8445242 torch.fft: Remove complex gradcheck workaround (#48425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48425

gradcheck now natively supports functions with complex inputs and/or outputs.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25176377

Pulled By: mruberry

fbshipit-source-id: d603e2511943f38aeb3b8cfd972af6bf4701ed29
2020-11-26 22:45:59 -08:00
5dfced3b0d work around #47028 until a proper fix is identified (#48405)
Summary:
Otherwise, this test will appear flaky for ROCm even though it is a generic PyTorch issue.

CC albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48405

Reviewed By: mrshenli

Differential Revision: D25183473

Pulled By: ngimel

fbshipit-source-id: 0fa19b5497a713cc6c5d251598e57cc7068604be
2020-11-26 18:33:19 -08:00
84fafbe49c [docs] docstring for no type checked meshgrid (#48471)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48395

I am not sure this is the correct way to fix it, though.

cc mruberry

Locally built preview:
![Screen Shot 2020-11-26 at 14 57 49](https://user-images.githubusercontent.com/32727188/100326034-d14f6100-2ff7-11eb-8abb-53317b9f518e.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48471

Reviewed By: mrshenli

Differential Revision: D25191033

Pulled By: mruberry

fbshipit-source-id: e5d9cb2748f7cb81923a1d4f204ffb330f6da1ee
2020-11-26 17:28:41 -08:00
c5ce995834 reintroduce deadline removal (#48481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48481

We removed deadlines thinking that was the cause of the int8 test timeout, but it turns out that the int8 tests were failing because of a legitimate bug that was masked as a timeout.

Now that the bug has been fixed, the tests are failing because the test takes more than 10s to run; overriding the deadline is an option we have used before:
https://www.internalfb.com/intern/testinfra/diagnostics/2533274833752501.562949971168104.1606367728/

Test Plan: reran the test manually

Reviewed By: venkatacrc

Differential Revision: D25184573

fbshipit-source-id: 0b1b2eaa690472e80b9b0991618da8d792aeb42b
2020-11-26 11:10:29 -08:00
8b248af35d Alias _size_N_t to BroadcastingListN[int] (#48297)
Summary:
Because they are one and the same

Fixes https://github.com/pytorch/pytorch/issues/47528

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48297

Reviewed By: eellison

Differential Revision: D25116203

Pulled By: malfet

fbshipit-source-id: 7edc2c89daa3f3302822b1f9b83b41b04658c6b7
2020-11-26 08:09:43 -08:00
e7ca62be08 Fix PyTorch compilation on Apple M1 (#48275)
Summary:
Update cpuinfo and sleef to contain build fixes for M1

Fixes https://github.com/pytorch/pytorch/issues/48145

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48275

Reviewed By: walterddr

Differential Revision: D25135153

Pulled By: malfet

fbshipit-source-id: 2a82e14407d6f40c7dacd11109a8499d808c8ec1
2020-11-26 07:08:33 -08:00
18ae12a841 Refactor mkl fft planning to not use Tensor objects (#46910)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46910

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25119656

Pulled By: mruberry

fbshipit-source-id: 77943d6cdf629240c814dc8df530dd7ee4163963
2020-11-25 23:04:41 -08:00
6a37582162 Fix misleading doc string in quint8.h (#48418)
Summary:
The doc string suggested that `quint8` is for signed 8-bit values, when it is in fact the unsigned type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48418

Reviewed By: ngimel

Differential Revision: D25181705

Pulled By: mrshenli

fbshipit-source-id: 70e151b6279fef75505f80a7b0cd50032b4f1008
2020-11-25 20:48:39 -08:00
e56e21b775 Grammatically update the readme docs (#48328)
Summary:
Small grammatical update to the readme docs.

![Capture-py1](https://user-images.githubusercontent.com/65657554/99846018-9b475280-2b9b-11eb-84ab-37e129e4f3e6.PNG)

![Capture-py2](https://user-images.githubusercontent.com/65657554/99846023-9da9ac80-2b9b-11eb-9b3b-0998f53ec2ce.PNG)

![Capture-py3](https://user-images.githubusercontent.com/65657554/99846034-a0a49d00-2b9b-11eb-807e-7200c0b6fef4.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48328

Reviewed By: linbinyu

Differential Revision: D25132876

Pulled By: mrshenli

fbshipit-source-id: f1214b3098bec6713ef53f226f8d0d33946a5ec1
2020-11-25 19:56:32 -08:00
f1c985695c Enabled gloo backend in test_distributed unit tests for ROCm (#40395)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40395

Reviewed By: ngimel

Differential Revision: D25181692

Pulled By: mrshenli

fbshipit-source-id: 29f478c974791efc0acea210c8c9e574944746a5
2020-11-25 19:51:40 -08:00
db1b0b06c4 Flake8 fixes (#48453)
Summary:
Quiet errors from flake8. Only a couple of code changes for deprecated Python syntax from before 2.4. The rest is just adding noqa markers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48453

Reviewed By: mruberry

Differential Revision: D25181871

Pulled By: ngimel

fbshipit-source-id: f8d7298aae783b1bce2a46827b088fc390970641
2020-11-25 19:09:50 -08:00
55e225a2dc Int8 FC fix to match NNPI ICE-REF step-C (#48459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48459

Bias should be kept in FP32. There is no need to convert the bias to FP16.

Test Plan: https://internalfb.com/intern/testinfra/testrun/562950128141661

Reviewed By: hyuen

Differential Revision: D25179863

fbshipit-source-id: e25d948c613d2b2d5adf2b674fc2ea4b4c8d3920
2020-11-25 14:58:06 -08:00
3858aaab37 Fix syntax issue in c++ cuda api note (#48434)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48434

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D25173692

Pulled By: glaringlee

fbshipit-source-id: bbd6fa7615200bf1eaea731a4ed251d423412593
2020-11-25 14:31:14 -08:00
4ab2055857 Re-enable only cuda tests wrongly disabled before (#48429)
Summary:
Close https://github.com/pytorch/pytorch/issues/46536

Re-enable only cuda tests wrongly disabled in https://github.com/pytorch/pytorch/pull/45332

See discussions https://github.com/pytorch/pytorch/issues/46536#issuecomment-721386038 and https://github.com/pytorch/pytorch/pull/45332#issuecomment-721350987

~~See also https://github.com/pytorch/pytorch/pull/47237 and https://github.com/pytorch/pytorch/pull/47642~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48429

Reviewed By: ngimel

Differential Revision: D25176368

Pulled By: mruberry

fbshipit-source-id: 3822f5a45e58c0e387624e70ea272d16218901a9
2020-11-25 13:26:35 -08:00
9ecaeb0962 [numpy] Add unary-ufunc tests for erf variants (#47155)
Summary:
Adding Unary Ufunc Test entry for `erf` variants.

We use scipy functions for reference implementation.

We can later update the tests once these functions will update integer input to float.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47155

Reviewed By: ngimel

Differential Revision: D25176654

Pulled By: mruberry

fbshipit-source-id: cb08efed1468b27650cec4f87a9a34e999ebd810
2020-11-25 13:20:14 -08:00
33cc1d6a64 [docs] fix torch.swap{dim/axes} to showup in docs (#48376)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48372

Verified locally that it is generated
![Screenshot from 2020-11-22 20-38-15](https://user-images.githubusercontent.com/19503980/99907517-298a1880-2d03-11eb-9a8f-9809609c2d2d.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48376

Reviewed By: ngimel

Differential Revision: D25176483

Pulled By: mruberry

fbshipit-source-id: 911b57d43319059cc9f809ea0396c3740ff81ff5
2020-11-25 13:15:39 -08:00
bc2c1d7d59 quant: make each line of fx/quantize.py <=80 chars (#48357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48357

Cleans up the long lines in `torch/quantization/fx/quantize.py`
to fit the 80 character limit, so it's easier to read and looks
better on FB's tools.

In the future we can consider adding a linter for this.

Test Plan:
CI

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25140833

fbshipit-source-id: 78605d58eda0184eb82f510baec26685a34870e2
2020-11-25 09:04:23 -08:00
1d984410fb quant fx: fix typo (#48356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48356

As titled

Test Plan:
CI

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25140834

fbshipit-source-id: e22f8d1ae77c7eb2ec8275b5fbca7dc5e503a4ca
2020-11-25 09:04:20 -08:00
8581c02a3f quant: add type annotations on quantization.fx.Quantizer matches (#48350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48350

As titled, continuing to incrementally type quantization.fx.Quantizer.

Test Plan:
```
mypy torch/quantization/
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25138947

fbshipit-source-id: fd19bf360077b447ce2272bfd4f6d6b798ae05ac
2020-11-25 08:59:29 -08:00
f7a8bf2855 Use libkineto in profiler (#46470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46470

Adding the ability to use Kineto (CUPTI) to profile CUDA kernels.
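
A rough usage sketch (this uses the `torch.profiler` front end that later exposed Kineto; it is assumed here for illustration, since this PR itself wires Kineto into the existing autograd profiler):

```
import torch
from torch.profiler import profile, ProfilerActivity

x = torch.randn(1024, 1024, device="cuda")
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    y = x @ x
    torch.cuda.synchronize()
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```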

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install
python test/test_profiler.py

python test/test_autograd.py -k test_profile
python test/test_autograd.py -k test_record

```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                       Memcpy HtoD (Pageable -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       1.000us             2
                                      sgemm_32x32x32_NN         0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       2.000us             1
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
                       Memcpy DtoH (Device -> Pageable)         0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
                                            aten::randn         5.17%      74.000us         6.71%      96.000us      48.000us       0.000us         0.00%       0.000us       0.000us             2
                                            aten::empty         1.33%      19.000us         1.33%      19.000us       4.750us       0.000us         0.00%       0.000us       0.000us             4
                                          aten::normal_         1.05%      15.000us         1.05%      15.000us       7.500us       0.000us         0.00%       0.000us       0.000us             2
                                               aten::to        77.90%       1.114ms        91.61%       1.310ms     436.667us       0.000us         0.00%       3.000us       1.000us             3
                                    aten::empty_strided         2.52%      36.000us         2.52%      36.000us      12.000us       0.000us         0.00%       0.000us       0.000us             3
                                            aten::copy_         2.73%      39.000us        11.19%     160.000us      53.333us       0.000us         0.00%       3.000us       1.000us             3
                                        cudaMemcpyAsync         4.34%      62.000us         4.34%      62.000us      20.667us       0.000us         0.00%       0.000us       0.000us             3
                                  cudaStreamSynchronize         1.61%      23.000us         1.61%      23.000us       7.667us       0.000us         0.00%       0.000us       0.000us             3
                                               aten::mm         0.21%       3.000us         7.20%     103.000us     103.000us       0.000us         0.00%       2.000us       2.000us             1
                                           aten::stride         0.21%       3.000us         0.21%       3.000us       1.000us       0.000us         0.00%       0.000us       0.000us             3
                                       cudaLaunchKernel         2.45%      35.000us         2.45%      35.000us      17.500us       0.000us         0.00%       0.000us       0.000us             2
                                              aten::add         0.49%       7.000us         4.27%      61.000us      61.000us       0.000us         0.00%       1.000us       1.000us             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
```

benchmark: https://gist.github.com/ilia-cher/a5a9eb6b68504542a3cad5150fc39b1a

Reviewed By: Chillee

Differential Revision: D25142223

Pulled By: ilia-cher

fbshipit-source-id: b0dff46c28da5fb0a8e01cf548aa4f2b723fde80
2020-11-25 04:32:16 -08:00
e9efd8df1b [numpy] torch.log1p : promote integer inputs to float (#48002)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48002

Reviewed By: ngimel

Differential Revision: D25148911

Pulled By: mruberry

fbshipit-source-id: 902d0ddf699debd6edd1b3d55f5c73932ca45e83
2020-11-24 22:01:07 -08:00
2e0a8b75d8 An implementation of torch.tile as requested in pytorch/pytorch#38349 (#47974)
Summary:
The approach is to simply reuse `torch.repeat` while adding one more piece of functionality to `tile`: prepending 1's to the reps array when the tensor has more dimensions than the reps given as input. Thus, for a tensor of shape (64, 3, 24, 24), reps of (2, 2) become (1, 1, 2, 2), which is what NumPy does (see the example below).
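
For illustration, the padding behavior described above:

```
import torch

# The (2, 2) reps are left-padded with 1's to (1, 1, 2, 2), as in NumPy.
x = torch.rand(64, 3, 24, 24)
print(torch.tile(x, (2, 2)).shape)  # torch.Size([64, 3, 48, 48])
```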

I've encountered some instability with the test on my end, where I could get a random failure of the test (due to, sometimes, a random value of `self.dim()`, and sometimes, segfaults). I'd appreciate any feedback on the test or an explanation for this instability so I can address it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47974

Reviewed By: ngimel

Differential Revision: D25148963

Pulled By: mruberry

fbshipit-source-id: bf63b72c6fe3d3998a682822e669666f7cc97c58
2020-11-24 18:07:25 -08:00
2dff0b3e91 Fix typos in comments (#48316)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48316

Reviewed By: walterddr, mrshenli

Differential Revision: D25125123

Pulled By: malfet

fbshipit-source-id: 6f31e5456cc078cc61b288191f1933711acebba0
2020-11-24 10:56:40 -08:00
671ee71ad4 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D25158667

fbshipit-source-id: 3b2a7facbfbfaaabc2cb5ac22906673b17fd0f15
2020-11-23 05:03:53 -08:00
4ed7f36ed1 Added linalg.eigh, linalg.eigvalsh (#45526)
Summary:
This PR adds `torch.linalg.eigh`, and `torch.linalg.eigvalsh` for NumPy compatibility.
The current `torch.symeig` uses (on CPU) a different LAPACK routine than NumPy (`syev` vs `syevd`). Even though it shouldn't matter in practice, `torch.linalg.eigh` uses `syevd` (as NumPy does).
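
A quick usage sketch of the new functions:

```
import torch

a = torch.randn(4, 4)
a = a + a.T  # eigh expects a symmetric (Hermitian) matrix
w, v = torch.linalg.eigh(a)
assert torch.allclose(torch.linalg.eigvalsh(a), w)
assert torch.allclose(v @ torch.diag(w) @ v.T, a, atol=1e-4)
```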

Ref https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45526

Reviewed By: gchanan

Differential Revision: D25022659

Pulled By: mruberry

fbshipit-source-id: 3676b77a121c4b5abdb712ad06702ac4944e900a
2020-11-22 04:57:28 -08:00
b6654906c7 Fix assertEqual's handling of numpy array inputs (#48217)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48217

Reviewed By: mrshenli

Differential Revision: D25119607

Pulled By: mruberry

fbshipit-source-id: efe84380d3797d242c2aa7d43d2209bcba89cee0
2020-11-22 00:13:42 -08:00
f2da18af14 Add USE_KINETO build option (#45888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45888

Adding USE_LIBKINETO build option

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python
setup.py develop install --cmake

Reviewed By: Chillee

Differential Revision: D25142221

Pulled By: ilia-cher

fbshipit-source-id: d1634a8f9599604ff511fac59b9072854289510c
2020-11-21 20:20:32 -08:00
c5e380bfcb quant: add type annotations on quantization.fx.Quantizer class vars (#48343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48343

Annotates the 4 class variables on `Quantizer` with real types,
fixing the small things uncovered by this along the way.

Test Plan:
```
mypy torch/quantization/
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D25136212

fbshipit-source-id: 6ee556c291c395bd8d8765a99f10793ca738086f
2020-11-21 15:31:00 -08:00
6b80b664bb quant: enable mypy on torch/quantization/fx (#48331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48331

Stops mypy from ignoring type errors in FX quantization files. Fixes the easy typing errors inline, and comments out the harder errors to be fixed at a later time.
After this PR, mypy runs without errors on `torch/quantization`.

Test Plan:
```
> mypy torch/quantization/
Success: no issues found in 25 source files
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25133348

fbshipit-source-id: 0568ef9405b292b80b3857eae300450108843e80
2020-11-21 15:29:27 -08:00
cac553cf34 [Gradient Compression] clang-format test_c10d.py (#48349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48349

Apply clang-format only.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 117305263

Test Plan: N/A

Reviewed By: pritamdamania87

Differential Revision: D25138833

fbshipit-source-id: 4ff112b579c0c5b8146495ebd2976d5faead2c1b
2020-11-21 09:28:39 -08:00
6400d27bbb [Gradient Compression] Define a customized state for PowerSGD comm hook (#48348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48348

To support features like error feedback and warm start, the PowerSGD comm hook needs to maintain state beyond the process group. Currently this state only includes a process group and a matrix-approximation-rank config.

This diff is pure refactoring; the plan is to add more state fields later.
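
A minimal sketch of the state object described above (the field names are illustrative assumptions, not the exact internal definition):

```
class PowerSGDState:
    def __init__(self, process_group, matrix_approximation_rank=1):
        self.process_group = process_group
        self.matrix_approximation_rank = matrix_approximation_rank
        # Later diffs plan to add fields for error feedback, warm start, etc.
```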

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 117305280

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

buck test mode/dev-nosan caffe2/test/distributed:c10d --
test_powerSGD_ddp_comm_hook_nccl_grad_is_view

Reviewed By: rohan-varma

Differential Revision: D25137962

fbshipit-source-id: cd72b8b01e20f80a92c7577d22f2c96e9eebdc52
2020-11-21 09:25:35 -08:00
b967119906 [TensorExpr] Fix lowering for aten::div. (#48329)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48329

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D25130750

Pulled By: ZolotukhinM

fbshipit-source-id: 7c6345adcaec5f92cd6ce78b01f6a7d5923c0004
2020-11-21 09:20:28 -08:00
5e1faa1d41 [TensorExpr] Fix aten::atan2 lowering and disable aten::pow lowering on CPU. (#48326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48326

The PR introduces a set of 'cuda-only' ops into the `isSupported` function.
This is done to disable `pow` lowering on CPU, where it is tricky to support
integer versions.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D25129211

Pulled By: ZolotukhinM

fbshipit-source-id: c62ae466e1d9ba9b3020519aadaa2a7fe7942d84
2020-11-21 09:15:42 -08:00
f1d328633c Fix mypy error (#48359)
Summary:
Fixes error introduced in https://github.com/pytorch/pytorch/pull/47657

`node.target` can be either a str or a callable, but this is checked in the pattern matching portion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48359

Reviewed By: ilia-cher

Differential Revision: D25141885

Pulled By: Chillee

fbshipit-source-id: 94365a5a3dd351652ea7337077cd0e71b6ffe203
2020-11-21 03:32:54 -08:00
a00ba63023 Disable old fuser internally (#48322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48322

Disable old fuser internally. I would like to find where we are inadvertently setting the old fuser, but in the meantime I would like to land a diff that I know will 100% cause it not to be run, and verify that it fixes the issue.

Test Plan: sandcastle

Reviewed By: ZolotukhinM

Differential Revision: D25126202

fbshipit-source-id: 5a4d0742f5f829e536f50e7ede1256c94dd05232
2020-11-21 00:42:23 -08:00
636fa8fda8 [quant] Add backend_independent option for quantized linear module (#48192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48192

This is to allow producing a backend-independent quantized module,
since some backends don't have packed weights for linear.

Test Plan:
test_quantized_module.py

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D25061645

fbshipit-source-id: a65535e53f35af4f2926af0ee330fdaae6dae996
2020-11-21 00:32:27 -08:00
fdc62c74a6 Add Kineto submodule (separate PR) (#48332)
Summary:
Separate PR to add the Kineto submodule; it mirrors the one I used
in my stack (45887).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48332

Reviewed By: gdankel

Differential Revision: D25139969

Pulled By: ilia-cher

fbshipit-source-id: b9ca2be5f15647655eeb4b2fbf4c82f84eee3dd8
2020-11-20 23:46:34 -08:00
6615edaf9a [Pytorch Mobile] Disable OutOfPlace calls for mobile (#48255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48255

There's an effort to move aten::native files to the app level. After this effort, the operators and their kernels in aten::native can be selectively built per app.

There are some direct references to at::native symbols in jit/runtime/static/ops.cpp. Those symbols go missing if their implementations are moved up to the app level on Android.

Files in the jit/runtime/static folder belong to the full JIT and should not be built for mobile. However, since Federated Learning still uses the full JIT in fb4a, the current solution is to exclude those files from the mobile torch_core target.

ghstack-source-id: 117123663

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D24822690

fbshipit-source-id: c599b10f35e8d42bd4ca272da1a0cddf88ad7c37
2020-11-20 22:26:00 -08:00
16d089733b Enable creation and transfer of ScriptModule over RPC (#48293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48293

This PR enables a `ScriptModule` to be created on a remote worker, retrieved to the current worker, and have its methods run remotely.

In order to do this, we define custom pickling for `ScriptModule` in our `InternalRPCPickler`. The pickling essentially uses torch.save and torch.load to save/recover the ScriptModule (see the sketch below).

We test that we can create and retrieve a ScriptModule with rpc_sync, and create an RRef to a ScriptModule with rpc.remote. We can also run remote methods on the RRef and transfer it to the current worker.

Although we can run methods remotely on the RRef to the ScriptModule, this does not currently work with the RRef helper; an issue has been filed to track that.
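
A simplified sketch of that save/load round trip (the real logic lives inside `InternalRPCPickler`; the helper names and the use of `torch.jit.save`/`torch.jit.load` here are assumptions for illustration):

```
import io
import torch

def _serialize_script_module(m: torch.jit.ScriptModule) -> bytes:
    buf = io.BytesIO()
    torch.jit.save(m, buf)
    return buf.getvalue()

def _deserialize_script_module(payload: bytes) -> torch.jit.ScriptModule:
    return torch.jit.load(io.BytesIO(payload))
```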
ghstack-source-id: 117275954

Test Plan: CI

Reviewed By: wanchaol

Differential Revision: D25107773

fbshipit-source-id: daadccf7bd25fe576110ee6e0dba6ed2bcd3e7f3
2020-11-20 22:15:54 -08:00
44def9ad71 [quant][fix] Fix quantization for qat.ConvBnReLU1d (#48059)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48059

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D25006388

fbshipit-source-id: ce911ce5e9c51966311cdf9e57dd6eceb357c74a
2020-11-20 21:02:02 -08:00
50e42b9092 Explicitly cast an implicit conversion from some macro defined type to a double (#48290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48290

`scalar_t` here is expanded from nested macros to be an input value and
`upper_bound` is templated upon it. Whatever it gives back is unconditionally
cast to a `double` via the fact that it is always passed to
`binary_kernel_reduce_vec` which has a `double` as the fourth argument.

Change it here to be an explicit `static_cast<double>` to do what the compiler
was implicitly doing.

Test Plan: this is an error with -Werror in llvm11. This allows it to build

Reviewed By: ezyang

Differential Revision: D25111258

fbshipit-source-id: 6837afec52821f1f57b8c8f2df2d0eb3fc9b58bd
2020-11-20 19:21:22 -08:00
0a3db1d460 [FX] Prototype Conv/BN fuser in FX (#47657)
Summary:
Some interesting stuff is going on. All benchmarks are run with both my implementation and the current quantized fuser.

For these benchmarks, things like using MKLDNN/FBGEMM make a big difference.

## Manual compilation (everything turned off)
In the small case, things look good
```
non-fused:  1.174886703491211
fused:  0.7494957447052002
```

However, for `torchvision.resnet18`, we see
```
non-fused:  1.2272708415985107
fused:  3.7183213233947754
```

This is because Conv (no bias) -> Batch Norm is actually faster than Conv (bias) if you don't have any libraries...
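
For reference, a sketch of the standard eval-mode Conv -> BN weight-folding math that such a fuser applies (the function and argument names here are illustrative, not the PR's actual helper):

```
import torch

def fuse_conv_bn_weights(conv_w, conv_b, bn_rm, bn_rv, bn_eps, bn_w, bn_b):
    # Fold y = bn(conv(x)) into a single conv with rescaled weight and bias.
    scale = bn_w * torch.rsqrt(bn_rv + bn_eps)        # per output channel
    fused_w = conv_w * scale.reshape(-1, 1, 1, 1)
    fused_b = (conv_b - bn_rm) * scale + bn_b
    return fused_w, fused_b
```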

## Nightly (CPU)
```
Toy
non-fused:  0.45807552337646484
fused:  0.34779977798461914

resnet18
non-fused:  0.14216232299804688
fused:  0.13438796997070312

resnet50
non-fused:  0.2999534606933594
fused:  0.29364800453186035

densenet161
non-fused:  0.6558926105499268
fused:  0.6190280914306641

inception_v3
non-fused:  1.2804391384124756
fused:  1.181272029876709
```
with MKLDNN.

We see a small performance gain across the board, with more significant performance gains for smaller models.

## Nightly (CUDA)

```
M
non-fused:  1.2220964431762695
fused:  1.0833759307861328

resnet18
non-fused:  0.09721899032592773
fused:  0.09089207649230957

resnet50
non-fused:  0.2053072452545166
fused:  0.19138741493225098

densenet161
non-fused:  0.6830024719238281
fused:  0.660109281539917
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47657

Reviewed By: eellison

Differential Revision: D25127546

Pulled By: Chillee

fbshipit-source-id: ecdf682038def046045fcc09faf9aeb6c459b5e3
2020-11-20 18:51:32 -08:00
6d0947c8cf Revert D25093315: [pytorch][PR] Fix inf norm grad
Test Plan: revert-hammer

Differential Revision:
D25093315 (ca880d77b8)

Original commit changeset: be1a7af32fe8

fbshipit-source-id: b383ec2a2c5884149b4fc7896f9d2856259794cd
2020-11-20 18:27:52 -08:00
f8722825b5 Compare Weights FX Implementation (#48056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48056

PyTorch FX Quantization API:  Compare weights
ghstack-source-id: 117255311

Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_remove_qconfig_observer_fx'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_linear_dynamic_fx'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_linear_static_fx'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_conv_static_fx'

Reviewed By: hx89

Differential Revision: D24940516

fbshipit-source-id: 301c1958c0e64ead9072e0fd002e4b21e8cb5b79
2020-11-20 17:17:19 -08:00
fefd56c4db Remove an accidental copy in a range-based for loop (#48234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48234

This was copying the iteration variable instead of taking a reference.
Fix the trivial error here.

Test Plan:
clang11 catches this and throws a warning that we promote with
-Werror. This change fixes the error.

Reviewed By: smeenai

Differential Revision: D24970929

fbshipit-source-id: 335a1b53276467987bc27fa41326803e01e70c01
2020-11-20 17:10:30 -08:00
286cdf3cda [static runtime] add static registry (#48258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48258

This will enable closed source contributions

Test Plan: buck test mode/no-gpu //caffe2/benchmarks/static_runtime:static_runtime_cpptest

Reviewed By: hlu1

Differential Revision: D25031586

fbshipit-source-id: def859fa2fb4f01910b040242662a51b85804f01
2020-11-20 17:05:24 -08:00
0984d3123a [static runtime] add more _out variants (#48260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48260

supporting a couple more operators

Test Plan:
use Ansha's test framework for e2e test

```
numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --pred_net=/home/bwasti/adindexer/precomputation_merge_net.pb --c2_inputs=/home/bwasti/adindexer/c2_inputs_precomputation_bs1.pb --c2_weights=/home/bwasti/adindexer/c2_weights_precomputation.pb --scripted_model=/home/bwasti/adindexer/traced_precomputation_partial_dper_fixes.pt --pt_inputs=/home/bwasti/adindexer/container_precomputation_bs1.pt --iters=30000 --warmup_iters=10000 --num_threads=1 --pt_enable_static_runtime=true --pt_cleanup_activations=true --pt_enable_out_variant=true --eps 1e-2
```

Reviewed By: hlu1

Differential Revision: D24767322

fbshipit-source-id: dce7f9bc0427632129f263bad509f0f00a21ccf3
2020-11-20 17:05:21 -08:00
87bfb2ff08 Automatically infer the type of the iterator in a range-based for loop (#48232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48232

The declared type was not the type actually produced by the begin()
function, so a new object was being created and a reference taken to that
temporary. Instead, just take a reference to whatever the range-based loop
generates.

Test Plan: This fixes a build error from a new warning in llvm11.

Reviewed By: smeenai

Differential Revision: D24970920

fbshipit-source-id: f125dca900f7550eee505b4f94781b6637533be0
2020-11-20 17:03:51 -08:00
8f1af0947c [iOS] Fix the fbios armv7 pika build
Summary: UBN task - T80034029

Test Plan:
1. pika-armv7: ` buck build //xplat/caffe2:aten_metalApple --flagfile 'fbsource//fbobjc/mode/ios.py#iphoneos-armv7,pika10,apple_toolchain'`
2. pika-arm64: ` buck build //xplat/caffe2:aten_metalApple --flagfile 'fbsource//fbobjc/mode/ios.py#iphoneos,pika10,apple_toolchain'`

Differential Revision: D25134207

fbshipit-source-id: 68b32dab1fc382ec23d7602e34bb64786cb38254
2020-11-20 16:31:37 -08:00
dc843fe197 Fix test_ldexp on Windows (#48335)
Summary:
Force `torch.randint` to generate tensor of int32 rather than tensor of int64
Delete unneeded copies

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48335

Reviewed By: ranman

Differential Revision: D25133312

Pulled By: malfet

fbshipit-source-id: 70bfcb6b7ff3bea611c4277e6634dc7473541288
2020-11-20 15:41:59 -08:00
7be30d1883 Move CUDA kernel check to c10 (#48277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48277

We move `TORCH_CUDA_KERNEL_LAUNCH_CHECK` from `//caffe2/aten/src/ATen/cuda/Exceptions.h` to `//caffe2/c10/cuda/CUDAException.h`.

The primary reason is for allowing us to use this MACRO in other subdirectories of //caffe2, not just in ATen. Refer to D24309971 (353e7f940f) for context.

An example of this use case is D24868557, where we add these checks to `//caffe2/caffe2/sgd`.

Also, this should not affect current files, because `Exceptions.h` includes `CUDAException.h`.

Test Plan:
```
buck build //caffe2/aten:ATen-cu
```
- https://fburl.com/buck/oq3rxbir

Also wait for sandcastle tests.

Reviewed By: ngimel

Differential Revision: D25101720

fbshipit-source-id: e2b05b39ff1413a21e64949e26ca24c8f7d0400f
2020-11-20 14:58:15 -08:00
8177f63c91 Reorganize and refine the Windows.h import in C++ files (#48009)
Summary:
This PR aims to reduce the import overhead and symbol noises from the `windows.h` headers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48009

Reviewed By: gchanan

Differential Revision: D25045840

Pulled By: ezyang

fbshipit-source-id: 01fda70f433ba2dd0cd2d7cd676ab6ffe9d98b90
2020-11-20 14:21:09 -08:00
6d5d336a63 Revert D25108971: [pytorch][PR] enable cuda11.1 and cudnn 8.0.5 in CI
Test Plan: revert-hammer

Differential Revision:
D25108971 (84d4e9c4fa)

Original commit changeset: d836690e1d5d

fbshipit-source-id: 555d1b8ee046d4263920cba8859b6d58e11fccd7
2020-11-20 12:01:24 -08:00
d1b8da75e6 [JIT] Metacompile boolean constants (#46721)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46703

Previously, we would compile one side of an if-statement if the condition was a type-based expression we could statically resolve. I think it's reasonable to extend this metacompilation to booleans that are constant at compile time. There have been some instances where I've recommended unintuitive workarounds due to not having this behavior (see the sketch below).

This is also possibly needed if we add boolean literals to schema declarations, a feature that might be needed to clean up our `boolean_dispatch` mechanism.
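
A minimal sketch of the behavior this enables, assuming a module-level Python bool that scripting folds to a compile-time constant:

```
import torch

USE_BIAS = False  # a Python bool, resolved as a constant when scripting

@torch.jit.script
def linear(x: torch.Tensor, w: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # With this change, only the taken branch is compiled, just like
    # statically resolvable type-based checks.
    if USE_BIAS:
        return x @ w + b
    return x @ w
```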

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46721

Reviewed By: ppwwyyxx

Differential Revision: D25008862

Pulled By: eellison

fbshipit-source-id: 5bc60a18f1021c010cb6abbeb5399c669fe04312
2020-11-20 11:17:15 -08:00
6eaf1e358c caffe2/core.Net: is_external_input rebuild lookup tables when necessary
Summary: is_external_input doesn't check whether the lookup tables are valid. Calling .Proto() should invalidate all lookup tables and have them rebuilt on calls to any method depending on them. This adds that check to is_external_input.

Test Plan: internal unit tests

Reviewed By: dzhulgakov, esqu1

Differential Revision: D25100464

fbshipit-source-id: d792dec7e5aa9ffeafda88350e05cb757f4c4831
2020-11-20 10:53:24 -08:00
ca880d77b8 Fix inf norm grad (#48122)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41779

Also fixes an issue with inf norm returning small non-zero values due to usage of `std::numeric_limits::min`, which actually "returns the minimum positive normalized value" when applied to floating-point types. See https://en.cppreference.com/w/cpp/types/numeric_limits/min.

```
>>> import torch
>>> with torch.enable_grad():
...     a = torch.tensor([
...         [9., 2., 9.],
...         [-2., -3., -4.],
...         [7., 8., -9.],
...     ], requires_grad=True)
...     b = torch.norm(a, p=float('inf'))
...     b.backward()
...     print(a.grad)
...
tensor([[ 0.3333,  0.0000,  0.3333],
        [-0.0000, -0.0000, -0.0000],
        [ 0.0000,  0.0000, -0.3333]])
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48122

Reviewed By: izdeby

Differential Revision: D25093315

Pulled By: soulitzer

fbshipit-source-id: be1a7af32fe8bac0df877971fd75089d33e4bd43
2020-11-20 10:22:11 -08:00
63b04dc11d Update index.rst (#47282)
Summary:
Updating master to match changes we made to 1.7.

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47282

Reviewed By: zhangguanheng66

Differential Revision: D24727322

Pulled By: brianjo

fbshipit-source-id: 64e3f06eb32c965390f282b81084460903d872a2
2020-11-20 08:52:00 -08:00
68a50a7152 Replace GatherRangesToDense operator in Dper from c2 to pt.
Summary: Replace `GatherRangesToDense` operator in Dper from c2 to pt.

Test Plan:
```
buck test //caffe2/torch/fb/sparsenn:test mode/dev-sand -c fbcode.nvcc_arch=v100 -c fbcode.enable_nccl_a2a=1
```

```
Started reporting to test run: https://our.intern.facebook.com/intern/testinfra/testrun/3659174735981484
    ✓ ListingSuccess: caffe2/torch/fb/sparsenn:test - main (22.179)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_all_dropout_empty_input (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (27.738)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_one_hot_lengths (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (27.764)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_gather_ranges (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (27.787)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_lengths_to_offsets (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (27.804)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_chunks (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (27.806)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges_empty_batch (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (27.947)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_multiple_runs (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (28.008)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_one_hot (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (28.036)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_sort_id_score_list_by_score (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (28.080)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_gather_ranges_to_dense_caffe2_without_key (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (28.119)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_range (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (28.147)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_gather_ranges_to_dense_caffe2 (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (28.179)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_lengths_range (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (28.241)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_transform (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (28.252)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_all_dropout (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (28.265)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_bucketize (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (28.274)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_box_cox (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (28.305)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_sigrid_hash_op (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (28.314)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_cumsum (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (28.314)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_ranges (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (28.393)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_rowwise_prune_op_32bit_indices (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (28.411)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_no_dropout (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (28.520)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_tracing (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (28.945)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_scale_gradient_backward (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (33.231)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.864)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_create (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.634)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_prior_correction_calibration_accumulate (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (21.113)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_scale_gradient (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (21.204)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_lengths (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (21.533)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_lengths_empty_batch (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (21.487)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (21.807)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_gather_ranges_to_dense_without_max_mismatched_ratio (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (21.576)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_rowwise_prune_op_64bit_indices (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (22.209)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_embedding_bag_4bit_rowwise_sparse (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (22.072)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_prior_correction_calibration_prediction (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (24.934)
Summary
  Pass: 35
  ListingSuccess: 1
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/3659174735981484
```

```
buck build mode/opt //caffe2/benchmarks/operator_benchmark/fb/pt:gather_ranges_to_dense_benchmark_test
aibench-cli adhoc -c 'buck run //caffe2/benchmarks/operator_benchmark/fb/pt:gather_ranges_to_dense_benchmark_test'
```

```
# Benchmarking PyTorch: gather_ranges_to_dense
# Mode: Eager
# Name: gather_ranges_to_dense_batch_size13_max_lengths14_opcaffe2_gather_ranges_to_dense
# Input: batch_size: 13, max_lengths: 14, op: caffe2_gather_ranges_to_dense
Forward Execution Time (us) : 10.428

# Benchmarking PyTorch: gather_ranges_to_dense
# Mode: Eager
# Name: gather_ranges_to_dense_batch_size13_max_lengths14_optorch_gather_ranges_to_dense
# Input: batch_size: 13, max_lengths: 14, op: torch_gather_ranges_to_dense
Forward Execution Time (us) : 8.986
```

Reviewed By: dzhulgakov

Differential Revision: D24831789

fbshipit-source-id: 110edc86335ae357da435babf87da1a3e537c631
2020-11-20 08:14:32 -08:00
55d5b27343 Refactor request_callback_no_python.cpp processRpc function (#47816)
Summary:
Addresses step 1 of https://github.com/pytorch/pytorch/issues/46564

Took processing logic for each case in request_callback_no_python.cpp and put it in a dedicated function.

cc: izdeby

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47816

Reviewed By: izdeby

Differential Revision: D25090207

Pulled By: H-Huang

fbshipit-source-id: bfa38e38db02e077d859125739aaede90ba492e7
2020-11-20 07:29:51 -08:00
562d4c3bc5 Add basic ldexp operator for numpy compatibility (#45370)
Summary:
Adds ldexp operator for https://github.com/pytorch/pytorch/issues/38349

I'm not entirely sure the changes to `NamedRegistrations.cpp` were needed, but I saw other operators in there, so I added ldexp as well.

Normally the ldexp operator is used along with frexp to construct and deconstruct floating-point values. This is useful for performing operations on either the mantissa or the exponent portion of floating-point values (see the example below).

Sleef, std math.h, and CUDA support both ldexp and frexp, but not for all data types. I wasn't able to figure out how to get the iterators to play nicely with a vectorized kernel, so I have left this with just the normal CPU kernel for now.

This is the first operator I'm adding so please review with an eye for errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45370

Reviewed By: mruberry

Differential Revision: D24333516

Pulled By: ranman

fbshipit-source-id: 2df78088f00aa9789aae1124eda399771e120d3f
2020-11-20 04:09:39 -08:00
ec256ab2f2 implement torch.addr using TensorIterator based kernels (#47664)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47313

This PR implements the `torch.addr` function using `TensorIterator` with `cpu_kernel_vec` and `gpu_kernel`.
It reduces memory usage, improves performance, and fixes a bug when `beta` or `alpha` is a complex number.
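
As a usage sketch (shapes and values are illustrative), `addr` computes `beta * M + alpha * outer(vec1, vec2)`:

```py
import torch

M = torch.zeros(2, 3)
v1 = torch.tensor([1., 2.])
v2 = torch.tensor([1., 2., 3.])
# with beta=1 and M zero, this is just the outer product of v1 and v2
print(torch.addr(M, v1, v2))
# tensor([[1., 2., 3.],
#         [2., 4., 6.]])
```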

Todo
- [x] benchmarking `torch.addr` for the change of this PR, as well as the legacy TH implementation used in PyTorch 1.6.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47664

Reviewed By: zhangguanheng66

Differential Revision: D25059693

Pulled By: ngimel

fbshipit-source-id: 20a90824aa4cb2240e81a9f17a9e2f16ae6e3437
2020-11-20 00:21:49 -08:00
eb49dabe92 [TensorExpr] Add even more operator tests. (#48292)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48292

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25113397

Pulled By: ZolotukhinM

fbshipit-source-id: a8591006e1fb71b87d50c8a150739a9bca835928
2020-11-19 23:35:19 -08:00
efd41db32c [TensorExpr] Add more operator tests. (#48282)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48282

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D25108184

Pulled By: ZolotukhinM

fbshipit-source-id: ba8cdf6253533210a92348f475b8b9400d8ecb1a
2020-11-19 23:29:11 -08:00
56129bdea2 remove having no deadline for the test (#48226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48226

this test is timing out, I am removing the deadline argument to see if
things improve

Test Plan:
ran locally
https://fburl.com/p41ocvrs

Reviewed By: venkatacrc

Differential Revision: D25067867

fbshipit-source-id: 80065553e0bd9883ea80e70a6748de1012e0d4e3
2020-11-19 23:10:37 -08:00
de284b6d35 [pytorch][codegen] add autograd data model (#48249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48249

Introduced autograd related data models at tools.codegen.api.autograd.

Migrated load_derivatives.py to produce the new data models from derivatives.yaml.
It has clean mypy-strict result.

Changed both gen_autograd_functions.py and gen_variable_type.py to consume
the new data model.

Added type annotations to gen_autograd_functions.py - it has clean mypy-strict
result except for the .gen_autograd import (so haven't added it to the strict
config in this PR).

To limit the scope of the PR, gen_variable_type.py is not refactored, and the
main structure of load_derivatives.py / gen_autograd_functions.py is kept. We
only make necessary changes to make it work.

Confirmed byte-for-byte compatible with the old codegen:

```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25086561

Pulled By: ljk53

fbshipit-source-id: 1f43ab0931d9814c24683b9a48ca497c5fc3d729
2020-11-19 21:47:05 -08:00
fa41275899 [Pytorch] Weaker memory ordering for c10::intrusive_ptr (#48221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48221

load-acquire, acquire-release increment and decrement. (We
need acquire-release increment to make unique() and use_count()
reliable.) Note that this doesn't make a difference on x86, but we
should expect it to improve things on ARM and ARM64.
ghstack-source-id: 117065956

Test Plan: Careful review :)

Reviewed By: ezyang

Differential Revision: D24708209

fbshipit-source-id: 5e574115eee5c0a65047b638c5f9b1ec0124d04d
2020-11-19 20:59:30 -08:00
d6b374956f [JIT] Resolve torch.device in recursive compilation of classes (#47734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47734

**Summary**
This commit allows `torch.device` to be resolved properly when used in
class types that are recursively scripted. This is accomplished by augmenting
the resolution callback used during recursively class scripting to include
the type annotations used on class method declarations.

Classes that are not explicitly annotated with `torch.jit.script` are
implicitly scripted during the compilation of a function or class method
that uses them. One key difference between this method of class type
compilation and explicit scripting is that the former uses a resolution callback
that can only resolve variables that class methods close over (see
`_jit_internal.createResolutionCallbackForClassMethods`). This does
not include type annotations and default arguments. This means that
builtin types like `torch.Tensor` and `torch.device` cannot be resolved
using the resolution callback. This issue does not arise when explicitly
scripting classes because the resolution callback for that code path is
constructed from scope of the class definition
(see `_jit_internal.createResolutionCallbackFromFrame`). `torch.Tensor`
and `torch.device` are almost always present in that scope, usually from
`import`ing `torch`.
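
A minimal sketch of the shape of code this fixes (hypothetical class, assuming post-fix behavior): the undecorated class is implicitly scripted when the function is compiled, and its `torch.device` annotation now resolves.

```py
import torch

class Holder:  # not decorated; implicitly scripted when `fn` compiles
    def __init__(self, d: torch.device):
        self.d = d

@torch.jit.script
def fn(t: torch.Tensor) -> torch.device:
    return Holder(t.device).d

print(fn(torch.zeros(1)))  # device(type='cpu')
```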

**Test Plan**
This commit adds a new unit test to `TestClassType`,
`test_recursive_script_builtin_type_resolution`.

**Fixes**
This commit closes #47405.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D24995374

Pulled By: SplitInfinity

fbshipit-source-id: db68212634cacf81cfaeda8095a1fe5105fa73b7
2020-11-19 20:40:09 -08:00
28580d3c0f Add TorchBind-based Python and TorchScript binding for ProcessGroup (#47907)
Summary:
Add TorchBind-binding for ProcessGroup class.

Currently there are a few limitations of TorchBind that prevent us from fully matching the existing PyBind binding of ProcessGroup:

- TorchBind doesn't support method overloading. The current PyBind binding uses overloading extensively to provide a flexible API, but TorchBind (and the TorchScript ClassType behind it) doesn't yet support it. Therefore, we can provide at most one version of each API under a given name.

- TorchBind doesn't support C++ enums yet. This prevents us from making real use of XXXOptions, which are widely used in many APIs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47907

Reviewed By: wanchaol

Differential Revision: D24945814

Pulled By: gmagogsfm

fbshipit-source-id: e103d448849ea838c10414068c3e4795db91ab1c
2020-11-19 20:25:56 -08:00
7828a22094 fix a bug in leakyReLU (#48265)
Summary:
The scale variable needs to be a scalar, otherwise it will report the following error: "RuntimeError: Cannot input a tensor of dimension other than 0 as a scalar argument"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48265

Test Plan: Tested locally and the error disappeared.

Reviewed By: zhizhengwu

Differential Revision: D25105423

Pulled By: jerryzh168

fbshipit-source-id: 2a0df24cf7e40278a950bffe6e0a9552f99da1d1
2020-11-19 20:15:05 -08:00
998c4cac9a [FX] Add Node.all_input_nodes (#48270)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48270

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D25100241

Pulled By: jamesr66a

fbshipit-source-id: f742f5a13debebb5be37f7c0045c121f6eaff1d5
2020-11-19 19:53:28 -08:00
aa8aa30a0b third_party: Update pybind to point to fork (#48117)
Summary:
There are specific patches we need for Python 3.9 compatibility, and that
process is currently hung up on separate issues.

Let's update to a newer version of our forked pybind to grab the Python
3.9 fixes while we wait for them to be upstreamed

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48117

Relates to: https://github.com/pybind/pybind11/pull/2657

Full comparison for this update looks like this: 59a2ac2745...seemethere:v2.6-fb

Fixes https://github.com/pytorch/pytorch/issues/47776

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48120

Reviewed By: gchanan

Differential Revision: D25030688

Pulled By: seemethere

fbshipit-source-id: 10889c813aeaa70ef1298adad5c631e6b5a39d72
2020-11-19 19:30:09 -08:00
84d4e9c4fa enable cuda11.1 and cudnn 8.0.5 in CI (#48242)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48242

Reviewed By: walterddr

Differential Revision: D25108971

Pulled By: malfet

fbshipit-source-id: d836690e1d5d33c3395a44a86994a0a4bb381628
2020-11-19 19:27:36 -08:00
1a6666c967 [Gradient Compression] Add a comment on _orthogonalize. (#48253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48253

Explained why a hand-crafted orthogonalize function is used instead of `torch.qr`.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 117132622

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D25088607

fbshipit-source-id: ebc228afcb4737bb8529e7143ea170086730520e
2020-11-19 19:22:04 -08:00
3c936ecd3c Revert D25056091: migrate export_caffe2_op_to_c10.h macros to the new dispatcher registration API
Test Plan: revert-hammer

Differential Revision:
D25056091 (0ea4982cf3)

Original commit changeset: 0f647ab9bc5e

fbshipit-source-id: e54047b91d82df25460ee00482373c4580f94d50
2020-11-19 19:10:14 -08:00
0ea4982cf3 migrate export_caffe2_op_to_c10.h macros to the new dispatcher registration API (#48097)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48097

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25056091

Pulled By: bdhirsh

fbshipit-source-id: 0f647ab9bc5e5aee497dac058df492f6e742cfe9
2020-11-19 17:56:56 -08:00
4b56aef05d add kl_based_partition (#48197)
Summary:
This is a partition search based on the Kernighan-Lin algorithm. First, the graph is partitioned using size_based_partition; then nodes from different partitions are swapped until the cost reaches a minimum (see the sketch below).
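
For intuition, a self-contained toy sketch of the KL-style refinement loop (illustrative only; the real partitioner swaps FX graph nodes under a latency/size cost, not integers under an imbalance cost):

```py
def kl_refine(p0, p1):
    # toy cost: the weight imbalance between the two partitions
    def cost():
        return abs(sum(p0) - sum(p1))

    improved = True
    while improved:
        improved = False
        for i in range(len(p0)):
            for j in range(len(p1)):
                before = cost()
                p0[i], p1[j] = p1[j], p0[i]  # tentative swap
                if cost() < before:
                    improved = True          # keep the swap
                else:
                    p0[i], p1[j] = p1[j], p0[i]  # undo

a, b = [8, 7, 6], [1, 2, 3]
kl_refine(a, b)
print(a, b)  # totals converge: [1, 7, 6] [8, 2, 3]
```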

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48197

Reviewed By: gcatron

Differential Revision: D25097065

Pulled By: scottxu0730

fbshipit-source-id: 3a11286bf4e5a712ab2848b92d0b98cd3d6a89be
2020-11-19 17:38:25 -08:00
c0723a0abf Add MessageTypeFlags enum for RPC Messages (#48143)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/47145

Adds a new MessageTypeFlags enum so that checking for certain properties (e.g. isResponse, isRequest) can be done with a BITWISE AND instead of checking for each MessageType enum individually.
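
A language-agnostic sketch of the bitmask idea (hypothetical names and values, not the actual C++ enum):

```py
# Hypothetical flag bits; the real MessageTypeFlags values live in C++.
REQUEST_TYPE = 0x100
RESPONSE_TYPE = 0x200

SCRIPT_CALL = REQUEST_TYPE | 0x01   # a request message type
SCRIPT_RET = RESPONSE_TYPE | 0x01   # a response message type

def is_request(message_type: int) -> bool:
    # one bitwise AND replaces checking every request enum value individually
    return bool(message_type & REQUEST_TYPE)

assert is_request(SCRIPT_CALL) and not is_request(SCRIPT_RET)
```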

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48143

Reviewed By: mrshenli

Differential Revision: D25091008

Pulled By: H-Huang

fbshipit-source-id: 56a823747748633c1ef3fa07817ca0f08c7399a8
2020-11-19 15:51:31 -08:00
feb6487acf Dont skip NCCL backend when testing all_reduce_cuda (#48231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48231

Noticed that these tests were being skipped with the NCCL backend, but
there doesn't appear to be a valid reason to do so. Enabled these tests and verified
that they pass with 500 stress runs.
ghstack-source-id: 117085209

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D25079030

fbshipit-source-id: 8204288ffbd387375a1a86fe8c07243cfd855549
2020-11-19 15:26:57 -08:00
685cd9686f Refactor CuFFTConfig to not use tensor objects (#46909)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46909

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25083884

Pulled By: mruberry

fbshipit-source-id: 15f8ec1da1a457811cf118a3adf2941c4b0a6a37
2020-11-19 14:31:51 -08:00
2039ff3fbb [Caffe2] Optimize MishOp on CPU (#48212)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48212

Optimize MishOp on CPU

Test Plan: buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:activation_ops_test -- "mish"

Reviewed By: houseroad

Differential Revision: D25071304

fbshipit-source-id: fe94bfab512188d60412d66962983eff4f37bc07
2020-11-19 14:17:27 -08:00
f98ab18445 [pytorch][codegen] move is_abstract property to NativeFunction model (#48252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48252

Moved to a shared place so that gen_variable_type.py can reuse it.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25087808

Pulled By: ljk53

fbshipit-source-id: 1f32e506956fc4eb08734cfde0add47b3e666bd9
2020-11-19 12:30:13 -08:00
9b19880c43 Fix collect_env.py with older version of PyTorch (#48076)
Summary:
Inspired by https://github.com/pytorch/pytorch/issues/47993, this fixes the import error in `collect_env.py` with older versions of PyTorch, where `torch.version` does not have the `hip` property.
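
A hedged sketch of the guard pattern such a fix typically uses (`getattr` with a default, so older builds without `torch.version.hip` don't raise; not necessarily the exact diff):

```py
import torch

# returns None instead of raising AttributeError on older PyTorch builds
hip_version = getattr(torch.version, 'hip', None)
print(hip_version)
```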

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48076

Reviewed By: seemethere, xuzhao9

Differential Revision: D25024352

Pulled By: samestep

fbshipit-source-id: 7dff9d2ab80b0bd25f9ca035d8660f38419cdeca
2020-11-19 12:18:08 -08:00
343b3e5cae Added linalg.tensorinv (#45969)
Summary:
This PR adds `torch.linalg.tensorinv` for NumPy compatibility.
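
As a usage sketch (mirroring the NumPy semantics; shapes are illustrative): `tensorinv` inverts a tensor treated as a matrix whose rows come from the first `ind` dimensions and whose columns come from the rest.

```py
import torch

a = torch.eye(24).reshape(4, 6, 8, 3)    # 4*6 == 8*3 == 24
ainv = torch.linalg.tensorinv(a, ind=2)  # invert w.r.t. the first 2 dims
print(ainv.shape)                        # torch.Size([8, 3, 4, 6])
```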

Ref https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45969

Reviewed By: zhangguanheng66

Differential Revision: D25060568

Pulled By: mruberry

fbshipit-source-id: 3b145ce64e4bd5021bc229f5ffdd791c572673a0
2020-11-19 11:54:50 -08:00
678fe9f077 Add blas compare example (#47058)
Summary:
Adds a standalone script which can be used to test different BLAS libraries. Right now I've deliberately kept it limited (only a couple BLAS libs and only GEMM and GEMV). It's easy enough to expand later.

CC ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47058

Reviewed By: zhangguanheng66

Differential Revision: D25078946

Pulled By: robieta

fbshipit-source-id: b5f7f7ec289d59c16c5370b7a6636c10a496b3ac
2020-11-19 11:27:27 -08:00
008f840e7a Implement in-place method torch.cumsum_ and torch.cumprod_ (#47651)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47193
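
A minimal usage sketch of the in-place variants this PR adds (values illustrative):

```py
import torch

x = torch.tensor([1., 2., 3.])
x.cumsum_(dim=0)   # in-place cumulative sum
print(x)           # tensor([1., 3., 6.])
x.cumprod_(dim=0)  # in-place cumulative product, applied to the new x
print(x)           # tensor([ 1.,  3., 18.])
```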

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47651

Reviewed By: zou3519

Differential Revision: D24992438

Pulled By: ezyang

fbshipit-source-id: c38bea55f4af1fc92be780eaa8e1d462316e6192
2020-11-19 11:20:12 -08:00
fe6bb2d287 [PyTorch] Declare the instantiation of PackedConvWeightsQnnp<2>::prepack (#48256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48256

`PackedConvWeightsQnnp<2>::prepack` is referenced by both `quantized::conv_prepack` and fbgemm.cpp. Since `quantized::conv_prepack` is in the same compilation unit as the class template, it was fine. However, if we make operator registration selective, the reference from `quantized::conv_prepack` is gone. The reference from fbgemm.cpp is in another compilation unit, and there is a link error.

To avoid the link error, instantiate the symbol in the cpp file. It should also work to move all implementations to the .h file, but to keep the existing code structure and to avoid a (small chance of) code bloat, the implementations are kept as is.
ghstack-source-id: 117123564

Test Plan:
CI
buck build //fbandroid/apps/oculus/assistant:assistant_arm64_debug

Reviewed By: dhruvbird

Differential Revision: D24941989

fbshipit-source-id: adc96d0e55c89529fb71a43352aa68a1088a62a2
2020-11-19 10:54:58 -08:00
1dd4f4334c docker: Make CUDA_VERSION configurable (#48199)
Summary:
makes CUDA_VERSION configurable for the docker images:

make CUDA_VERSION=10.2 CUDNN_VERSION=7 official-image

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48199

Reviewed By: xuzhao9, janeyx99

Differential Revision: D25064256

Pulled By: seemethere

fbshipit-source-id: 25f52185097be647d11b5324f9f97cd41cdad75b
2020-11-19 10:06:45 -08:00
a7153a89a5 Exclude docs/cpp/src from flake8 (#48201)
Summary:
Currently when I run `flake8` locally I get [a bunch of extraneous warnings](https://pastebin.com/DMQevCtC) because the docs build puts a `pytorch-sphinx-theme` dir into `docs/cpp/src`. Those warnings don't show up in CI because the CI lint job doesn't generate that dir. This PR adds that to the Flake8 `exclude` list, similar to how `docs/src` is already present in that list.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48201

Reviewed By: walterddr, zhangguanheng66

Differential Revision: D25069130

Pulled By: samestep

fbshipit-source-id: 2fda9e813f54092398525b7fc97d0a8f7f835ca6
2020-11-19 10:00:14 -08:00
975ff6624b DOC: backport doc build fix from 1.7, tweak link (#47349)
Summary:
xref gh-46927 to the 1.7 release branch

This backports a fix to the script to push docs to pytorch/pytorch.github.io. Specifically, it pushes to the correct directory when a tag is created here. This issue became apparent in the 1.7 release cycle and should be backported to here.

Along the way, fix the canonical link to the pytorch/audio documentation now that they use subdirectories for the versions, xref pytorch/audio#992. This saves a redirect.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47349

Reviewed By: zhangguanheng66

Differential Revision: D25073752

Pulled By: seemethere

fbshipit-source-id: c778c94a05f1c3e916217bb184f69107e7d2c098
2020-11-19 09:51:18 -08:00
c542614e53 Implement C++ ModuleDict (#47707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47707

Fixes #45896

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D24872641

Pulled By: ejguan

fbshipit-source-id: 3d1dc9148ba3bcf66ab9c44ddb5774060bbc365d
2020-11-19 08:07:51 -08:00
c4a6df989c Pass any verbosity from test/run_test.py to pytest (#48204)
Summary:
Previously it was only possible to pass up to one [verbosity level](https://adamj.eu/tech/2019/10/03/my-most-used-pytest-commandline-flags/) to `pytest` when running a test via `test/run_test.py`. Presumably that behavior was never added because `unittest` [doesn't do anything extra](https://stackoverflow.com/a/1322648/5044950) when given more than one `--verbose` flag. This PR removes that limitation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48204

Test Plan:
Make a dummy `pytest`-style file `test/test_foo.py`:
```py
def test_bar():
    assert 'hello\n' * 10 == 'hello\n' * 20
```
Then add `'test_foo'` to both `TESTS` and `USE_PYTEST_LIST` in `test/run_test.py`, and run this command:
```sh
test/run_test.py -vvi test_foo
```

Reviewed By: walterddr

Differential Revision: D25069147

Pulled By: samestep

fbshipit-source-id: 2765ee78d18cc84ea0e262520838993f9e9ee04f
2020-11-19 08:06:26 -08:00
370310bedb batched grad for binary_cross_entropy, symeig (#48057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48057

This PR fixes batched grad computation for:
- binary_cross_entropy (i.e., vmap through binary_cross_entropy_double_backward)
- symeig (i.e. vmap through symeig_backward)

It was previously impossible to vmap through those functions because
they use in-place operations in a vmap-incompatible way.

See note at
233192be73/aten/src/ATen/BatchedFallback.cpp (L117-L122)
for what it means for an in-place operation to be vmap-incompatible.

This PR adds a check: if the in-place operations in e.g. symeig are
vmap-incompatible and we are inside of a vmap, then we do the
out-of-place variant of the operation. Ditto for binary_cross_entropy.

This is to avoid code duplication: the alternative would be to register
the backward formula as an operator and change just those lines to be
out-of-place!

This PR also adds some general guidelines for what to do if an in-place
operation is vmap-incompatible.

General guidelines
------------------

If an in-place operation used in a backward formula is vmap-incompatible,
then as developers we have the following options:

- If the in-place operation directly followed the creation of a tensor with
  a factory function like at::zeros(...), we should replace the factory with a
  corresponding grad.new_zeros(...) call. The grad.new_zeros(...) call
  propagates the batch dims to the resulting tensor.
  For example:
    Before: at::zeros(input.sizes(), grad.options()).copy_(grad)
    After:  grad.new_zeros(input.sizes()).copy_(grad)

- If the in-place operation followed some sequence of operations, and we
  want to be able to vmap over the backward formula as-is (this is
  usually the case for simple (<15 LOC) backward formulas), then use
  inplace_is_vmap_compatible to guard the operation. For example:
            c = a * b
    Before: c.mul_(grad)
    After:  c = inplace_is_vmap_compatible(c, grad) ? c.mul_(grad) : c * grad

- If we don't want to vmap directly over the backward formula (e.g., if the
  backward formula is too complicated or has a lot of vmap-incompatible
  operations), then register the backward formula as an operator and eventually
  write a batching rule for it.

Test Plan
---------
New tests

Test Plan: Imported from OSS

Reviewed By: zhangguanheng66

Differential Revision: D25069525

Pulled By: zou3519

fbshipit-source-id: e0dfeb5a812f35b7579fc6ecf7252bf31ce0d790
2020-11-19 07:59:02 -08:00
db767b7862 Add c10d new frontend to build (#48146)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* https://github.com/pytorch/pytorch/issues/48148 Add TorchBind-based Python and TorchScript binding for ProcessGroup
* https://github.com/pytorch/pytorch/issues/48147 Add process group creation logic in c10d new frontend
* **https://github.com/pytorch/pytorch/issues/48146 Add c10d new frontend to build**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48146

Reviewed By: wanchaol

Differential Revision: D25073969

Pulled By: gmagogsfm

fbshipit-source-id: d111649144a4de9f380e5f7a2ad936860de4bd7b
2020-11-19 04:47:02 -08:00
daff3a81a1 [Gradient Compression] PowerSGD comm hook (#48060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48060

Implement a PowerSGD variant that applies to a batched flattened tensor with zero paddings.

This version does not require handling 1D tensors and multi-dimensional tensors in the input separately, and hence it does not need to create two asynchronous future chains.

Potential optimizations:
1) Consider FP16 compression throughout PowerSGD.
2) Warm start and save one matrix multiplication per iteration.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 117105938

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl

Reviewed By: jiayisuse

Differential Revision: D24843692

fbshipit-source-id: f44200b1fd6e12e829fc543d21ab7ae086769561
2020-11-19 02:59:11 -08:00
0d8ddb5ec2 Make softmax and log_softmax handle negative dims, add tests (#48156)
Summary:
Make softmax and log_softmax handle negative dims, add tests
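
For context, a small sketch of the negative-dim semantics being handled (a negative `dim` counts from the last dimension):

```py
import torch

x = torch.randn(2, 3)
# dim=-1 addresses the last dimension, i.e. dim=1 for a 2-D tensor
assert torch.allclose(torch.softmax(x, dim=-1), torch.softmax(x, dim=1))
assert torch.allclose(torch.log_softmax(x, dim=-1), torch.log_softmax(x, dim=1))
```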

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48156

Reviewed By: bertmaher

Differential Revision: D25059788

Pulled By: Krovatkin

fbshipit-source-id: 985963e7df400857c9774660c76be7d56201a1ad
2020-11-19 01:38:14 -08:00
46d846f5bb T78750158 Support varying size input in numeric suite at 10/30/2020, 3:55:01 PM (#47391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47391

The current Numeric Suite will fail if it is collecting for multiple inputs that are not all the same size. This fix adds support for varying-size inputs in Numeric Suite.
ghstack-source-id: 117058862

Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_shadow_logger'
buck test mode/dev caffe2/test:quantization  -- 'test_output_logger'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_lstm_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_lstm_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_lstm_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_conv_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_linear_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_linear_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_conv_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_linear_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_submodule_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_functional_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_linear_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_conv_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_linear_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_functional_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_linear_dynami

Reviewed By: hx89

Differential Revision: D24662271

fbshipit-source-id: 6908169ee448cbb8f33beedbd26104633632896a
2020-11-18 23:57:41 -08:00
8819bad86c Implement igammac (3rd PR) (#48171)
Summary:
Related: https://github.com/pytorch/pytorch/issues/46183 (torch.igamma)
This is the regularized upper incomplete gamma function.
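
As a quick numeric sketch: `igammac` is the regularized upper incomplete gamma function, so it complements `igamma` (the lower one) to 1.

```py
import torch

a = torch.tensor([1.0, 2.0])
x = torch.tensor([0.5, 1.5])
# regularized lower + regularized upper incomplete gamma == 1
total = torch.igamma(a, x) + torch.igammac(a, x)
print(torch.allclose(total, torch.ones(2)))  # True
```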

This is supposed to be exactly the same as https://github.com/pytorch/pytorch/issues/47463, but after rebasing the `viable/strict` branch.

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48171

Reviewed By: zhangguanheng66

Differential Revision: D25060107

Pulled By: mruberry

fbshipit-source-id: 89780dea21dbb2141cbc4f7f18192cb78a769b17
2020-11-18 23:44:32 -08:00
c5dae335e4 [PT][StaticRuntime] Move prim op impl to ops.cpp (#48210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48210

- Move prim op implementation from `ProcessedNode::run` to `getNativeOperation`
- Add out variant for `prim::listConstruct`

Test Plan:
```
buck test //caffe2/test:static_runtime
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
buck test //caffe2/caffe2/fb/predictor:pytorch_predictor_test

buck run mode/dev //caffe2/caffe2/fb/predictor:ptvsc2_predictor_bench -- \
--scripted_model=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge/traced_precomputation.pt \
--pt_inputs=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge/container_precomputation_bs1.pt \
--iters=1 --warmup_iters=1 --num_threads=1 --pt_enable_static_runtime=true \
--pt_cleanup_activations=true --pt_enable_out_variant=true
```

Reviewed By: ajyu

Differential Revision: D24748947

fbshipit-source-id: 12caeeae87b69e60505a6cea31786bd96f5c8684
2020-11-18 23:07:39 -08:00
6da26fe79b [te] Fix pow (#48213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48213

It was completely broken unless the RHS was a constant.

Test Plan: new unit test in test_jit_fuser_te.py

Reviewed By: eellison

Differential Revision: D25071639

fbshipit-source-id: ef1010a9fd551db646b83adfaa961648a5c388ae
2020-11-18 22:44:16 -08:00
ed57f804fa [quant][refactor] Move some util functions from torch/quantization/fx/utils.py to torch/quantization/utils.py (#48107)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48107

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D25026495

fbshipit-source-id: 3634b6b95a18670232600874b1e593180ea9f44c
2020-11-18 22:32:19 -08:00
4316bf98f5 [FX] Refactor unique name handling (#48205)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48205

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D25068934

Pulled By: jamesr66a

fbshipit-source-id: 04e02bbfd2cc9a8c3b963d9afdf40bac065c319b
2020-11-18 21:56:52 -08:00
bef460a803 [PyTorch] Return raw ptr from ThreadLocalDebugInfo::get() (#47796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47796

`ThreadLocalDebugInfo::get()` is a hot function. For example, it is called by `DefaultCPUAllocator::allocate()`. Most callers do not even bother to keep the returned `shared_ptr` around, proving that they have no lifetime issues currently. For the rest, it appears that the only way that the returned pointer could become invalid is if they then called a function that swapped out `ThreadLocalDebugInfo` using `ThreadLocalStateGuard`. There are very few such paths, and it doesn't look like any current callers of `ThreadLocalDebugInfo::get()` needed a `shared_ptr` at all.
ghstack-source-id: 116979577

Test Plan:
1) reviewers to double-check audit of safety
2) run framework overhead benchmarks

Reviewed By: dzhulgakov

Differential Revision: D24902978

fbshipit-source-id: d684737cc2568534cac7cd3fb8d623b971c2fd28
2020-11-18 20:37:17 -08:00
5883e0b0e0 [quant][fix][ez] Fix quant_type classification for fp16, fp16 (#48073)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48073

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D25011799

fbshipit-source-id: a12f645d6be1c607898633225b02617283d37df1
2020-11-18 20:07:54 -08:00
773d1f3208 [Person Seg] Compress the person seg model (#48008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48008

### Motivation

The idea is to quantize the weights during model export and dequantize them in `setstate` at runtime. To replicate exactly what caffe2 did, only 10 conv layers were quantized.

Since the code here is restricted to the unet model, I created a custom prepacking context to do the graph rewriting and registering custom ops.

To run on iOS/MacOS, we need to link `unet_metal_prepack` explicitly.
- buck build //xplat/caffe2/fb/custom_ops/unet_metal_prepack:unet_metal_prepackApple
- buck build //xplat/caffe2/fb/custom_ops/unet_metal_prepack:unet_metal_prepackAppleMac

On the server side, the `unet_metal_prepack.cpp` needs to be compiled into `aten_cpu` in order to do the graph rewrite via optimize_for_mobile. However, since we don't want to ship it to production, some local hacks were made to make this happen. More details can be found in the following diffs.

### Results

-rw-r--r--   1 taox  staff   1.1M Nov 10 22:15 seg_init_net.pb
-rw-r--r--   1 taox  staff   1.1M Nov 10 22:15 seg_predict_net.pb

Note that since we quantize the weights, some precision loss is expected, but overall the results are good.

### ARD

- Person seg - v229
- Hair seg - v105
ghstack-source-id: 117019547

Test Plan:
### Video eval results from macos

{F345324969}

Differential Revision: D24881316

fbshipit-source-id: b67811d6d06de82130f4c22392cc961c9dda7559
2020-11-18 20:01:51 -08:00
a97d059614 Get TestTorch.test_empty_meta working again (#48113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48113

Fix is simple: just treat Meta as a backend covered by AutogradOther.
This semantically makes sense, since meta kernels are just like regular
CPU/CUDA kernels, they just don't do any compute.
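
For context, a minimal sketch of what a meta tensor gives you (assuming a build where the `meta` device is exposed through the regular factory functions):

```py
import torch

x = torch.empty(2, 3, device='meta')  # shape/dtype tracked, no storage
print(x.shape, x.device)              # torch.Size([2, 3]) meta
```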

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zhangguanheng66

Differential Revision: D25056641

Pulled By: ezyang

fbshipit-source-id: 7b68911982352b3e0ee8616b38cd9c70bd58a740
2020-11-18 19:50:27 -08:00
4c9eb57914 [PyTorch] Narrow Device to 2 bytes by narrowing DeviceType and DeviceIndex (#47023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47023

DeviceType pretty clearly only needs 1 byte. DeviceIndex only needs 1 byte given that machines don't have anywhere near 255 GPUs in them as far as I know.
ghstack-source-id: 116901430

Test Plan: Existing tests, added assertion to catch if my assumption about DeviceIndex is incorrect

Reviewed By: dzhulgakov

Differential Revision: D24605460

fbshipit-source-id: 7c9a89027fcf8eebd623b7cdbf6302162c981cd2
2020-11-18 19:39:40 -08:00
72918e475e [quant] FakeQuantize inherit from FakeQuantizeBase (#48072)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48072

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D25011074

fbshipit-source-id: 260f4d39299bc148b65c21e67b571dfa1d0fe2ad
2020-11-18 19:14:20 -08:00
efeb988518 Suppress "ioctl points to uninitialised" check (#48187)
Summary:
libcuda.so from CUDA-11.1 makes an ioctl() call that valgrind's memcheck tool considers dangerous.
This change instructs valgrind to suppress that check.

Fixes false positives reported in https://app.circleci.com/pipelines/github/pytorch/pytorch/240774/workflows/d4c66de8-f13b-47a2-ae62-2ec1bbe0664b/jobs/9026496

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48187

Reviewed By: janeyx99

Differential Revision: D25059850

Pulled By: malfet

fbshipit-source-id: 982df5860524482b0fcb2bfc6bb490fb06694cf6
2020-11-18 18:45:46 -08:00
576fa09157 [quant][fix] Fix quant type classification for float_qparam qconfig (#48069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48069

also renamed float_qparam_dynamic_qconfig to float_qparam_weight_only_qconfig
It's not used in user code yet so we only need to update the tests.

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D25010175

fbshipit-source-id: caa3eaa5358a8bc5c808bf5f64e6ebff3e0b61e8
2020-11-18 18:22:08 -08:00
f0f8b97d19 Introducing winograd transformed fp16 nnpack to PT for unet 106 (#47925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47925

ghstack-source-id: 117004847

Test Plan:
buck run caffe2/fb/custom_ops/unet_106_pt:unet_106_rewrite
    buck run caffe2/fb/custom_ops/unet_106_pt:tests

Reviewed By: dreiss

Differential Revision: D24822418

fbshipit-source-id: 0c0bc0772e4c878e979ee3d2078105377e220c43
2020-11-18 18:05:53 -08:00
383abf1f0c [PyTorch] Make RecordFunction::active private (#47549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47549

In preparation for moving state onto the heap.
ghstack-source-id: 117027862

Test Plan: CI

Reviewed By: ilia-cher

Differential Revision: D24812214

fbshipit-source-id: 1455c2782b66f6a59c4d45ba58e1c4c92402a323
2020-11-18 17:58:54 -08:00
1bafff2366 [PyTorch][JIT] Skip unnecessary refcounting in TensorType::merge (#47959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47959

Taking a shared_ptr by value incurs refcounting overhead and should only be done if the callee needs to take ownership. Otherwise, `const T&` is more efficient. (Specifically, you will have to do an atomic decrement when the argument is destroyed and probably an atomic increment as well. Passing by `const T&` also takes one less register than passing `std::shared_ptr<T>`, but that's less important.)

This diff fixes just this one function, but I'd be happy to audit & fix this whole file in future diffs. Thoughts?
ghstack-source-id: 116914899

Test Plan: build ATen-cpu

Reviewed By: Krovatkin

Differential Revision: D24970954

fbshipit-source-id: 6bdb4b710a94b8baf4ad63418fb38136134e0ef3
2020-11-18 17:49:16 -08:00
0f89be616a Removing non-thread-safe log statement from ReinitializeTensor (#48185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48185

In a scenario where we have Caffe2 wrapped into a dynamic library, we were running into the memory corruption crash at program termination:

"corrupted size vs. prev_size in fastbins"

Turns out the crash occurs in glog's logging.cc, which is not thread-safe and has to initialize a static hostname string when flushing. If this ends up happening on multiple threads simultaneously, it can lead to memory corruption.

```
==1533667== Invalid free() / delete / delete[] / realloc()
==1533667==    at 0xA3976BB: operator delete(void*, unsigned long) (vg_replace_malloc.c:595)
==1533667==    by 0x37E36AE: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (basic_string.h:647)
==1533667==    by 0xAD87F6B: __run_exit_handlers (in /usr/lib64/libc-2.28.so)
==1533667==    by 0xAD8809F: exit (in /usr/lib64/libc-2.28.so)
==1533667==    by 0xAD71799: (below main) (in /usr/lib64/libc-2.28.so)
==1533667==  Address 0x165cd720 is 0 bytes inside a block of size 31 free'd
==1533667==    at 0xA3976BB: operator delete(void*, unsigned long) (vg_replace_malloc.c:595)
==1533667==    by 0x37E36AE: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (basic_string.h:647)
==1533667==    by 0xAD87F6B: __run_exit_handlers (in /usr/lib64/libc-2.28.so)
==1533667==    by 0xAD8809F: exit (in /usr/lib64/libc-2.28.so)
==1533667==    by 0xAD71799: (below main) (in /usr/lib64/libc-2.28.so)
==1533667==  Block was alloc'd at
==1533667==    at 0xA39641F: operator new(unsigned long) (vg_replace_malloc.c:344)
==1533667==    by 0x37F4E18: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate(unsigned long, unsigned long, char const*, unsigned long) (basic_string.tcc:317)
==1533667==    by 0x37F4F2E: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace(unsigned long, unsigned long, char const*, unsigned long) (basic_string.tcc:466)
==1533667==    by 0x5170344: GetHostName(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) (logging.cc:227)
==1533667==    by 0x51702D4: google::LogDestination::hostname[abi:cxx11]() (logging.cc:555)
==1533667==    by 0x5173789: google::(anonymous namespace)::LogFileObject::Write(bool, long, char const*, int) (logging.cc:1072)
==1533667==    by 0x51746DF: google::LogDestination::LogToAllLogfiles(int, long, char const*, unsigned long) (logging.cc:773)
==1533667==    by 0x5170BDC: google::LogMessage::SendToLog() (logging.cc:1386)
==1533667==    by 0x5171236: google::LogMessage::Flush() (logging.cc:1305)
==1533667==    by 0x517114D: google::LogMessage::~LogMessage() (logging.cc:1264)
==1533667==    by 0x108DC840: caffe2::ReinitializeTensor(caffe2::Tensor*, c10::ArrayRef<long>, c10::TensorOptions) (tensor.cc:0)
==1533667==    by 0x103BBED0: caffe2::int8::Int8GivenTensorFillOp::RunOnDevice() (int8_given_tensor_fill_op.h:29)
==1533667==
```

There doesn't seem to be an obvious easy solution here. The logging API being used by c10 is fundamentally not thread-safe, at least when it uses glog. Glog does have a threadsafe API (raw_logging), but this doesn't seem to be used by c10 right now. I suspect other callers are not running into this crash because:
- They have other libraries using glog in their module, so the static variable in glog gets initialized before getting into a race condition
- They don't use int8 network in a glog context, thus avoiding this problematic log statement

An alternative fix would be to correctly initialize the dtype of the int8 tensor, which is currently always uninitialized, making the log statement always trigger for int8 networks. Initializing the int8 tensor correctly in tensor_int8.h is proving to be challenging though, at least without knowledge of Caffe2's codebase. And even then, it wouldn't fix the issue for all use cases.

Test Plan: Ran my app with valgrind, I no longer get the crash and valgrind doesn't complain about  a memory corruption anymore

Reviewed By: thyu, qizzzh

Differential Revision: D25040725

fbshipit-source-id: 1392a97ccf9b4c9ade1ea713610ee44a1578ae7d
2020-11-18 17:42:22 -08:00
4360486346 pass strict_fuser_check for recursive fusion (#47221)
Summary:
We forgot to pass `strict_fuser_check` recursively to nested GraphFuser.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47221

Reviewed By: zhangguanheng66

Differential Revision: D25060095

Pulled By: Krovatkin

fbshipit-source-id: 31fe79c3bc080b637fce9aacc562d60708223321
2020-11-18 16:57:38 -08:00
ea1e78a0c5 Revert D24853669: [pytorch][PR] Migrate eig from the TH to Aten (CUDA)
Test Plan: revert-hammer

Differential Revision:
D24853669 (866f8591be)

Original commit changeset: a513242dc7f4

fbshipit-source-id: a0c8c424b61b1e627d9102de6b4c6d0717a6c06d
2020-11-18 16:53:18 -08:00
2fbd70d336 fft: Generalize fill with conjugate symmetry and use complex dtypes (#46908)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46908

Generalize to non-contiguous dimensions.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25048504

Pulled By: mruberry

fbshipit-source-id: a82545de17fc207fefea7fbd88d03042a3ca41fe
2020-11-18 15:39:14 -08:00
0639387ff1 move Tensor comparisons back to C (#48018)
Summary:
It seems that the machinery to handle comparison methods in C rather than Python already exists, unless I'm missing something. (There is a wrapper for `TypeError_to_NotImplemented_`, and the Python code gen handles `__torch_function__`; these are the two things `_wrap_type_error_to_not_implemented` does.) The performance change is quite stark:

```
import torch
from torch.utils.benchmark import Timer

global_dict = {
    "x": torch.ones((2, 2)),
    "y_scalar": torch.ones((1,)),
    "y_tensor": torch.ones((2, 1)),
}

for stmt in ("x == 1", "x == y_scalar", "x == y_tensor"):
    print(Timer(stmt, globals=global_dict).blocked_autorange(min_run_time=5), "\n")
```

### Before:
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7f3d1289dc10>
x == 1
  Median: 12.86 us
  IQR:    0.65 us (12.55 to 13.20)
  387 measurements, 1000 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7f3d1289d1d0>
x == y_scalar
  Median: 6.03 us
  IQR:    0.33 us (5.91 to 6.24)
  820 measurements, 1000 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7f3d2b9e2050>
x == y_tensor
  Median: 6.34 us
  IQR:    0.33 us (6.16 to 6.49)
  790 measurements, 1000 runs per measurement, 1 thread
```

### After:
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7fbdba2a16d0>
x == 1
  Median: 6.88 us
  IQR:    0.40 us (6.74 to 7.14)
  716 measurements, 1000 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7fbdd2e07ed0>
x == y_scalar
  Median: 2.98 us
  IQR:    0.19 us (2.89 to 3.08)
  167 measurements, 10000 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7fbdd33e4510>
x == y_tensor
  Median: 3.03 us
  IQR:    0.13 us (2.97 to 3.10)
  154 measurements, 10000 runs per measurement, 1 thread
```

There's still a fair bit of work left: the equivalent NumPy call is still about 6x faster than the new overhead, and PyTorch 0.4 was about 1.25 us across the board (no scalar cliff). But it's a start.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48018

Reviewed By: gchanan

Differential Revision: D25026257

Pulled By: robieta

fbshipit-source-id: 093b06a1277df25b4b7cc0d4e585b558937b10a1
2020-11-18 15:25:41 -08:00
ed4dd86567 move aten::round to lite interpreter (#45931)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45931

move aten::round to lite interpreter. It's needed by TTS

Test Plan: build

Reviewed By: zhizhengwu

Differential Revision: D24149089

fbshipit-source-id: c8e292598dd04d7f0d40f121cb861f91d359e957
2020-11-18 12:30:32 -08:00
a36e646878 [pytorch][codegen] simplify python signature creation logic (#47977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47977

Avoid calling CppSignatureGroup api - python signature shouldn't depend on
cpp signature. Still use cpp.group_arguments() to group TensorOptions.

Confirmed byte-for-byte compatible with the old codegen:

```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24976334

Pulled By: ljk53

fbshipit-source-id: 5df5a7bbfd2b8cb460153e5bea4d91e65f716390
2020-11-18 12:26:50 -08:00
5eaf8562cd [pytorch][codegen] simplify dunder method check in gen_python_functions.py (#47976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47976

Confirmed byte-for-byte compatible with the old codegen:

```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24976273

Pulled By: ljk53

fbshipit-source-id: 6f8f20d18db20c3115808bfac0a8b8ad83dcf64c
2020-11-18 12:26:47 -08:00
5243456728 [pytorch][codegen] remove dead code in gen_variable_type.py (#47975)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47975

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24976274

Pulled By: ljk53

fbshipit-source-id: 8542471ee30f26592aad949fc17eef87a47df024
2020-11-18 12:26:44 -08:00
07657b6001 [tensorexpr] Switch cpp tests to pure gtest (#48160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48160

We no longer use the custom c++ test infra anyways, so move to pure
gtest.

Fixes #45703
ghstack-source-id: 116977283

Test Plan: `buck test //caffe2/test/cpp/tensorexpr`

Reviewed By: navahgar, nickgg

Differential Revision: D25046618

fbshipit-source-id: da34183d87465f410379048148c28e1623618553
2020-11-18 12:23:34 -08:00
464d23e6b4 [te][benchmark] Add more optimized versions of gemm (#48159)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48159

Test Plan: Imported from OSS

Reviewed By: Chillee, ngimel

Differential Revision: D25059742

Pulled By: bertmaher

fbshipit-source-id: f197347f739c5bd2a4182c59ebf4642000c3dd55
2020-11-18 12:21:08 -08:00
8a996dd139 [te] Make BUILD_TENSOREXPR_BENCHMARK a real CMake option (#48158)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48158

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D25059877

Pulled By: bertmaher

fbshipit-source-id: a98b6c18a91b4fe89d12bf5f7ead604e3cc0c8b0
2020-11-18 12:19:14 -08:00
866f8591be Migrate eig from the TH to Aten (CUDA) (#44105)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24553

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44105

Reviewed By: heitorschueroff

Differential Revision: D24853669

Pulled By: mruberry

fbshipit-source-id: a513242dc7f49f55dbc6046c18d8a9d9aa2aaf8d
2020-11-18 12:10:18 -08:00
8af9f2cc23 Revert D24924736: [pytorch][PR] Hipify revamp
Test Plan: revert-hammer

Differential Revision:
D24924736 (10b490a3e0)

Original commit changeset: 4af42b8ff4f2

fbshipit-source-id: 7f8f90d55d8a69a2890ec73622fcea559189e381
2020-11-18 11:48:30 -08:00
68a3a3f3b5 Add torch.swapdims and torch.swapaxes (#46041)
Summary:
Reference https://github.com/pytorch/pytorch/issues/38349

Delegates to `torch.transpose` (not sure what is the best way to alias)
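
A short usage sketch of the aliases (values illustrative):

```py
import torch

x = torch.randn(2, 3, 4)
# swapdims/swapaxes are NumPy-style aliases of transpose
assert torch.swapaxes(x, 0, 2).shape == (4, 3, 2)
assert torch.equal(torch.swapdims(x, 0, 2), x.transpose(0, 2))
```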

TODO:
* [x] Add test
* [x] Add documentation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46041

Reviewed By: gchanan

Differential Revision: D25022816

Pulled By: mruberry

fbshipit-source-id: c80223d081cef84f523ef9b23fbedeb2f8c1efc5
2020-11-18 11:35:53 -08:00
d256e38823 [JIT] Pass TypePtr by reference in Argument::type() and Type::isSubtypeOfExt(). (#48061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48061

This results in a ~6% improvement on DeepAndWide model and would improve
other models as well.

Before the change:
```
393[ms]
458[ms]
413[ms]
390[ms]
430[ms]
399[ms]
426[ms]
392[ms]
428[ms]
399[ms]
```

After the change:
```
396[ms]
375[ms]
396[ms]
392[ms]
370[ms]
402[ms]
395[ms]
409[ms]
366[ms]
388[ms]
```

Differential Revision: D25006357

Test Plan: Imported from OSS

Reviewed By: suo

Pulled By: ZolotukhinM

fbshipit-source-id: c9cdc6354c42962b14207db31cf2580a4e2430b1
2020-11-18 11:29:46 -08:00
df88cc3f7f Document that remainder does not support complex inputs (#48024)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34266

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48024

Reviewed By: ngimel

Differential Revision: D25028700

Pulled By: mruberry

fbshipit-source-id: 6d88c7d0930283455deb51d70708cc4919eeca55
2020-11-18 11:21:23 -08:00
0387f2a6fa Fix default value of num_replicas in DistributedSampler docstring (#48135)
Summary:
Change default value of `num_replicas` from `rank` to `world_size` in DistributedSampler docstring.

Addresses https://github.com/pytorch/pytorch/issues/48055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48135

Reviewed By: gchanan

Differential Revision: D25045328

Pulled By: rohan-varma

fbshipit-source-id: 6f84f7bb69087d8dae931cda51891b3cb1894306
2020-11-18 11:18:40 -08:00
140e946fec Disable distributed collectives profiling tests (#48129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48129

It looks like all test failures in distributed_test have to do with
profiling, so disabling them in this PR (by setting `expect_event=False`
always), so that the distributed profiling tests don't run.

Created https://github.com/pytorch/pytorch/issues/48127 to track the fix.
Will verify with CI all that re-enabling distributed tests passes as expected.
ghstack-source-id: 116938304

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D25034888

fbshipit-source-id: c10bad3ca2425a2f2cde82232001dafcca152d1c
2020-11-18 11:12:09 -08:00
a6898cb5f4 Small documentation changes for RRef and Dist Autograd (#48123)
Summary:
Small wording changes and polishing documentation for:

https://pytorch.org/docs/master/rpc/rref.html
https://pytorch.org/docs/master/rpc/distributed_autograd.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48123

Reviewed By: zhangguanheng66

Differential Revision: D25059320

Pulled By: H-Huang

fbshipit-source-id: 7a0be56f062de06483b3bd3a5d617234101862ba
2020-11-18 10:57:59 -08:00
81b1673a21 Enable complex tests that depend on batched matmul on CUDA (#47910)
Summary:
Now when https://github.com/pytorch/pytorch/pull/42553 is merged we can delete a bit of code from the tests and enable some of the skipped complex tests.

Unfortunately, `test_pinverse_complex_xfailed` and `test_symeig_complex_xfailed` had bugs, and it wasn't caught automatically that these tests xpass. We need to be careful next time with `unittest.expectedFailure`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47910

Reviewed By: zhangguanheng66

Differential Revision: D25052130

Pulled By: mruberry

fbshipit-source-id: 29512995c024b882f9cb78b7bede77733d5762d0
2020-11-18 10:44:47 -08:00
3ca4c656de Install magma on CUDA 11.1 (#48164)
Summary:
cc: xwang233 janeyx99

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48164

Reviewed By: malfet, zhangguanheng66

Differential Revision: D25058068

Pulled By: janeyx99

fbshipit-source-id: ab136ba60e5dda6a2eb7ac76548875e1df75a242
2020-11-18 10:12:08 -08:00
10b490a3e0 Hipify revamp (#45451)
Summary:
This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, **not for PyTorch or Caffe2 itself**.

Correspondingly, changes are made to `cpp_extension.py` to match these improvements.

The list of improvements to hipify is as follows:

1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether `""` or `<>` is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from `hipify` function.
8. Introduce a version for the hipify module to allow extensions to contain backward-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update `cuda_to_hip_mappings.py` to account for the ROCm component subdirectories inside `/opt/rocm/include`. This also results in cleanup of the `Caffe2_HIP_INCLUDE` path to remove unnecessary additions to the include path.

The list of changes to `cpp_extension.py` is as follows:
1. Call `hipify` when building a CUDAExtension for ROCm.
2. Prune the list of source files to CUDAExtension to include only the hipified versions of any source files in the list (if both original and hipified versions of the source file are in the list)
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically

cc jeffdaily sunway513 hgaspar lcskrishna ashishfarmer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45451

Reviewed By: ezyang

Differential Revision: D24924736

Pulled By: malfet

fbshipit-source-id: 4af42b8ff4f21c3782dedb8719b8f9f86b34bd2d
2020-11-18 08:37:49 -08:00
1454cbf087 Make numpy optional dependency for torch.cuda.amp (#48154)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48154

Test Plan:
Uninstall `numpy` and try importing `torch`

Discovered while working on https://github.com/pytorch/pytorch/issues/48145
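
A hedged sketch of the optional-dependency pattern this kind of change uses (illustrative, not the exact diff):

```py
# import numpy lazily and tolerate its absence
try:
    import numpy as np
    HAS_NUMPY = True
except ModuleNotFoundError:
    np = None
    HAS_NUMPY = False
```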

Reviewed By: walterddr

Differential Revision: D25046307

Pulled By: malfet

fbshipit-source-id: c1171a49e03bdc40e8dc1d65928c6c12626e33db
2020-11-18 08:31:44 -08:00
e2b4c63dd9 Enable the faster combined weight branch in MHA when query/key/value is same object with nan (#48126)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47979

For the MHA module, it is preferred to use the combined-weight branch as much as possible when query/key/value are the same (either equal values per `torch.equal`, or the exact same object per the `is` operator). This PR enables the faster branch when a single object containing `nan` is passed to MHA.

For the background knowledge
```
import torch
a = torch.tensor([float('NaN'), 1, float('NaN'), 2, 3])
print(a is a) # True
print(torch.equal(a, a)) # False
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48126

Reviewed By: gchanan

Differential Revision: D25042082

Pulled By: zhangguanheng66

fbshipit-source-id: 6bb17a520e176ddbb326ddf30ee091a84fcbbf27
2020-11-18 08:24:41 -08:00
9ead558899 Add max supported SM for nvrtc-11.0 (#48151)
Summary:
Should fix the regression when nvrtc from CUDA-11.0 is used on the system with RTX3080

Addresses issue described in https://github.com/pytorch/pytorch/issues/47669#issuecomment-725073808

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48151

Reviewed By: ngimel

Differential Revision: D25043899

Pulled By: malfet

fbshipit-source-id: 998ded59387e3971c2c1a5df4af595630515a72e
2020-11-18 08:17:28 -08:00
21c823970e [ROCm] remove sccache wrappers post build (#47947)
Summary:
For ROCm, the CI images serve both the CI jobs and public releases. Without removing the sccache wrappers, end users are forced to use sccache. Our users have encountered sccache bugs when using our PyTorch images, so we choose to remove the wrappers after the CI build completes. Further, runtime compilation of MIOpen kernels still experiences errors due to sccache.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47947

Reviewed By: gchanan

Differential Revision: D24994031

Pulled By: malfet

fbshipit-source-id: 65c57ae98e28fc0ce79f754b792d504148c7fcd6
2020-11-18 08:09:15 -08:00
98722ab8a7 There should be a newline between BUILD WITH CUDA and NVTX (#48048)
Summary:
When you do want to insert a `<br />` break tag using Markdown, you end a line with two or more spaces, then type return.

From
https://stackoverflow.com/questions/33191744/how-to-add-new-line-in-markdown-presentation/33191810

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48048

Reviewed By: gchanan

Differential Revision: D25003623

Pulled By: walterddr

fbshipit-source-id: ab5f7267ae936f6f006b4afa43254afa690ef7f4
2020-11-18 08:00:05 -08:00
2ff748a680 Move kthvalue scalar test to separate method for XLA (#48042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48042

Moving the scalar test to a separate method so the XLA team can continue to test the other cases without failing. Requested in https://github.com/pytorch/xla/issues/2620#issuecomment-725696108

Test Plan: Imported from OSS

Reviewed By: zhangguanheng66

Differential Revision: D25055677

Pulled By: heitorschueroff

fbshipit-source-id: 5da66bac78ea197821fee0b9b8a213ff2dc19c67
2020-11-18 07:49:14 -08:00
ca8b9437ab Add type annotations for a few torch.nn.modules (#46013)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46012

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46013

Reviewed By: gchanan

Differential Revision: D25012419

Pulled By: ezyang

fbshipit-source-id: 9fd8ad9fa3122efa294a08010171cb7ddf752778
2020-11-18 07:44:59 -08:00
8c00221fe2 Fix inconsistent environment variable naming for setting NVTOOLEXT_HOME in TorchConfig.cmake (#48012)
Summary:
When building libtorch with CUDA installed in some unconventional location,
the CMake files rely on environment variables to set CMake variables; in
particular, the NVTOOLSEXT_PATH environment variable is used to set
NVTOOLEXT_HOME in cmake/public/cuda.cmake. Later, when consuming such a build
through the generated CMake finder TorchConfig.cmake, a different convention
is used, relying on a completely new environment variable, NVTOOLEXT_HOME.
This feels rather inconsistent, since the former mechanism is still in place:
cmake/public/cuda.cmake is transitively called via Caffe2Config.cmake, which
is in turn called by TorchConfig.cmake.

Fixes https://github.com/pytorch/pytorch/issues/48032

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48012

Reviewed By: gchanan

Differential Revision: D25031260

Pulled By: ezyang

fbshipit-source-id: 0d6ab8ba9f52dd10be418b1a92b0f53c889f3f2d
2020-11-18 07:37:53 -08:00
2832e325dd [TensorPipe] Avoid using deprecated alias for error (#48168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48168

TensorPipe deduplicated a set of errors (which existed under both the ::transport and the ::channel namespaces). The old names were kept as aliases, but we should migrate to the new ones.
ghstack-source-id: 116989010

Test Plan: CI

Reviewed By: beauby

Differential Revision: D25051218

fbshipit-source-id: caef27f1a0ff0e6f0b8b09fa92d6f79641c1e17a
2020-11-18 04:59:08 -08:00
df0ae244a9 [static runtime] Add out_ variant for aten::stack and aten::nan_to_num (#48150)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48150

With D24767322, the remaining ops without an out_ variant are pretty much the sparsenn-specific ops, which are a bit trickier to add.
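
For context, a small eager-mode illustration of the out_ pattern that static runtime relies on (my own sketch, not the static runtime code): the out variant writes into a preallocated buffer instead of allocating a fresh output on every call.

```
import torch

ts = [torch.rand(2, 3) for _ in range(4)]
buf = torch.empty(4, 2, 3)           # preallocated output buffer
torch.stack(ts, out=buf)             # out_ variant: writes into buf, no new allocation
```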

Test Plan:
```
buck run //caffe2/test:static_runtime
buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest
buck run //caffe2/caffe2/fb/predictor:pytorch_predictor_test

buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
--pred_net=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge/precomputation_merge_net.pb \
--c2_inputs=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge/c2_inputs_precomputation_bs1.pb \
--c2_weights=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge/c2_weights_precomputation.pb \
--scripted_model=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge/traced_precomputation_partial_dper_fixes.pt \
--pt_inputs=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge/container_precomputation_bs1.pt \
--iters=1 --warmup_iters=1 --num_threads=1 --pt_enable_static_runtime=true \
--pt_cleanup_activations=true --pt_enable_out_variant=true \
--eps 1e-2
```

Reviewed By: bwasti

Differential Revision: D25016076

fbshipit-source-id: 59a7948d4cca60182b6755217571128c2fc51f4d
2020-11-17 23:06:17 -08:00
6049653c20 [quant][graphmode][fx] Keep linear op unchanged when qconfig is not supported (#48067)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48067

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D25008463

fbshipit-source-id: d0bfc6bf8d544824d0a55cd4bcd1f9301d75c935
2020-11-17 21:59:55 -08:00
a1f494cb8b Fix test_inverse_singular for cublas path; fix cusolver inverse multi-stream issue (#47026)
Summary:
### test_inverse_singular for cublas failure

Related
https://github.com/pytorch/pytorch/pull/46616#issuecomment-718102758
https://app.circleci.com/pipelines/github/pytorch/pytorch/232112/workflows/4131d4ca-cd51-44e3-8e6c-b1c3555c62fa/jobs/8523970/tests

The CUDA 11.1 CI container doesn't have the MAGMA library, so the cuBLAS matrix-inverse path is enabled.
```
Oct 27 23:13:47 -- MAGMA not found. Compiling without MAGMA support
```

test_inverse_singular was introduced in https://github.com/pytorch/pytorch/pull/46625, but I forgot to fix that functionality for the cuBLAS path as well.

### cusolver inverse multi-stream failure

fix https://github.com/pytorch/pytorch/issues/47272

The original CUDA event record/stream-blocking logic was wrong, which could cause NaN in the output tensor.

On my machine, the original code observes NaN in about 50k~500k loops. After this change, no NaN is observed in more than 2.5m loops.

The performance for batch-2 matrix inverse is still the same as in https://github.com/pytorch/pytorch/issues/42403.
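
For reference, a generic sketch of the record/wait pattern this kind of fix restores (my own illustration based on the description above, not the actual kernel code): the consuming stream must wait on an event recorded on the producing stream before reading the result.

```
import torch

producer = torch.cuda.Stream()
consumer = torch.cuda.Stream()
done = torch.cuda.Event()

with torch.cuda.stream(producer):
    x = torch.rand(2, 3, 3, device="cuda")
    inv = torch.inverse(x)
    done.record(producer)            # mark the point where inv is fully written

consumer.wait_event(done)            # consumer will not run ahead of the producer
with torch.cuda.stream(consumer):
    y = inv.sum()                    # safe: sees a fully written inv
```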

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47026

Reviewed By: mruberry

Differential Revision: D24838546

Pulled By: ngimel

fbshipit-source-id: 3b83e4ab8e6b47a8273cba277251765bd6d97911
2020-11-17 21:42:11 -08:00
bc484cfed1 [c10d][jit] initial torchbind bindings for ProcessGroupNCCL (#42944)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42944

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23228682

Pulled By: wanchaol

fbshipit-source-id: 30f4258ec2a90202264745511b897f4e1f5550f7
2020-11-17 21:01:55 -08:00
cc611280d3 Revert D24862372: [PyTorch Mobile] Fix for messenger: avoid error with [-Werror,-Wglobal-constructors]
Test Plan: revert-hammer

Differential Revision:
D24862372 (9392137dbe)

Original commit changeset: d07548645d5a

fbshipit-source-id: 973678b9afe64b68df774c327ba3b62ff252a141
2020-11-17 19:04:57 -08:00
4883d39c6f Avoid direct reference to at::native::tensor from TensorDataContainer (#47567)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47567

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24822517

Pulled By: iseeyuan

fbshipit-source-id: f69bfc029aae5199dbc63193fc7a5e5e6feb5790
2020-11-17 17:32:21 -08:00
c6c6a53ba0 [JIT] Fix function schema subtype checking (#47965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47965

**Summary**
This commit fixes `FunctionSchema::isSubtypeOf` so that the subtyping rule it
implements for `FunctionSchema` instances is contravariant in argument
types and covariant in return type. At present, the rule is covariant in
argument types and contravariant in return type, which is not correct.

A brief but not rigorous explanation follows. Suppose there are two
`FunctionSchema`s, `M = (x: T) -> R` and `N = (x: U) -> S`. For `M <= N`
to be true (i.e. for `M` to be a subtype of `N`), it must be true that
`U <= T` and `R <= S`. This generalizes to functions with multiple
arguments.
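
A minimal Python illustration of the corrected rule (my own sketch, not the JIT's checking code): a subtype may accept wider argument types and return narrower results.

```
class Animal: ...
class Mammal(Animal): ...            # Mammal <= Animal

def n(x: Mammal) -> Animal: ...      # N = (x: Mammal) -> Animal

def m(x: Animal) -> Mammal:          # M = (x: Animal) -> Mammal, a valid subtype of N
    return Mammal()

# Any caller written against N stays correct when handed m: every Mammal it
# passes is accepted, and every Mammal it gets back is an Animal.
result: Animal = m(Mammal())
```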

**Test Plan**
This commit extends `TestModuleInterface.test_module_interface_subtype`
with two new tests cases that test the contravariance of argument types
and covariance of return types in determining whether a `Module`
implements an interface type.

Test Plan: Imported from OSS

Reviewed By: qizzzh

Differential Revision: D24970883

fbshipit-source-id: 2e4bda079c7062806c105ffcc14a28796b063525
2020-11-17 17:19:13 -08:00
94cd048bda Added foreach_frac API (#47384)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47384

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D24737052

Pulled By: izdeby

fbshipit-source-id: 8c94cc42bf22bfbb8f78bfeb2017a5756045763a
2020-11-17 16:56:30 -08:00
134bce7cd0 Adding bunch of unary foreach APIs (#47875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47875

Implementing several unary operators for _foreach_ APIs.
### Planned list of ops
- [x]  abs
- [x]  acos
- [x]  asin
- [x]  atan
- [x]  ceil
- [x]  cos
- [x]  cosh
- [x]  erf
- [x]  erfc
- [x]  exp
- [x]  expm1
- [x]  floor
- [x]  log
- [x]  log10
- [x]  log1p
- [x]  log2
- [ ]  frac
- [x]  neg
- [ ]  reciprocal
- [x]  round
- [ ]  rsqrt
- [ ]  sigmoid
- [x]  sin
- [x]  sinh
- [x]  sqrt
- [x]  tan
- [x]  tanh
- [ ]  trunc
- [x]  lgamma
- [ ]  digamma
- [ ]  erfinv
- [ ]  sign
- [ ]  mvlgamma
- [ ]  clamp
- [ ]  clamp_min
- [ ]  clamp_max

### Perf results
```
----------------- OP:  sin  -----------------
  Median: 998.79 us
  300.84 us

----------------- OP:  abs  -----------------
  Median: 1.19 ms
  294.97 us

----------------- OP:  acos  -----------------
  Median: 982.30 us
  299.40 us

----------------- OP:  asin  -----------------
  Median: 1.16 ms
  298.09 us

----------------- OP:  atan  -----------------
  Median: 986.92 us
  295.64 us

----------------- OP:  ceil  -----------------
  Median: 1.17 ms
  297.25 us

----------------- OP:  cos  -----------------
  Median: 972.72 us
  294.41 us

----------------- OP:  cosh  -----------------
  Median: 1.17 ms
  294.97 us

----------------- OP:  erf  -----------------
  Median: 1.17 ms
  297.02 us

----------------- OP:  erfc  -----------------
  Median: 1.14 ms
  299.23 us

----------------- OP:  exp  -----------------
  Median: 1.15 ms
  298.79 us

----------------- OP:  expm1  -----------------
  Median: 1.17 ms
  291.79 us

----------------- OP:  floor  -----------------
  Median: 1.17 ms
  293.51 us

----------------- OP:  log  -----------------
  Median: 1.13 ms
  318.01 us

----------------- OP:  log10  -----------------
  Median: 987.17 us
  295.57 us

----------------- OP:  log1p  -----------------
  Median: 1.13 ms
  297.15 us

----------------- OP:  log2  -----------------
  Median: 974.21 us
  295.01 us

----------------- OP:  frac  -----------------
  Median: 1.15 ms
  296.01 us

----------------- OP:  neg  -----------------
  Median: 1.13 ms
  294.98 us

----------------- OP:  reciprocal  -----------------
  Median: 1.16 ms
  293.69 us

----------------- OP:  round  -----------------
  Median: 1.12 ms
  297.48 us

----------------- OP:  sigmoid  -----------------
  Median: 1.13 ms
  296.53 us

----------------- OP:  sin  -----------------
  Median: 991.02 us
  295.78 us

----------------- OP:  sinh  -----------------
  Median: 1.15 ms
  295.70 us

----------------- OP:  sqrt  -----------------
  Median: 1.17 ms
  297.75 us

----------------- OP:  tan  -----------------
  Median: 978.20 us
  297.99 us

----------------- OP:  tanh  -----------------
  Median: 967.84 us
  297.29 us

----------------- OP:  trunc  -----------------
  Median: 1.14 ms
  298.72 us

----------------- OP:  lgamma  -----------------
  Median: 1.14 ms
  317.53 us
```

### Script

```

import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

inputs = [torch.rand(3, 200, 200, device="cuda") for _ in range(100)]

def main():
    for op in [
            "sin", "abs", "acos", "asin", "atan", "ceil",
            "cos", "cosh", "erf", "erfc",
            "exp", "expm1", "floor", "log",
            "log10", "log1p", "log2", "frac",
            "neg", "reciprocal", "round",
            "sigmoid", "sin", "sinh", "sqrt",
            "tan", "tanh", "trunc", "lgamma"
        ]:
        print("\n\n----------------- OP: ", op, " -----------------")
        stmt = "[torch.{op}(t) for t in inputs]"
        timer = benchmark_utils.Timer(
            stmt=stmt.format(op = op),
            globals=globals(),
            label="str(optimizer)",
        )
        print(f"autorange:\n{timer.blocked_autorange()}\n\n")

        stmt = "torch._foreach_{op}(inputs)"
        timer_mta = benchmark_utils.Timer(
            stmt=stmt.format(op = op),
            globals=globals(),
            label="str(optimizer_mta)",
        )
        print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()

```

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D24948801

Pulled By: izdeby

fbshipit-source-id: defec3c0394d6816d9a8b05a42a057348f1b4d96
2020-11-17 16:51:54 -08:00
0adace3706 fix calculate_extra_mem_bytes_needed_for (#48102)
Summary:
This PR fixes a bug in `calculate_extra_mem_bytes_needed_for` within `get_device_to_partitions_mapping`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48102

Reviewed By: gcatron

Differential Revision: D25029059

Pulled By: scottxu0730

fbshipit-source-id: 7447b70e8da96b3dc2c5922cf9b62eb306877317
2020-11-17 16:46:11 -08:00
9392137dbe [PyTorch Mobile] Fix for messenger: avoid error with [-Werror,-Wglobal-constructors]
Summary:
The Messenger build may set `[-Werror,-Wglobal-constructors]`, which triggers the compilation error `declaration requires a global destructor`. See https://fb.workplace.com/groups/2148543255442743/permalink/2531994563764275/ for details.

Solution: https://stackoverflow.com/questions/15708411/how-to-deal-with-global-constructor-warning-in-clang

Test Plan:
CI
based on D24842445, `buck test //xplat/messenger/ml/ranking_service:MessengerRankingServiceApple`

Reviewed By: abiczo

Differential Revision: D24862372

fbshipit-source-id: d07548645d5af480c4e53e167b30b7cd7398ccb2
2020-11-17 16:15:44 -08:00
194ea076b2 Update VMA. (#47727)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47727

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D25032739

Pulled By: AshkanAliabadi

fbshipit-source-id: 223df5f18dbfee02ed41eb5e116cc15437e28e8e
2020-11-17 15:40:18 -08:00
568a72bacc Fix Vulkan empty (and family) breakage as a result of API update. (#47937)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47937

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D25032738

Pulled By: AshkanAliabadi

fbshipit-source-id: 8ee573033f7c9c7abcb9c08e4c80ca91da9f422f
2020-11-17 15:35:45 -08:00
cdc2d2843b Structured kernel definitions (#45277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45277

Implements structured kernels as per https://github.com/pytorch/rfcs/pull/9 and ports upsample_nearest1d to use the framework.

The general structure of this diff:

- Define a new syntax for specifying structured kernels in `native_functions.yaml`. You put `structured: True` on the `out` function (that's what you implement) and `structured_delegate: foo.out` on the functional/inplace variants to define them in terms of the `out` function. There's a bunch of new consistency checking to see if you've done this right, though the error messages are of varying quality. This is most of what's going on in tools.codegen.model.
- NativeFunctionGroup turns into StructuredNativeFunctions. Previously I thought that maybe we would use this grouping mechanism for both structured and unstructured kernels, but it turned out that Jiakai needed to make his own grouping structure. So now I've specialized it for structured kernels, which also means I get to add a bunch of invariants, like requiring structured kernels to have both a functional and an out variant. This is the lower bundle of changes in tools.codegen.model.
- When you make an out kernel structured, this induces us to generate a new meta function signature for you to write shape checking and output allocation code. The signatures of these are defined by `tools.codegen.api.meta` and generated into `MetaFunctions.h`. Coverage here is very bare bones and will be driven by the actual operators we port as we go.
- The meaty part of code generation is what we do when we have some grouped StructuredNativeFunctions. We continue to generate a wrapper per function type, but they are a bit different, as they call your meta functions and reference the actual implementations in `out`.
- Then there's a port of `upsample_nearest1d`; easiest to review by just looking at what the final code looks like.

Missing pieces:

- Stride calculation in TensorMeta
- Sufficient sanity checking for inplace/out variants
- Enough rope to make TensorIterator work

This PR improves instruction counts on `upsample_nearest1d` because it eliminates an extra redispatch. Testing `at::upsample_nearest1d(x, {10});`

* Functional: before 1314105, after 1150705
* Out: before 915705, after 838405

These numbers may be jittered by up to +-16400 (which is the difference when I tested against an unaffected operator, `at::upsample_linear1d`), though that may also be because unrelated changes affected all operators globally.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D24253555

Test Plan: Imported from OSS

Reviewed By: smessmer

Pulled By: ezyang

fbshipit-source-id: 4ef58dd911991060f13576864c8171f9cc614456
2020-11-17 15:24:43 -08:00
d7e838467a [quant][graphmode][fx] Embedding/EmbeddingBag works in static quant qconfig (#48062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48062

When Embedding/EmbeddingBag are configured with static quant, we'll skip inserting observers for
them in the graph, keep the op unchanged, and print a warning.
This also aligns with eager mode behavior.

We'll enforce this behavior for other ops that only support dynamic/weight_only quant but not static quant as well.

We used a global variable `DEFAULT_NOT_OBSERVED_QUANTIZE_HANDLER`; this is not exposed to users right now,
but we can add that later if needed.

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D25007537

fbshipit-source-id: 6ab9e025269b44bbfd0d6dd5bb9f95fe3ca9dead
2020-11-17 15:02:04 -08:00
3846e35a55 [GPU] Enable Metal on macosx (#47635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47635

Add macOS support for Metal. The supported OS version is 10.13 and above.
ghstack-source-id: 116845318

Test Plan:
1. Sandcastle Tests
2. CircleCI Jobs
3. In the next diff, we'll run the person segmentation model inside a macos app

Reviewed By: dreiss

Differential Revision: D24825088

fbshipit-source-id: 10d7976c953e765599002dc42d7f8d248d7c9846
2020-11-17 14:44:34 -08:00
05dc9821be .circleci: Add python 3.9 builds for macOS (#47689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47689

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D25029226

Pulled By: seemethere

fbshipit-source-id: 1db2b021d3adf243453f4405219d5ce03d03a9c1
2020-11-17 14:21:50 -08:00
04545f4b46 [quant] out-variant for the reflection pad (#48037)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48037

Test Plan: Imported from OSS

Reviewed By: ayush29feb

Differential Revision: D25000345

Pulled By: z-a-f

fbshipit-source-id: 8404239a70136dd8ba1ede9695af0cf848b933a2
2020-11-17 14:10:49 -08:00
e1a101676b [quant] ReflectionPad2d (#48036)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48036

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25000347

Pulled By: z-a-f

fbshipit-source-id: f42bf3c6f7069385bc62609cf59d24c15734a058
2020-11-17 14:06:37 -08:00
cb046f7bd2 [static runtime] Initial memonger (#47759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47759

Parity reached :)

*/0 -> no memonger
*/1 -> memonger on
We can see that the impact is large when activations don't all fit in cache (6x speed up on this micro bench)
```
BM_long_static_memory_optimization/2/0         8563 ns       8559 ns      86370
BM_long_static_memory_optimization/8/0         8326 ns       8322 ns      84099
BM_long_static_memory_optimization/32/0       11446 ns      11440 ns      56107
BM_long_static_memory_optimization/512/0    6116629 ns    6113108 ns        128
BM_long_static_memory_optimization/2/1         8151 ns       8149 ns      87000
BM_long_static_memory_optimization/8/1         7905 ns       7902 ns      85124
BM_long_static_memory_optimization/32/1       10652 ns      10639 ns      66055
BM_long_static_memory_optimization/512/1    1101415 ns    1100673 ns        641
```

TODO:
[x] implementation
[x] enable/disable flag
[x] statistics about memory saved
[x] additional models

Test Plan:
```
buck test //caffe2/test:static_runtime
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
buck test //caffe2/caffe2/fb/predictor:pytorch_predictor_test
```

Reviewed By: yinghai

Differential Revision: D24824445

fbshipit-source-id: db1f5239f72cbd1a9444017e20d5a107c3b3f043
2020-11-17 13:55:49 -08:00
06707a7ef8 Fix flake8 failure (#48124)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48124

Reviewed By: walterddr

Differential Revision: D25032696

Pulled By: malfet

fbshipit-source-id: 2519d18de7417721d53f6404dc291fd8f7cc94fe
2020-11-17 13:48:08 -08:00
b1c5f06f9e Revert D24925955: Fix "pointless comparison" warning
Test Plan: revert-hammer

Differential Revision:
D24925955 (a03f05f2a2)

Original commit changeset: 56bcf32aeb16

fbshipit-source-id: f7bea36e5b23f254381a3cc655cb199a106cc62c
2020-11-17 13:35:37 -08:00
d522cd15a3 fix BC test, after removing __caffe2 ops (#48099)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48099

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25023321

Pulled By: bdhirsh

fbshipit-source-id: c9567b3dcfc2bea3587a17e4b6400fc490349365
2020-11-17 12:51:02 -08:00
b10d6c6089 [caffe2] cache NextName indexes for faster name generation (#47768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47768

This caches the next ID for a given NextName(prefix, output_id) so that repeated calls to NextName are significantly faster; name generation accounts for ~65% of the time spent for large models.
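
A generic sketch of the caching idea (my own Python illustration, not the C++ implementation): remember the next free index per (prefix, output_id) so each name is generated in O(1) instead of rescanning existing names.

```
from collections import defaultdict

class NameGenerator:
    def __init__(self):
        # (prefix, output_id) -> next free index
        self._next = defaultdict(int)

    def next_name(self, prefix, output_id):
        idx = self._next[(prefix, output_id)]
        self._next[(prefix, output_id)] = idx + 1
        return "%s_%d_%d" % (prefix, output_id, idx)

gen = NameGenerator()
print(gen.next_name("fc", 0))   # fc_0_0
print(gen.next_name("fc", 0))   # fc_0_1, no rescan needed
```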

Test Plan:
buck test //caffe2/caffe2/python/...

will launch canary job before landing to ensure no regressions + confirm speedup

Reviewed By: dzhulgakov

Differential Revision: D24876961

fbshipit-source-id: 668d73060d800513bc72d7cd405a47d15c4acc34
2020-11-17 12:24:00 -08:00
736deefc1f [torch][te] aten::type_as is unary, not binary (#48085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48085

We were treating it as a binary operator, which implies shape
broadcasting, even though the second arg is thrown away aside from the type.
Treating it as a unary is the proper approach.
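
A quick eager-mode illustration of the point (my own example): `type_as` takes only the dtype from its second argument, so the result keeps the first argument's shape and no broadcasting is involved.

```
import torch

a = torch.rand(3, 1)
b = torch.zeros(4, dtype=torch.float64)
print(a.type_as(b).shape)   # torch.Size([3, 1]): a's shape, b's dtype
print(a.type_as(b).dtype)   # torch.float64
```
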
ghstack-source-id: 116873680

Test Plan: new unit test

Reviewed By: ZolotukhinM

Differential Revision: D25017585

fbshipit-source-id: 0cfa89683c9bfd4fbb132617c74b47b268d7f368
2020-11-17 12:17:19 -08:00
bbee0ecbd1 [pytorch][te] Handle negative axis in chunk (#48084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48084

as title
ghstack-source-id: 116870328

Test Plan: new unit test

Reviewed By: Krovatkin

Differential Revision: D25017489

fbshipit-source-id: 0d1998fccad6f509db04b6c67a4e4e4093d96751
2020-11-17 12:12:49 -08:00
aabc87cd04 [NNC] Fix HalfChecker when half present but unused (#48068)
Summary:
Fixes an internally reported issue in the tensorexpr fuser when using FP16 on CUDA. The HalfChecker analysis, which determines whether we need to define the Half type, searches the IR for expressions that use Half. If one of the parameters is of type Half but it (or any other Half expr) is not used in the IR, we'd return a false negative. Fix this by adding the parameter list to the HalfChecker.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48068

Reviewed By: ZolotukhinM

Differential Revision: D25009680

Pulled By: nickgg

fbshipit-source-id: 24fddef06821f130db3d3f45d6d041c7f34a6ab0
2020-11-17 12:07:57 -08:00
0d6c900bdb docker: Fix PYTHON_VERSION not propagating (#47877)
Summary:
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47877

Reviewed By: samestep

Differential Revision: D24929116

Pulled By: seemethere

fbshipit-source-id: 442f8eb13318c44735200dfbb2f88e4ca1d9a127
2020-11-17 11:49:30 -08:00
315122ce15 Bump up the CUDA OOM test memory size (#48029)
Summary:
80GB is no longer unusually large: https://nvidianews.nvidia.com/news/nvidia-doubles-down-announces-a100-80gb-gpu-supercharging-worlds-most-powerful-gpu-for-ai-supercomputing

Hopefully, the new size could be OK until the end of Moore's Law :)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48029

Reviewed By: linbinyu

Differential Revision: D25003603

Pulled By: zou3519

fbshipit-source-id: 626b9c031daee950df8453be4d7643dd67647213
2020-11-17 11:16:31 -08:00
9443150549 Update Graph docstring to match __init__.py (#48100)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48100

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D25023407

Pulled By: ansley

fbshipit-source-id: e00706059b4c684451d2e1e48ca634b42693c1e1
2020-11-17 10:52:28 -08:00
8aaca4b46a [reland][quant] Remove nn.quantized.ReLU module and nn.quantized.functional.relu (#47415) (#48038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48038

nn.ReLU works for both float and quantized input, so we don't want to define an nn.quantized.ReLU
that does the same thing as nn.ReLU; similarly for nn.quantized.functional.relu.

This also removes the numerical inconsistency for models that quantize nn.ReLU independently in QAT mode. A small example of the premise follows.
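
A small eager-mode illustration (my own example, not from the PR):

```
import torch

relu = torch.nn.ReLU()
fx = torch.rand(3)
qx = torch.quantize_per_tensor(fx, scale=0.1, zero_point=0, dtype=torch.quint8)
relu(fx)   # float path
relu(qx)   # quantized path: the same module handles both
```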

Test Plan:
Imported from OSS

Reviewed By: vkuzo

Differential Revision: D25000462

fbshipit-source-id: e3609a3ae4a3476a42f61276619033054194a0d2
2020-11-17 09:52:21 -08:00
a03f05f2a2 Fix "pointless comparison" warning (#47876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47876

Fixes a "pointless comparison against zero" warning that arises for some scalar types.

Test Plan:
Arises with
```
buck test mode/dev-nosan //caffe2/torch/fb/sparsenn:gpu_test -- test_prior_correction_calibration_prediction_binary
```

Reviewed By: ngimel

Differential Revision: D24925955

fbshipit-source-id: 56bcf32aeb164b078d537dd5d7c28a52bd7b66de
2020-11-17 09:05:04 -08:00
49f0e5dfeb Fix typing errors in torch.distributed.*, close issue #42967. (#47534)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47534

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D24952497

Pulled By: xuzhao9

fbshipit-source-id: 063bfd0707198436fcfd9431f72f9a392bc0017e
2020-11-16 23:27:59 -08:00
7f66fa62ca Fix typing errors in torch.distributed.nn.* directory. (#47533)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47533

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D24952500

Pulled By: xuzhao9

fbshipit-source-id: 8e66784fd8f9f111b6329e0bb48d6cd61c690a4a
2020-11-16 23:27:55 -08:00
915050ed66 Fix typing errors in torch.distributed.distributed_c10d.* (#47532)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47532

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D24952501

Pulled By: xuzhao9

fbshipit-source-id: 9b2dd1069eb1729c24be00f46da60d6a0439a8da
2020-11-16 23:27:51 -08:00
49eb82a7b2 Fix type annotation errors in torch.distributed.* directory (#47531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47531

This is part of a stack of PRs that fixes mypy typing errors in the torch.distributed.* directory.

Test Plan:
python test_type_hints.py -v TestTypeHints.test_run_mypy

Imported from OSS

Reviewed By: walterddr

Differential Revision: D24952499

fbshipit-source-id: b193171e28c2211a71d28a544fa44770bf938a1e
2020-11-16 23:23:13 -08:00
af37f8f810 [pytorch][te] Do not merge Tensor[] variant of aten::where into fusion group (#48063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48063

The TE fuser does not know how to construct a list of Tensors.

Test Plan: new unit test

Reviewed By: eellison

Differential Revision: D25007234

fbshipit-source-id: 1a8ffdf5ffecb39a727357799ed32df8f53150d6
2020-11-16 22:41:10 -08:00
43a9d6fb6e [TorchScript] Support user defined classes as constants (#5062)
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/5062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45556

User-defined classes can be used as constants. This is useful when freezing and removing the module from the graph.

Test Plan: waitforsadcastle

Reviewed By: eellison

Differential Revision: D23994974

fbshipit-source-id: 5b4a5c91158aa7f22df39d71f2658afce1d29317
2020-11-16 20:52:02 -08:00
3611d26a25 [JIT] Optimize FunctionSchema::checkArg for the Tensor case. (#48034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48034

The Tensor case is one of the most common, and the existing check can be
made faster. This results in a ~21% improvement on the DeepAndWide model and
would improve other models as well.
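
A generic sketch of the fast-path idea (my own Python illustration, not the C++ implementation): handle the overwhelmingly common Tensor case with a cheap direct test before falling back to the general subtype machinery.

```
import torch

def slow_subtype_check(value, expected):
    # stand-in for the full (expensive) schema type check
    return type(value).__name__ == expected

def check_arg(value, expected):
    if expected == "Tensor":                     # fast path: most arguments
        return isinstance(value, torch.Tensor)
    return slow_subtype_check(value, expected)   # generic machinery

print(check_arg(torch.rand(2), "Tensor"))        # True, via the fast path
```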

Before the change:
```
505[ms]
491[ms]
514[ms]
538[ms]
514[ms]
554[ms]
556[ms]
512[ms]
516[ms]
527[ms]
```

After the change:
```
406[ms]
394[ms]
414[ms]
423[ms]
449[ms]
397[ms]
410[ms]
389[ms]
395[ms]
414[ms]
```

Differential Revision: D24999486

Test Plan: Imported from OSS

Reviewed By: zdevito

Pulled By: ZolotukhinM

fbshipit-source-id: 7139a3a38f9c44e8ea793afe2fc662ff51cc0460
2020-11-16 20:50:24 -08:00
7b2c78f120 Revert D24714803: make duplicate def() calls an error in the dispatcher. Updating all fb operators to use the new dispatcher registration API
Test Plan: revert-hammer

Differential Revision:
D24714803 (824f710694)

Original commit changeset: c809aad8a698

fbshipit-source-id: fb2ada65f9fc00d965708d202bd9d050f13ef467
2020-11-16 20:14:26 -08:00
549ef1d668 [caffe][memonger] Extend operator schema check to dag memonger (#48021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48021

Extending the operator schema check for the simple memonger to the DAG memonger as well. As part of this, a fix is made to handle in-place ops (ops with at least one output name that matches an input blob). Earlier, all output blobs from ops were treated as shareable, but this failed the assertion that external input blobs with the same name are not allowed to be shared.
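
A tiny sketch of the in-place test implied above (my own illustration, not the memonger code): an op is in-place when at least one output blob name equals an input blob name.

```
def is_inplace(op_inputs, op_outputs):
    return bool(set(op_inputs) & set(op_outputs))

print(is_inplace(["x", "w"], ["x"]))  # True: output "x" aliases input "x"
print(is_inplace(["x", "w"], ["y"]))  # False
```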

Test Plan: Added corresponding unit tests

Reviewed By: hlu1

Differential Revision: D24968862

fbshipit-source-id: b6679a388a82b0d68f65ade64b85560354aaa3ef
2020-11-16 19:17:55 -08:00
fa0acb73bd fix node manipulation in partition class (#48016)
Summary:
This PR fixes add_node and remove_node in the Partition class and also adds a unit test for node manipulation in a partition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48016

Reviewed By: gcatron

Differential Revision: D24996368

Pulled By: scottxu0730

fbshipit-source-id: 0ddffd5ed3f95e5285fffcaee8c4b671929b4df3
2020-11-16 15:33:11 -08:00
824f710694 make duplicate def() calls an error in the dispatcher. Updating all fb operators to use the new dispatcher registration API (#47322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47322

Updating all call-sites of the legacy dispatcher registration API in fbcode to the new API.

I migrated all call sites that used the legacy dispatcher registration API (RegisterOperators()) to use the new API (TORCH_LIBRARY...). I found all call sites by running `fbgs RegisterOperators()`. This touched several places, including other OSS code (nestedtensor, torchtext, torchvision). A few things to call out:

For simple ops that only had one registered kernel without a dispatch key, I replaced them with:
```
TORCH_LIBRARY_FRAGMENT(ns, m) {
   m.def("opName", fn_name);
}
```

For ops that registered to a specific dispatch key / had multiple kernels registered, I registered the common kernel (math/cpu) directly inside a `TORCH_LIBRARY_FRAGMENT` block, and registered any additional kernels from other files (e.g. cuda) in a separate `TORCH_LIBRARY_IMPL` block.

```
// cpu file
TORCH_LIBRARY_FRAGMENT(ns, m) {
  m.def("opName(schema_inputs) -> schema_outputs");
  m.impl("opName", torch::dispatch(c10::DispatchKey::CPU, TORCH_FN(cpu_kernel)));
}

// cuda file
TORCH_LIBRARY_IMPL(ns, CUDA, m) {
  m.impl("opName", torch::dispatch(c10::DispatchKey::CUDA, TORCH_FN(cuda_kernel)));
}
```
Special cases:

I found a few ops that used a (legacy) `CPUTensorId`/`CUDATensorId` dispatch key. Updated those to use CPU/CUDA; this seems safe because the keys are aliased to one another in `DispatchKey.h`.

There were a handful of ops that registered a functor (function class) to the legacy API. As far as I could tell we don't allow this case in the new API, mainly because you can accomplish the same thing more cleanly with lambdas. Rather than delete the class I wrote a wrapper function on top of the class, which I passed to the new API.

There were a handful of ops that were registered only to a CUDA dispatch key. I put them inside a TORCH_LIBRARY_FRAGMENT block and used `def()` and `impl()` calls as in case two above.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24714803

Pulled By: bdhirsh

fbshipit-source-id: c809aad8a698db3fd0d832f117f833e997b159e1
2020-11-16 15:33:08 -08:00
cba26e40cf migrate export_caffe2_op_to_c10.h macros to the new dispatcher registration API (#47321)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47321

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24714805

Pulled By: bdhirsh

fbshipit-source-id: cd695c9c203a7fa4d5217c2466d7f274ce2cd096
2020-11-16 15:33:05 -08:00
93d9837375 rename macro. TORCH_LIBRARY_FRAGMENT_THIS_API_IS_FOR_PER_OP_REGISTRATION_ONLY to TORCH_LIBRARY_FRAGMENT (#47320)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47320

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24714806

Pulled By: bdhirsh

fbshipit-source-id: 7007c9c54b785015577ebafd8e591aa534fe0640
2020-11-16 15:33:02 -08:00
95b9c2061b update legacy dispatcher registration API tests to avoid duplicate def() calls (#47319)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47319

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24714804

Pulled By: bdhirsh

fbshipit-source-id: 4827fbb9a568a44599bb84e45cbe63b02181f21e
2020-11-16 15:32:59 -08:00
6ec2a89e01 remove ops in the __caffe2 namespace (#47318)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47318

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24714807

Pulled By: bdhirsh

fbshipit-source-id: 7f040c12c0b3a0f322498386f849f693a64d1dcf
2020-11-16 15:30:16 -08:00
233192be73 Make sure valid ParameterList/Dict don't warn on creation (#47772)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46983

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47772

Reviewed By: zou3519

Differential Revision: D24991341

Pulled By: albanD

fbshipit-source-id: 0fa21192f529a016048e3eef88c5a8f3cbb3c235
2020-11-16 13:16:59 -08:00
b12d645c2f Test TORCH_LIBRARY in CUDA extension (#47524)
Summary:
In the [official documentation](https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html), it is recommended to use `TORCH_LIBRARY` to register ops for TorchScript. However, that code is never tested with a CUDA extension and is actually broken (https://github.com/pytorch/pytorch/issues/47493). This PR adds a test for it. It will not pass CI now, but it will pass when issue https://github.com/pytorch/pytorch/issues/47493 is fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47524

Reviewed By: zou3519

Differential Revision: D24991839

Pulled By: ezyang

fbshipit-source-id: 037196621c7ff9a6e7905efc1097ff97906a0b1c
2020-11-16 13:12:22 -08:00
cf92b0f3a0 add type annotations to multiprocessing module (#47756)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47757

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47756

Reviewed By: malfet

Differential Revision: D24970773

Pulled By: ezyang

fbshipit-source-id: b0b9edb9cc1057829c6320e78174c6d5f7a77477
2020-11-16 13:05:49 -08:00
1e0ace7fdc Fix docstring typo (#47545)
Summary:
It's its.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47545

Reviewed By: ezyang

Differential Revision: D24921308

Pulled By: heitorschueroff

fbshipit-source-id: 3bd53b0303afa3b75cce23d0804096f3d7f67c7e
2020-11-16 13:03:36 -08:00
825ee7e7f8 [caffe2] plan_executor_test: add test case for should_stop loops (#47613)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47613

This is to test some more cancellation edge cases that were missing before. It passes under the current code.

Test Plan: buck test mode/opt caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 10

Reviewed By: dahsh

Differential Revision: D24836956

fbshipit-source-id: 3b00dc081cbf4f26e7756d597099636edb49d256
2020-11-16 12:59:13 -08:00
550f26c6d5 Port math kernel for layer_norm from pytorch/xla. (#47882)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47882

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24958691

Pulled By: ailzhang

fbshipit-source-id: 694e22c20a365730fbacf94efa1bdf7fdd7aec20
2020-11-16 12:49:58 -08:00
95ea778ac6 Set proper output differentiability for unique function (#47930)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47851

Since the definitions of these functions in `native_functions.yaml` have special dispatch, we were already generating the proper `NotImplemented` behavior for these functions, but we were wrongfully setting the gradient of all of the outputs.

Added entries in `derivatives.yaml` to allow us to specify which outputs are differentiable or not.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47930

Reviewed By: smessmer

Differential Revision: D24960667

Pulled By: albanD

fbshipit-source-id: 19e5bb3029cf0d020b31e2fa264b3a03dd86ec10
2020-11-16 12:26:10 -08:00
dea2337825 torch.Assert: make it torch.jit.script'able (#47399) (#47973)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47973

Currently torch.Assert is not scriptable, which makes it not very useful for production code. According to jamesr66a, moving this to C++ op land will help with scriptability. This PR implements the change.

Note: with the current code the Assert is scriptable, but it is a no-op after being scripted. Would love suggestions on how to address that (can be in a future PR).

Test Plan:
```
python test/test_utils.py TestAssert.test_assert_scriptable
python test/test_utils.py TestAssert.test_assert_true
python test/test_fx.py TestFX.test_symbolic_trace_assert
```

Reviewed By: supriyar

Differential Revision: D24974299

Pulled By: vkuzo

fbshipit-source-id: 20d4f4d8ac20d76eee122f2cdcdcdcaf1cda3afe
2020-11-16 11:46:12 -08:00
ee995d33bd rename torch.Assert to torch._assert (#47763) (#47972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47972

Changing the name due to the discussion in
https://github.com/pytorch/pytorch/pull/47399.

Test Plan:
```
python test/test_utils.py TestAssert.test_assert_true
python test/test_fx.py TestFX.test_symbolic_trace_assert
python test/test_fx_experimental.py
```

Reviewed By: supriyar

Differential Revision: D24974298

Pulled By: vkuzo

fbshipit-source-id: 24ded93a7243ec79a0375f4eae8a3db9b787f857
2020-11-16 11:43:27 -08:00
d20483a999 Skip dummy node creation for autograd engine when there is a single input and place on correct queue (#47592)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42890
 - Removes the dummy node
 - Places the graph root on the correct queue based on the input buffer's device, instead of on the CPU queue by default

cpu - no significant change in speed (too noisy to measure), but we see up to a 7% reduction in instruction count for small graphs
cuda - a small reduction in speed (still very noisy) and up to a ~20% reduction in instruction count for small graphs

**CPU**
Code:
```
import torch
from torch.utils.benchmark import Timer

setup="""
a = torch.rand((2, 2), requires_grad=True)
b = torch.rand((2, 2), requires_grad=True)
gradient = torch.ones(2, 2)
"""

stmt="""
torch.autograd.grad(a*b, [a, b], gradient)
"""

timer = Timer(stmt, setup)

print(timer.timeit(10000))
print(timer.collect_callgrind(100))
```

Before (when dummy node is not skipped):
```
torch.autograd.grad(a*b, [a, b], gradient)
setup:
  a = torch.rand((2, 2), requires_grad=True)
  b = torch.rand((2, 2), requires_grad=True)
  gradient = torch.ones(2, 2)

  26.62 us
  1 measurement, 10000 runs , 1 thread
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7efee44ad8e0>
torch.autograd.grad(a*b, [a, b], gradient)
setup:
  a = torch.rand((2, 2), requires_grad=True)
  b = torch.rand((2, 2), requires_grad=True)
  gradient = torch.ones(2, 2)

                           All          Noisy symbols removed
    Instructions:      9755488                    9659378
    Baseline:             4300                       3784
100 runs per measurement, 1 thread
```

After
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7f56961a7730>
torch.autograd.grad(a*b, [a, b], gradient)
setup:
  a = torch.rand((2, 2), requires_grad=True)
  b = torch.rand((2, 2), requires_grad=True)
  gradient = torch.ones(2, 2)

  26.78 us
  1 measurement, 10000 runs , 1 thread
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f56961a78e0>
torch.autograd.grad(a*b, [a, b], gradient)
setup:
  a = torch.rand((2, 2), requires_grad=True)
  b = torch.rand((2, 2), requires_grad=True)
  gradient = torch.ones(2, 2)

                           All          Noisy symbols removed
    Instructions:      9045508                    8939872
    Baseline:             4280                       3784
100 runs per measurement, 1 thread
```
**Cuda**

Before
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7f84cbaa1ee0>
torch.autograd.grad(out, [x, y], gradient)
setup:
  x = torch.rand((2,2), requires_grad=True, device="cuda")
  y = torch.rand((2,2), requires_grad=True, device="cuda")
  out = x + y
  gradient = torch.ones(2, 2).cuda()

  70.49 us
  1 measurement, 10000 runs , 1 thread
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f84cbaa1e50>
torch.autograd.grad(out, [x, y], gradient)
setup:
  x = torch.rand((2,2), requires_grad=True, device="cuda")
  y = torch.rand((2,2), requires_grad=True, device="cuda")
  out = x + y
  gradient = torch.ones(2, 2).cuda()

                           All          Noisy symbols removed
    Instructions:      5054581                    4951911
    Baseline:             4105                       3735
100 runs per measurement, 1 thread
```

Remove dummy node only
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7fbf29c67eb0>
torch.autograd.grad(out, [x, y], gradient)
setup:
  x = torch.rand((2,2), requires_grad=True, device="cuda")
  y = torch.rand((2,2), requires_grad=True, device="cuda")
  out = x + y
  gradient = torch.ones(2, 2).cuda()

  55.65 us
  1 measurement, 10000 runs , 1 thread
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7fbf29c67e20>
torch.autograd.grad(out, [x, y], gradient)
setup:
  x = torch.rand((2,2), requires_grad=True, device="cuda")
  y = torch.rand((2,2), requires_grad=True, device="cuda")
  out = x + y
  gradient = torch.ones(2, 2).cuda()

                           All          Noisy symbols removed
    Instructions:      5002105                    4900841
    Baseline:             4177                       3731
100 runs per measurement, 1 thread
```

Remove dummy node and put in correct queue
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7fb64438ce80>
torch.autograd.grad(out, [x, y], gradient)
setup:
  x = torch.rand((2,2), requires_grad=True, device="cuda")
  y = torch.rand((2,2), requires_grad=True, device="cuda")
  out = x + y
  gradient = torch.ones(2, 2).cuda()

  27.56 us
  1 measurement, 10000 runs , 1 thread
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7fb64438cdf0>
torch.autograd.grad(out, [x, y], gradient)
setup:
  x = torch.rand((2,2), requires_grad=True, device="cuda")
  y = torch.rand((2,2), requires_grad=True, device="cuda")
  out = x + y
  gradient = torch.ones(2, 2).cuda()

                           All          Noisy symbols removed
    Instructions:      4104433                    4007555
    Baseline:             4159                       3735
100 runs per measurement, 1 thread
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47592

Reviewed By: ailzhang

Differential Revision: D24890761

Pulled By: soulitzer

fbshipit-source-id: f457376e4a882f8a59476e8c1e708391b1a031a2
2020-11-16 11:33:35 -08:00
957e45a97c [NNC] Support vectorization of reductions (#47924)
Summary:
Add support for ReduceOp in the Vectorizer, which allows vectorization of reductions. Only non-reduce axes can be vectorized currently; to make vectorizing reduce axes work, we'd need either automatically pulling out the RHS of reductions (better as a separate transform, I think) or special handling of vector reduce in the LLVM codegen (tricky, maybe not useful?).

There was a disabled LLVM test for this case which I reenabled with a bit of massaging, and added a few more.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47924

Reviewed By: bertmaher

Differential Revision: D24963464

Pulled By: nickgg

fbshipit-source-id: 91d91e9e2696555ab5690b154984b1ce48359d51
2020-11-16 10:43:53 -08:00
9aaf7fb398 [CI] Fix additional CI jobs not launched when PR is created from fork repo (#47969)
Summary:
`CIRCLE_PR_NUMBER` is not always set during CI.
This change extracts the PR number from the branch info in order to launch additional CI jobs.

Should allow the `ci/binaries` and `ci/all` tags to work on forked PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47969

Reviewed By: janeyx99

Differential Revision: D24991790

Pulled By: walterddr

fbshipit-source-id: 3ca30752135d54236a9abf0610eb89946852d45a
2020-11-16 08:38:54 -08:00
3a2aad9314 Fix documentation to point to torch.overrides instead of _overrides. (#47842)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47697

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47842

Reviewed By: smessmer

Differential Revision: D24951750

Pulled By: ezyang

fbshipit-source-id: df62ec2e52f1c561c864a50bac4abf4a55e4f8e6
2020-11-16 08:28:53 -08:00
f9552e6da4 update windows build guide (#47840)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47483

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47840

Reviewed By: malfet

Differential Revision: D24951466

Pulled By: walterddr

fbshipit-source-id: 7530ec5a3aff7095978c330d9b78e58b10349373
2020-11-16 08:15:42 -08:00
147a48fb27 [cmake] clean up cmake/Utils.cmake (#47923)
Summary:
Consolidate into cmake/public/utils.cmake

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47923

Reviewed By: samestep

Differential Revision: D24955961

Pulled By: walterddr

fbshipit-source-id: 9d5f6af2b353a8c6f6d521c841fd0989393755cd
2020-11-16 08:12:32 -08:00
cd4aa9c95c Fix inplace check logic to be triggered when written to Tensor does not require gradients (#46296)
Summary:
Fix https://github.com/pytorch/pytorch/issues/46242

This ensures that `check_inplace()` runs the proper checks even if the Tensor being modified in place does not require gradients, since the Tensor written into it might require gradients and thus make the in-place modification actually differentiable (a minimal example follows the list below).
This contains:
- Codegen changes to tell `check_inplace()` if the inplace will be differentiable
- Changes in `handle_view_on_rebase` to work properly even when called for an input that does not require gradients (which was assumed to be true before)
- Corresponding tests (both warnings and the error raise internal assert errors without this fix)
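
A minimal example of the case being made to work (my own sketch based on the description above):

```
import torch

base = torch.zeros(3)                  # does not require grad
w = torch.ones(3, requires_grad=True)
base.add_(w)                           # in-place write of a grad-requiring tensor
print(base.requires_grad)              # True: the in-place op is differentiable
base.sum().backward()
print(w.grad)                          # tensor([1., 1., 1.])
```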

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46296

Reviewed By: ezyang

Differential Revision: D24903770

Pulled By: albanD

fbshipit-source-id: 74e65dad3d2e3b9f762cbb7b39f92f19d9a0b094
2020-11-16 08:06:06 -08:00
d032d22141 Replacing CUDA11.0 config with CUDA11.1 in CI (#47942)
Summary:
Relands https://github.com/pytorch/pytorch/issues/46616

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47942

Reviewed By: walterddr

Differential Revision: D24963006

Pulled By: janeyx99

fbshipit-source-id: 71a61c56dec88a32a1c5d194db5a2730100f60a1
2020-11-16 07:32:35 -08:00
013e6a3d9d Revert D24698027: Fix auto exponent issue for torch.pow
Test Plan: revert-hammer

Differential Revision:
D24698027 (8ef7ccd669)

Original commit changeset: f23fdb65c925

fbshipit-source-id: 9a67a2c6310c9e4fdefbb421a8cd4fa41595bc9a
2020-11-15 03:58:44 -08:00
8ef7ccd669 Fix auto exponent issue for torch.pow (#47024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47024

Fixes https://github.com/pytorch/pytorch/issues/46936

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#47024 Fix auto exponent issue for torch.pow**

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24698027

Pulled By: anjali411

fbshipit-source-id: f23fdb65c925166243593036e08214c4f041a63d
2020-11-14 22:50:12 -08:00
d293413b3e Batched matmul dtypes (#47873)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47873

Reviewed By: navahgar

Differential Revision: D24928256

Pulled By: anjali411

fbshipit-source-id: a26aef7a15a13fc0b5716e905971265d8b1cea61
2020-11-14 22:45:48 -08:00
db1f217d8d Add complex support for torch.addcmul and torch.addcdiv (#46639)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46639

Resolves: https://github.com/pytorch/pytorch/issues/46546#issuecomment-713122245

Test Plan: Imported from OSS

Reviewed By: izdeby, ansley

Differential Revision: D24879099

Pulled By: anjali411

fbshipit-source-id: 76131dc68ac964e67a633f62e07f7c799df4463e
2020-11-14 21:27:34 -08:00
5adf840259 [pytorch][te][easy] Remove KernelScope from fusion pass tests (#47952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47952

We don't actually generate a TE kernel, so there's no need to use the
arena-allocation guard.

Test Plan:
```
buck test //caffe2/test/cpp/tensorexpr -- FuserPass
```

Reviewed By: ZolotukhinM

Differential Revision: D24967107

fbshipit-source-id: 302f65b2fcff704079e8b51b942b7b3baff95585
2020-11-14 20:25:01 -08:00
0e98fdd389 [ATen/CPU] Parallelize HalfToFloat + FloatToHalf operators in PT (#47777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47777

Parallelize FP32 <-> FP16 op.
- Use at::Parallelize in ATen instead of parallelizing inside FBGEMM;
- Provide more flexibility (at::Parallelize can be configured with different parallel backends).
ghstack-source-id: 116499687

Test Plan:
```
OMP_NUM_THREADS=10 buck test //caffe2/test:torch -- .test_half_tensor.
```
https://our.intern.facebook.com/intern/testinfra/testrun/7036874441928985

```
OMP_NUM_THREADS=10 buck run mode/opt -c pytorch.parallel_backend=tbb //caffe2/benchmarks/operator_benchmark/pt:tensor_to_test -- --iterations 1 --omp_num_threads 10 --warmup_iterations 0
```

Benchmark results for 512 x 512 Tensor copy:

- With 1 thread:
```
(base) [jianyuhuang@devbig281.ftw3.facebook.com: ~/fbsource/fbcode/caffe2/caffe2/operators] $ buck run mode/opt -c pytorch.parallel_backend=tbb //caffe2/benchmarks/operator_benchmark/pt:tensor_to_test -- --iterations 1 --omp_num_threads 1 --warmup_iterations 10
Parsing buck files: finished in 1.3 sec
Building: finished in 5.7 sec (100%) 6087/6087 jobs, 0 updated
  Total time: 7.0 sec
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: FloatToHalfTensorConversionBenchmark
# Mode: Eager
# Name: FloatToHalfTensorConversionBenchmark_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 99.279

# Benchmarking PyTorch: HalfToFloatTensorConversionBenchmark
# Mode: Eager
# Name: HalfToFloatTensorConversionBenchmark_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 81.707
```

- With 2 threads:
```
(base) [jianyuhuang@devbig281.ftw3.facebook.com: ~/fbsource/fbcode/caffe2/caffe2/operators] $ buck run mode/opt -c pytorch.parallel_backend=tbb //caffe2/benchmarks/operator_benchmark/pt:tensor_to_test -- --iterations 1 --omp_num_threads 2 --warmup_iterations 10
Parsing buck files: finished in 1.3 sec
Building: finished in 4.4 sec (100%) 6087/6087 jobs, 0 updated
  Total time: 5.7 sec
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: FloatToHalfTensorConversionBenchmark
# Mode: Eager
# Name: FloatToHalfTensorConversionBenchmark_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 68.162

# Benchmarking PyTorch: HalfToFloatTensorConversionBenchmark
# Mode: Eager
# Name: HalfToFloatTensorConversionBenchmark_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 49.245
```

Reviewed By: ngimel

Differential Revision: D24676355

fbshipit-source-id: 02bfb893a7b5a60f97c0559d8974c53837755ac2
2020-11-14 18:45:23 -08:00
f8248543a1 Pass in smaller timeout into init_process_group for distributed_test (#47896)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47896

Per title
ghstack-source-id: 116710141

Test Plan: CI

Reviewed By: osalpekar

Differential Revision: D24943323

fbshipit-source-id: 7bf33ce3a021b9750b65e0c08f602c465cd81d28
2020-11-14 13:38:20 -08:00
07e98d28cf [pytorch][codegen] migrate gen_variable_factories.py to the new data model (#47818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47818

This is another relatively small codegen.

Ideally we should use CppSignature.decl() to generate the C++ function declaration.
We didn't, because it needs to add 'at::' to the types defined in the ATen namespace.

E.g.:
- standard declaration:
```
Tensor eye(int64_t n, int64_t m, const TensorOptions & options={})
```

- expected:
```
at::Tensor eye(int64_t n, int64_t m, const at::TensorOptions & options = {})
```

Kept the hacky fully_qualified_type() method to keep compatibility with the old codegen.

We could clean up by:
- Using these types in the torch namespace, though this is a user-facing header file and it's not clear whether that would cause problems;
- Updating the cpp.argument_type() method to take an optional namespace argument.

Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24909478

Pulled By: ljk53

fbshipit-source-id: a0ceaa60cc765c526908fee39f151cd7ed5ec923
2020-11-14 13:05:23 -08:00
4779553921 Revert "[quant] Remove nn.quantized.ReLU module and nn.quantized.functional.relu (#47415)" (#47949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47949

This reverts commit 1478e5ec2aa42b2a9742257642c7c1d3203d7309.

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D24966363

Pulled By: vkuzo

fbshipit-source-id: ca1126f699eef84027a15df35962728296c8a790
2020-11-14 08:40:30 -08:00
c936b43f14 [pytorch][codegen] add fully migrated scripts to mypy strict config (#47747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47747

Moved MANUAL_AUTOGRAD etc. to gen_trace_type.py to prevent mypy from
scanning the not-yet-migrated gen_variable_type.py.

Differential Revision: D24885066

Test Plan: Imported from OSS

Reviewed By: ezyang

Pulled By: ljk53

fbshipit-source-id: bf420e21c26f45fe2b94977bc6df840ffd8a3128
2020-11-14 02:28:00 -08:00
4ff8cd8f3a [pytorch][codegen] gen_python_functions.py loading native_functions.yaml / deprecated.yaml directly (#47746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47746

- Removed the integration hack in gen_python_functions.py. It now directly
  loads native_functions.yaml. All dependencies on Declarations.yaml
  have been removed / moved elsewhere.
- Rewrote the deprecated.yaml parsing logic to work with new data model directly.

Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Differential Revision: D24885067

Test Plan: Imported from OSS

Reviewed By: bhosmer

Pulled By: ljk53

fbshipit-source-id: 8e906b7dd36a64395087bd290f6f54596485ceb4
2020-11-14 02:27:57 -08:00
d91cefb0d8 [pytorch][codegen] migrate gen_annotated_fn_args.py to new codegen model (#47745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47745

This is a relatively small codegen. Reintroduced 'simple_type' to preserve
old codegen output.

It depends on some methods defined in gen_python_functions.py - next PR will
clean up the remaining Declarations.yaml methods in gen_python_functions.py.

Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Differential Revision: D24885068

Test Plan: Imported from OSS

Reviewed By: ezyang

Pulled By: ljk53

fbshipit-source-id: c0fbd726bcc450c3c7fe232c23e5b31779d0b65f
2020-11-14 02:24:39 -08:00
0dbff184e9 change file name to snake style (#47914)
Summary:
Change Partitioner.py file name to partitioner.py
Change GraphManipulation.py file name to graph_manipulation.py
Move test_replace_target_nodes_with() to test_fx_experimental.py
Remove the unnecessary argument in size_based_partition() in Partitioner class

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47914

Reviewed By: gcatron

Differential Revision: D24956653

Pulled By: scottxu0730

fbshipit-source-id: 25b65be7dc7d64e90ffdc59cf394446fee83c3e6
2020-11-14 01:29:25 -08:00
1606899dbe distributed_test: Map rank to GPU accordingly (#47898)
Summary:
If world_size is less than or equal to the number of GPUs available,
then each rank can be mapped directly to the corresponding GPU.
This fixes the issue referenced in https://github.com/pytorch/pytorch/issues/45435 and https://github.com/pytorch/pytorch/issues/47629

For world_size = 3 and number of GPUs = 8, the rank-to-GPU mapping
will be 0,2,4. This is due to the introduction of barrier
(refer PR https://github.com/pytorch/pytorch/issues/45181):
the tensors in barrier are mapped to cuda0,1,2 while the tensors in the
actual test cases are mapped to cuda0,2,4, resulting in different streams and
leading to timeout. This issue is specific to the default process group.
The issue is not observed in a new process group since the streams are created again
after the initial barrier call.

This patch maps each rank to the corresponding GPU when the world_size is
less than or equal to the number of GPUs, in this case 0,1,2.
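
A minimal sketch of the mapping this patch describes (the function name is hypothetical; the real change lives in the distributed test helpers):
```
def rank_to_gpu(rank: int, world_size: int, num_gpus: int) -> int:
    # When every rank can get its own GPU, map ranks directly
    # (0, 1, 2) so the barrier tensors and the test tensors end up
    # on the same devices and share streams.
    if world_size <= num_gpus:
        return rank
    # Otherwise wrap ranks around the available GPUs.
    return rank % num_gpus
```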

Note: The barrier function in distributed_c10d.py should include a new parameter
to specify the tensor or rank-to-GPU mapping. In that case, this patch will be
redundant but harmless, since the tests can specify the tensors with appropriate
GPU rankings.

Fixes https://github.com/pytorch/pytorch/issues/47629

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47898

Reviewed By: smessmer

Differential Revision: D24956021

Pulled By: rohan-varma

fbshipit-source-id: a88257f22a7991ba36566329766c106d3360bb4e
2020-11-13 23:59:42 -08:00
982ae987d3 Revert D24941350: [pytorch][PR] Reopen PR for 0 dim batch size for AvgPool2d.
Test Plan: revert-hammer

Differential Revision:
D24941350 (ceeab70da1)

Original commit changeset: b7e50346d86e

fbshipit-source-id: 2e42e4418476658dc1afb905184841bf61688cfd
2020-11-13 22:33:37 -08:00
c543b3b582 Fix a downcast (#47919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47919

Suppresses a downcast warning.

Test Plan:
Reproduces with
```
buck test mode/dev-nosan //caffe2/torch/fb/sparsenn:gpu_test
```

Reviewed By: suphoff

Differential Revision: D24866987

fbshipit-source-id: 44f19ab37a7d95abe08f570abfebc702827a2510
2020-11-13 22:26:29 -08:00
fe7d1d7d0e Add LeakyReLU operator to static runtime (#47798)
Summary:
- Add LeakyReLU operator to static runtime
- Add LeakyReLU benchmark
- Add LeakyReLU correctness test case

Static Runtime
```
------------------------------------------------------------------------------
Benchmark                                       Time           CPU Iterations
------------------------------------------------------------------------------
BM_leaky_relu/1                              4092 ns       4092 ns     172331
BM_leaky_relu/8                              4425 ns       4425 ns     158434
BM_leaky_relu/20                             4830 ns       4830 ns     145335
BM_leaky_relu_const/1                        3545 ns       3545 ns     198054
BM_leaky_relu_const/8                        3825 ns       3825 ns     183074
BM_leaky_relu_const/20                       4222 ns       4222 ns     165999
```

Interpreter
```
------------------------------------------------------------------------------
Benchmark                                       Time           CPU Iterations
------------------------------------------------------------------------------
BM_leaky_relu/1                              7183 ns       7182 ns      96377
BM_leaky_relu/8                              7580 ns       7580 ns      91588
BM_leaky_relu/20                             8066 ns       8066 ns      87183
BM_leaky_relu_const/1                        6466 ns       6466 ns     107925
BM_leaky_relu_const/8                        7063 ns       7063 ns      98768
BM_leaky_relu_const/20                       7380 ns       7380 ns      94564
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47798

Reviewed By: ezyang

Differential Revision: D24927043

Pulled By: kavoor

fbshipit-source-id: 69b12cc57f725f1dc8d68635788813710a74dc2b
2020-11-13 22:05:52 -08:00
17a6bc7c1b Cleanup unused code for Python < 3.6 (#47822)
Summary:
I think these can be safely removed since the min version of supported Python is now 3.6

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47822

Reviewed By: smessmer

Differential Revision: D24954936

Pulled By: ezyang

fbshipit-source-id: 5d4b2aeb78fc97d7ee4abaf5fb2aae21bf765e8b
2020-11-13 21:37:01 -08:00
4f9d0757f3 Add type informations to torch.cuda (#47134)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47133

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47134

Reviewed By: smessmer

Differential Revision: D24955031

Pulled By: ezyang

fbshipit-source-id: 87f4623643715baa6ac0627383f009956f80cd46
2020-11-13 21:34:35 -08:00
2eb1e866e8 Update links in DDP note (#47663)
Summary:
Update the links in https://pytorch.org/docs/stable/notes/ddp.html#.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47663

Reviewed By: smessmer

Differential Revision: D24951684

Pulled By: ezyang

fbshipit-source-id: c1c104d76cf0292a7fc75a627bf76bb56fea72d0
2020-11-13 21:26:28 -08:00
550973b675 Missing curly bracket. (#47855)
Summary:
Typo fix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47855

Reviewed By: smessmer

Differential Revision: D24951767

Pulled By: ezyang

fbshipit-source-id: 8884390370d4d71efd6cee10c3e0b8f55d7e5739
2020-11-13 21:17:24 -08:00
1bdd3687b9 Back out "[JIT] Fix function schema subtype checking"
Summary: Original commit changeset: bd07e7b47d2a

Test Plan: T79664004

Reviewed By: qizzzh

Differential Revision: D24969339

fbshipit-source-id: 8ecc4d52b86c5440c673e42b0e2cb78d94937a6f
2020-11-13 20:33:54 -08:00
11710598db Preserve module parameters in freezing (#47094)
Summary:
Added preserveParameters to the freezing API, which allows preserving module
parameters.

Fixes #39613

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47094

Reviewed By: eellison

Differential Revision: D24792867

Pulled By: bzinodev

fbshipit-source-id: f0cd980f5aed617b778afe2f231067c7c30a1527
2020-11-13 20:18:32 -08:00
f8c559db8e [resubmit] Providing more information while crashing process in async error handling (#47246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47246

We crash the process in NCCL Async Error Handling if the collective
has been running for longer than some set timeout. This PR introduces more
information about the rank and the duration for which the collective ran.
ghstack-source-id: 116676182

Test Plan: Run desync tests and flow.

Reviewed By: pritamdamania87

Differential Revision: D24695126

fbshipit-source-id: 61ae46477065a1a451dc46fb29c3ac0073ca531b
2020-11-13 20:11:06 -08:00
a9b6fa9e46 Fix multinomial when input has 0 prob (#47386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47386

Fix multinomial when input has 0 prob
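
A minimal check of the expected behavior after the fix (a hedged sketch, not the exact failing input from the report):
```
import torch

# Categories with zero probability must never be sampled,
# even with replacement.
probs = torch.tensor([0.0, 0.3, 0.7])
samples = torch.multinomial(probs, 1000, replacement=True)
assert (samples != 0).all(), "sampled an index with zero probability"
```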

Test Plan: buck test mode/dev-nosan //caffe2/test:torch -- "multinomial"

Reviewed By: ngimel

Differential Revision: D24699691

fbshipit-source-id: d88bb5be8cfed9da2ce6f6a8abd18e834fbde580
2020-11-13 19:07:49 -08:00
f86ec08160 [pytorch][quantization] adding jit state for QuantizedLeakyReLU (#47660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47660

Currently, `QuantizedLeakyReLU` doesn't have any items in the `state_dict`. However, this operator needs to store the `scale` and `zero_point` in its state dictionary; otherwise, loading the state dict for a quantized model with LeakyReLUs that have non-default quantization params would break.
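
A hedged sketch of the usual shape of such a fix for quantized modules (the class and method body are illustrative, not the exact diff):
```
import torch
import torch.nn.quantized as nnq

class PatchedQuantizedLeakyReLU(nnq.LeakyReLU):
    # Persist scale/zero_point so load_state_dict restores
    # non-default quantization params.
    def _save_to_state_dict(self, destination, prefix, keep_vars):
        super()._save_to_state_dict(destination, prefix, keep_vars)
        destination[prefix + 'scale'] = torch.tensor(self.scale)
        destination[prefix + 'zero_point'] = torch.tensor(self.zero_point)
```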

Test Plan:
Originally the issue was found here: https://www.internalfb.com/intern/anp/view/?id=390362&revision_id=2510709822565735

In the latest version, I fixed this issue: https://www.internalfb.com/intern/anp/view/?id=390362

Reviewed By: jerryzh168

Differential Revision: D24757522

fbshipit-source-id: 57e1dea072b5862e65e228e52a86f2062073aead
2020-11-13 18:59:46 -08:00
4380934b9b [JIT] Dont use specialized tensor type (#46130)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/46122

For `Any`, we infer the type of the ivalue to set the ivalue's type tag. When we saw a Tensor, we would use a specialized Tensor type, so when `Dict[str, Tensor]` was passed in as an `Any` arg it would be inferred as `Dict[str, Float(2, 2, 2, 2)]`, which breaks runtime `isinstance` checking.
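
A hedged repro sketch of the failure mode described in the linked issue:
```
import torch
from typing import Any, Dict

@torch.jit.script
def check(x: Any) -> bool:
    # Before the fix, a Dict[str, Tensor] passed as Any got a
    # specialized tag like Dict[str, Float(2, 2)], so this was False.
    return isinstance(x, Dict[str, torch.Tensor])

print(check({"a": torch.rand(2, 2)}))  # expected: True
```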

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46130

Reviewed By: glaringlee

Differential Revision: D24261447

Pulled By: eellison

fbshipit-source-id: 8a2bb26ce5b6c56c8dcd8db79e420f4b5ed83ed5
2020-11-13 18:34:40 -08:00
5c0dff836a Improve dimensionality mismatch warning (#47874)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47874

Test Plan: N/A

Reviewed By: ngimel

Differential Revision: D24926123

fbshipit-source-id: ace5543ae5122906164e13ae9463fe4dfa74d8d6
2020-11-13 18:26:34 -08:00
ceeab70da1 Reopen PR for 0 dim batch size for AvgPool2d. (#47426)
Summary:
Resubmitting https://github.com/pytorch/pytorch/pull/40694 since it could not be landed for some reason.

CC ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47426

Reviewed By: mruberry

Differential Revision: D24941350

Pulled By: ngimel

fbshipit-source-id: b7e50346d86eb63aaaf4fdd5ee71fafee2d0b476
2020-11-13 17:57:35 -08:00
260daf088d Added linalg.cholesky (#46083)
Summary:
This PR adds a `torch.linalg.cholesky` function that matches `numpy.linalg.cholesky`.
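
A hedged usage sketch of what the new tests exercise (building the Hermitian positive-definite input inline rather than via the helper):
```
import torch

a = torch.randn(3, 3, dtype=torch.complex128)
a = a @ a.conj().transpose(-2, -1) + torch.eye(3)  # Hermitian PD
l = torch.linalg.cholesky(a)
print(torch.allclose(l @ l.conj().transpose(-2, -1), a))  # True
```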

Fixed `lda` argument to `lapackCholesky` calls.
Added `random_hermitian_pd_matrix` helper function for tests.

Ref https://github.com/pytorch/pytorch/issues/42666.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46083

Reviewed By: ailzhang

Differential Revision: D24861752

Pulled By: mruberry

fbshipit-source-id: 214dbceb4e8a2c589df209493efd843962d25593
2020-11-13 16:50:40 -08:00
e8fecd5caf Add constructor for ArgumentDef (#47492)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47493

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47492

Reviewed By: bdhirsh

Differential Revision: D24791564

Pulled By: dzhulgakov

fbshipit-source-id: 43e4bbda754c61f40855675c1d5d0ddc9f351ebe
2020-11-13 16:39:45 -08:00
0685773d8d Automated submodule update: FBGEMM (#47929)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 9b0131179f

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47929

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: smessmer

Differential Revision: D24957361

fbshipit-source-id: 72fe80a784f10ddca52ee99fcf67cf6448a93012
2020-11-13 16:06:49 -08:00
0125e14c9a [OpBench] change relu entry point after D24747035
Summary: D24747035 (1478e5ec2a) removes the entry point of `nnq.functional.relu`. Adjust op benchmark to `torch.nn.ReLU` accordingly.

Test Plan: buck run caffe2/benchmarks/operator_benchmark/pt:qactivation_test -- --use_jit  --iterations 1 --warmup_iterations 1

Reviewed By: mingzhe09088

Differential Revision: D24961625

fbshipit-source-id: 5ed0ec7fa6d8cfefc8e7fc8324cf9a2a3e59de90
2020-11-13 15:38:27 -08:00
6e42b77be1 Add '--allow-run-as-root' to mpiexec to allow running distributed test inside a container (#43794)
Summary:
Inside a container, the user is often root. We should allow this use case so that people can easily run `run_test.py` inside a container.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43794

Reviewed By: ezyang

Differential Revision: D24904469

Pulled By: malfet

fbshipit-source-id: f96cb9dda3e7bd18b29801cde4c5b0616c750016
2020-11-13 15:31:06 -08:00
7b8bd91632 fp16 -> fp32 EmbeddingBag moved into CPU impl (#47076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47076

Pull Request resolved: https://github.com/pytorch/glow/pull/5038

Eliminate double casting in glow when submitting fp16 per-sample weights

Test Plan:
buck test glow/glow/torch_glow/tests:embedding_bag_test

Due to dependency conflicts between glow and caffe2, the test has been reverted from this diff, and landed separately

Reviewed By: allwu

Differential Revision: D24421367

fbshipit-source-id: eb3615144a2cad3d593543428dfdec165ad301df
2020-11-13 15:17:04 -08:00
6a4d55f23c [ONNX] Enable onnx shape inference in export by default (#46629)
Summary:
* Enable ONNX shape inference by default.
* ONNX could potentially set the inferred shape in the output instead of value_infos; check both to be sure.
* Small fix in symbol_map to avoid overlooking dup symbols.
* Fix scalar_type_analysis to be consistent with PyTorch scalar type promotion logic.
* Correctly handle None dim_param from ONNX inferred shape.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46629

Reviewed By: ailzhang

Differential Revision: D24900171

Pulled By: bzinodev

fbshipit-source-id: 83d37fb9daf83a2c5969d8383e4c8aac986c35fb
2020-11-13 15:09:46 -08:00
c0aa863c56 [quant][graphmode][fx][refactor] insert_quantize_node (#47880)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47880

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24928797

fbshipit-source-id: 9a8b359cabfb800da86da114bf26bb5bd99d3fff
2020-11-13 14:50:42 -08:00
5d51b63984 Use Blocking Wait if both Blocking Wait and Async Error Handling Are Set (#47926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47926

Given that we're soon enabling async error handling in PET, we should make the behavior explicit when users have set NCCL_BLOCKING_WAIT in their own code while also using PET. This PR essentially gives blocking wait precedence (for now). This way the blast radius of the PET change is smaller, while we continue working with blocking wait users and discussing whether moving to async error handling may be a good fit.
ghstack-source-id: 116553583

Test Plan: Simple FBL run/CI

Reviewed By: jiayisuse

Differential Revision: D24928149

fbshipit-source-id: d42c038ad44607feb3d46dd65925237c564ff7a3
2020-11-13 14:43:00 -08:00
f743b5639a [caffe2][memonger] Add support for distributed inference predict nets in DAG memonger (#47718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47718

Distributed Inference splits a predict net into multiple parts, part0 being the main part which contains ops to make remote calls to other parts. part0 predict net may contain AsyncIf ops to optimize rpc call usage. AsyncIf ops have internal nets which may refer to memongered blobs. This change handles AsyncIf ops to update internal nets to refer to memongered blobs.

As part of this change, I am also updating the DAG memonger traversal to always start from root ops, i.e. ops with 0 in-degree. The earlier logic would start traversing ops based on input head blobs, and if one of the head inputs is used in a non-root op which gets visited before its parent, the traversal will throw an assertion error here: https://fburl.com/diffusion/ob110s9z . For almost all the distributed inference part0 nets, it was throwing this assertion error.

Test Plan: Added corresponding tests in memonger_test.py .  Could not find unit tests in c++ version of memonger.

Reviewed By: hlu1

Differential Revision: D24872010

fbshipit-source-id: 1dc99b2fb52b2bc692fa4fc0aff6b7e4c5e4f5b0
2020-11-13 14:12:07 -08:00
a3e08e5344 Support ReduceSum in c2_pt_converter (#47889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47889

Adds support for converting the [caffe2 ReduceSum](https://caffe2.ai/docs/operators-catalogue#reducesum) operator to torch.
ghstack-source-id: 116580127

Test Plan:
buck test //caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test : [results](https://our.intern.facebook.com/intern/testinfra/testrun/6755399466095119)

    ✓ ListingSuccess: caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test - main (60.273)
    ✓ Pass: caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test - test_sub_op (caffe2.torch.fb.model_transform.c2_convert.c2_pt_converter_test.C2PTConverterTest) (101.119)
    ✓ Pass: caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test - test_layer_norm_conversion (caffe2.torch.fb.model_transform.c2_convert.c2_pt_converter_test.C2PTConverterTest) (101.404)
    ✓ Pass: caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test - test_local_model_conversion (caffe2.torch.fb.model_transform.c2_convert.c2_pt_converter_test.C2PTConverterTest) (101.966)
    ✓ Pass: caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test - test_reduce_sum (caffe2.torch.fb.model_transform.c2_convert.c2_pt_converter_test.C2PTConverterTest) (114.896)

Reviewed By: bugra

Differential Revision: D24925318

fbshipit-source-id: 3f3b791eff1b03e8f5adee744560fe8bc811c659
2020-11-13 12:02:58 -08:00
eccbd4df1c Remove fbcode/caffe2/mode (#46454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46454

We stopped syncing this folder to fbcode, and it has not been used. AIBench will use the ones in xplat.

Test Plan: zbgs fbcode/caffe2/mode/ find nothing

Reviewed By: xta0

Differential Revision: D24356743

fbshipit-source-id: 7e70a2181a49b8ff3f87e5be3b8c808135f4c527
2020-11-13 11:54:47 -08:00
03d1978a1a [JIT] Resolve string literal type annotations using Resolver::resolveType (#47731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47731

**Summary**
This commit modifies `ScriptTypeParser::parseTypeFromExpr` so that
string literal type annotations are resolved using
`Resolver::resolveType`. At present, they are parsed in
`parseBaseTypeName`, which inadvertently allows any key from
`string_to_type_lut` to be used as a string literal type annotation.

**Test Plan**
Existing unit tests (most notably
`TestClassType.test_self_referential_method` which tests the main
feature, self-referential class type annotations, that make use of
string literal type annotations).

**Fixes**
This commit fixes #47570.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D24934717

Pulled By: SplitInfinity

fbshipit-source-id: b915b2c08272566b63b3cf5ff4a07ad43bdc381a
2020-11-13 11:46:08 -08:00
1915ae9510 [quant][graphmode][fx][refactor] is_output_quantized (#47879)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47879

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24928796

fbshipit-source-id: 55c49243b6a0b4811953cf72af57e5f56be8c419
2020-11-13 11:15:55 -08:00
6b8d20c023 [pytorch][te] Don't start TE fusion groups with an unknown-typed result (#47884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47884

We need to know output types of everything in a fusion group to ensure
that we generate correctly-typed tensors.  We were incorrectly starting a
fusion group with an unknown-typed output.

Test Plan:
New unit tests:
```
buck test //caffe2/test:jit //caffe2/test/cpp/tensorexpr:tensorexpr
```

Reviewed By: eellison

Differential Revision: D24932786

fbshipit-source-id: 83978a951f32c1207bbc3555a7d3bd94fe4e70fb
2020-11-13 10:52:53 -08:00
d54497fca7 Try again to give hash in doc push scripts (#47922)
Summary:
This is a second attempt at 8304c25c67, since the first attempt did not work as shown by b05f3571fe and c59015f21d. This time the idea is to directly embed the commit hash itself into the generated command that is fed to `docker exec`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47922

Reviewed By: zou3519

Differential Revision: D24953734

Pulled By: samestep

fbshipit-source-id: 35b14d1266ef039e8c1bdf3648275af812a2e57b
2020-11-13 10:17:37 -08:00
f1babb00f0 [caffe2] Fix ListWithEvicted _pprint_impl wrongly printing _evicted_values (#47881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47881

ListWithEvicted's _pprint_impl was accidentally printing _items instead of _evicted_values before this change.

Reviewed By: dzhulgakov

Differential Revision: D24928521

fbshipit-source-id: 0d7940719b4a27defbaae3b99af104d7fe7b5144
2020-11-13 09:23:10 -08:00
d4db4718fa Revert D24873991: Profiler benchmark fix
Test Plan: revert-hammer

Differential Revision:
D24873991 (a97c7e2ef0)

Original commit changeset: 1c3950d7d289

fbshipit-source-id: 6f3b8a49caf90aaa3e16707005b6b7cf6e61d89f
2020-11-13 08:37:14 -08:00
e5da3b6097 Revert D24891767: rename torch.Assert to torch._assert
Test Plan: revert-hammer

Differential Revision:
D24891767 (a8ca042ec0)

Original commit changeset: 01c7a5acd83b

fbshipit-source-id: cd2271467151b578185758723fcd23f69051d3a3
2020-11-13 08:35:05 -08:00
4cec19b56a Revert D24740727: torch.Assert: make it torch.jit.script'able
Test Plan: revert-hammer

Differential Revision:
D24740727 (b787e748f0)

Original commit changeset: c7888e769c92

fbshipit-source-id: 1e097bd9c0f8b04bea0e0346317a126b42a3dc4f
2020-11-13 08:31:40 -08:00
1c7c612af0 Revert D24543682: [pytorch][PR] Added support for complex input for torch.lu_solve
Test Plan: revert-hammer

Differential Revision:
D24543682 (ffd0003022)

Original commit changeset: 165bde39ef95

fbshipit-source-id: 790b4157fdbc7149aaf0748555efe6daed7e1a23
2020-11-13 08:24:53 -08:00
8855c4e12f [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Differential Revision: D24946660

fbshipit-source-id: e47d04cac21314acb7f9ac3bdfa0d09289e399b4
2020-11-13 06:59:04 -08:00
759a548d6e add dependency check in cost_aware_partition (#47856)
Summary:
In cost_aware_partition, check for circular dependencies in try_combining_partitions. Also fix the calculation of communication time between partitions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47856

Reviewed By: gcatron

Differential Revision: D24926591

Pulled By: scottxu0730

fbshipit-source-id: c634608675ac14b13b2370a727e4fb05e1bb94f0
2020-11-13 02:49:39 -08:00
ffd0003022 Added support for complex input for torch.lu_solve (#46862)
Summary:
`torch.lu_solve` now works for complex inputs both on CPU and GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex dtypes, but I didn't modify/improve the body of the tests.
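
A hedged usage sketch with complex inputs:
```
import torch

a = torch.randn(3, 3, dtype=torch.complex128)
b = torch.randn(3, 2, dtype=torch.complex128)
lu, pivots = torch.lu(a)             # LU factorization of a
x = torch.lu_solve(b, lu, pivots)    # solves a @ x = b
print(torch.allclose(a @ x, b))      # expected: True
```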

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46862

Reviewed By: nikithamalgifb

Differential Revision: D24543682

Pulled By: anjali411

fbshipit-source-id: 165bde39ef95cafebf976c5ba4b487297efe8433
2020-11-13 02:35:31 -08:00
2ed3430877 [GPU] Make permuteWeights inline (#47634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47634

Follow up on d16r's diff - D24710102. Make the function inline in order to get rid of the compiler checking `-Werror,-Wunused-function`.
ghstack-source-id: 116607200

Test Plan:
1. Sandcastle Tests
2. CircleCI jobs

Reviewed By: d16r

Differential Revision: D24824637

fbshipit-source-id: c17e219b384b91ac4620aa23112a6cda1200a605
2020-11-13 02:00:29 -08:00
692726812b [JIT] Fix function schema subtype checking (#47706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47706

**Summary**
This commit fixes `FunctionSchema::isSubtypeOf` so that the subtyping rule it
implements for `FunctionSchema` instances is contravariant in argument
types and covariant in return type. At present, the rule is covariant in
argument types and contravariant in return type, which is not correct.

A brief but not rigorous explanation follows. Suppose there are two
`FunctionSchema`s, `M = (x: T) -> R` and `N = (x: U) -> S`. For `M <= N`
to be true (i.e. that `M` is a subtype of `N`), it must be true that
`U <= T` and `R <= S`. This generalizes to functions with multiple
arguments.
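
A short typed-Python illustration of the rule (the classes are hypothetical; a type checker such as mypy accepts the final assignment):
```
from typing import Callable

class Animal: ...
class Cat(Animal): ...   # Cat <= Animal

def m(x: Animal) -> Cat:
    return Cat()

# M = (x: Animal) -> Cat is a subtype of N = (x: Cat) -> Animal:
# m accepts every Cat (arguments are contravariant), and every Cat
# it returns is an Animal (returns are covariant).
n: Callable[[Cat], Animal] = m
```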

**Test Plan**
This commit extends `TestModuleInterface.test_module_interface_subtype`
with two new tests cases that test the contravariance of argument types
and covariance of return types in determining whether a `Module`
implements an interface type.

**Fixes**
This commit closes #47631.

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D24934099

Pulled By: SplitInfinity

fbshipit-source-id: bd07e7b47d2a3a56d676f2f572de09fb18ececd8
2020-11-13 00:43:53 -08:00
1aeac97712 [PyTorch] Remove unnecessary shared_ptr copies in ThreadLocalDebugInfo::get (#47791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47791

`debug_info` is `thread_local` and this function is a leaf, so nobody else could free it out from under us. Regular pointer should be fine.
ghstack-source-id: 116456975

Test Plan: Run framework overhead benchmarks

Reviewed By: bhosmer

Differential Revision: D24901749

fbshipit-source-id: c01a60b609fd08e5200264d8e98d356e2c78cf28
2020-11-13 00:04:37 -08:00
b787e748f0 torch.Assert: make it torch.jit.script'able (#47399)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47399

Currently torch.Assert is not scriptable, which makes it not very useful for production code. According to jamesr66a, moving this to C++ op land will help with scriptability. This PR implements the change.

Note: with the current code, the Assert is scriptable, but it is a no-op after being scripted. Would love suggestions on how to address that (can be in a future PR).

Test Plan:
```
python test/test_utils.py TestAssert.test_assert_scriptable
python test/test_utils.py TestAssert.test_assert_true
python test/test_fx.py TestFX.test_symbolic_trace_assert
```

Imported from OSS

Reviewed By: eellison

Differential Revision: D24740727

fbshipit-source-id: c7888e769c921408a3020ca8332f4dae33f2bc0e
2020-11-13 00:02:19 -08:00
a8ca042ec0 rename torch.Assert to torch._assert (#47763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47763

Changing the name due to the discussion in
https://github.com/pytorch/pytorch/pull/47399.

Test Plan:
```
python test/test_utils.py TestAssert.test_assert_true
python test/test_fx.py TestFX.test_symbolic_trace_assert
python test/test_fx_experimental.py
```

Imported from OSS

Reviewed By: ezyang

Differential Revision: D24891767

fbshipit-source-id: 01c7a5acd83bf9c962751552780930c242134dd2
2020-11-12 23:59:34 -08:00
16d6af74e6 [PyTorch] Optimize ~intrusive_ptr for the case of zero weak references (#47834)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47834

We can determine if (as is likely) there are no outstanding
weak references without bothering to decrement the
count. `std::shared_ptr` does this same optimization in libc++:
229db36474/libcxx/src/memory.cpp (L69-L107)
ghstack-source-id: 116576326

Test Plan:
Saw time spent in TensorImpl::release_resources drop in
local profiling of empty benchmark
Run framework overhead benchmarks. 9-10% savings on OutOfPlace, small single digit savings on empty, essentially none on InPlace.

Reviewed By: bhosmer

Differential Revision: D24914763

fbshipit-source-id: 19b03f960e32123bc72f7edce63fa1d18c3c143f
2020-11-12 23:50:48 -08:00
ed20e327d7 [quant] skip tests without fbgemm support (#47800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47800

Fixes #47748

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24904885

fbshipit-source-id: 76d27659e73c7f60b3fcc25606657ee9305117be
2020-11-12 23:35:10 -08:00
9ee4f499f0 [OpBench] add _consume_op.list for processing input with type of List[Tensor] (#47890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47890

As titled. This is needed to fix issues when running `chunk_test`, `split_test`, `qobserver`, and `sort` in `qunary` in JIT mode, because the output of `chunk_op` is a list of tensors which cannot be handled by the current `_consume_op`.

Test Plan:
OSS:
python3 -m benchmark_all_test --iterations 1 --warmup_iterations 1 --use_jit

Reviewed By: mingzhe09088

Differential Revision: D24774105

fbshipit-source-id: 210a0345b8526ebf3c24f4d0794e20b2ff6cef3d
2020-11-12 23:29:40 -08:00
0652d755d3 Fix some flaky tests in test_torch.py and test_nn.py (#46941)
Summary:
Fixed test:
- `test_is_nonzero`: this was asserting an exact match, which is flaky when `TORCH_SHOW_CPP_STACKTRACES=1`; I changed it to a non-exact assert
- `test_pinverse` TF32
- `test_symeig` TF32
- `test_triangular_solve_batched_many_batches_cpu_float64` precision on CPU BLAS
- `test_qr` TF32, as well as a tensor factory that forgot a `dtype=dtype`
- `test_lu` TF32
- `ConvTranspose2d` TF32
- `Conv3d_1x1x1_no_bias` TF32
- `Transformer*` TF32

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46941

Reviewed By: heitorschueroff

Differential Revision: D24852725

Pulled By: mruberry

fbshipit-source-id: ccd4740cc643476178d81059d1c78da34e5082ed
2020-11-12 22:35:42 -08:00
2712acbd53 CUDA BFloat16 Dropout (#45005)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45005

Reviewed By: mruberry

Differential Revision: D24934761

Pulled By: ngimel

fbshipit-source-id: 8f615b97fb93dcd04a46e1d8eeb817ade5082990
2020-11-12 22:28:11 -08:00
1589ede8dd [quant][graphmode][fx] insert_observer_for_input_arg_of_observed_node (#47785)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47785

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24900302

fbshipit-source-id: 61d6287c462898837aed85d5c3a48b6e47b4a41b
2020-11-12 22:19:51 -08:00
dfd946871a Move eq.device to lite interpreter
Reviewed By: iseeyuan

Differential Revision: D24866273

fbshipit-source-id: 113dc50c7f083fa50fd431ffbac224101f8d3c4e
2020-11-12 22:03:57 -08:00
a97c7e2ef0 Profiler benchmark fix (#47713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47713

Fix the import and also always use internal Timer

Test Plan: python benchmarks/profiler_benchmark/profiler_bench.py

Reviewed By: dzhulgakov

Differential Revision: D24873991

Pulled By: ilia-cher

fbshipit-source-id: 1c3950d7d289a4fb5bd7043ba2d842a35c263eaa
2020-11-12 21:47:30 -08:00
1afdcbfbb3 [quant][graphmode][fx][refactor] insert_observer_for_output_of_the_node (#47784)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47784

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24900301

fbshipit-source-id: abaeae1b5747e517adeb0d50cec5998a8a3fc24d
2020-11-12 21:39:29 -08:00
59e96c55f7 Support MatMul in c2_pt_converter
Summary: Added the MatMul operator for caffe2

Test Plan: buck test //caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test

Reviewed By: bugra

Differential Revision: D24920937

fbshipit-source-id: 7ba09ba0439cb9bd15d6a41fd8ff1a86d8d11437
2020-11-12 20:56:58 -08:00
c4ecbcdcb3 [quant][graphmode][fx][refactor] insert_observer_for_special_module (#47783)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47783

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24900304

fbshipit-source-id: 11cc3dd4ea5e272209db9f3c419deadd40db5f42
2020-11-12 20:48:34 -08:00
9fa681c5e0 [ONNX] Add export of prim::dtype, prim::tolist (#46019)
Summary:
Add export of prim::dtype, prim::tolist.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46019

Reviewed By: malfet

Differential Revision: D24870870

Pulled By: bzinodev

fbshipit-source-id: 7f59e2c8f5ac2dbf83c889c73bd61f96587a296e
2020-11-12 20:34:40 -08:00
85c43c3da1 [ONNX] Convert _len based on the first dimension length (#47538)
Summary:
This PR is a bug fix.
As the UT shows, for multi-dimensional tensors the current conversion for _len returns the total number of elements, but it should return the length of the first dimension, as pytorch's _len defines.
A `Squeeze` op is needed at the end to ensure the output is a scalar value.
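
The eager-mode behavior the exported graph must reproduce (a hedged illustration):
```
import torch

t = torch.zeros(3, 4, 5)
print(len(t))      # 3: the length of the first dimension
print(t.numel())   # 60: what the buggy conversion effectively returned
```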

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47538

Reviewed By: malfet

Differential Revision: D24870717

Pulled By: bzinodev

fbshipit-source-id: c53c745baa6d2fb7cc1de55a19bd2eedb2ad5272
2020-11-12 20:25:39 -08:00
eab809377d [NNC] Remove all deferred expansion from Reductions (#47709)
Summary:
Refactors the ReduceOp node to remove the last remaining deferred functionality: completing the interaction between the accumulator buffer and the body. This fixes two issues with reductions:
1. Nodes inside the interaction could not be visited or modified, meaning we could generate bad code when the interaction was complex.
2. The accumulator load was created at expansion time and so could not be modified in some ways (ie. vectorization couldn't act on these loads).

This simplifies reduction logic quite a bit, but there's a bit more involved in the rfactor transform.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47709

Reviewed By: ZolotukhinM

Differential Revision: D24904220

Pulled By: nickgg

fbshipit-source-id: 159e5fd967d2d1f8697cfa96ce1bb5fc44920a40
2020-11-12 20:17:52 -08:00
eb8331e759 Revert D24524219: Remove balance and devices parameter from Pipe.
Test Plan: revert-hammer

Differential Revision:
D24524219 (8da7576303)

Original commit changeset: 9973172c2bb7

fbshipit-source-id: b187c80270adb2a412e3882863a2d7de2a52ed56
2020-11-12 19:31:19 -08:00
4f538a2ba4 [pytorch][bot] update mobile op deps (#47825)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47825

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D24913587

Pulled By: ljk53

fbshipit-source-id: b6219573c3238fb453d88019197a00c9f9dbabb8
2020-11-12 19:19:25 -08:00
a376d3dd5d [pytorch] strip out warning message ifdef STRIP_ERROR_MESSAGES (#47827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47827

Similar to TORCH_CHECK_WITH_MSG, strip messages for TORCH_WARN/TORCH_WARN_ONCE.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24913586

Pulled By: ljk53

fbshipit-source-id: 00f0f2bf33a48d5d7008b70ff5820623586dfd4e
2020-11-12 19:16:42 -08:00
8ff0b6fef8 [OpBenchMobile] Enable operator_benchmark to run the benchmark on mobile through AiBench (#47767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47767

This diff implements the functionality of running benchmarks on mobile on top of the operator_benchmark framework. It does so through a few steps:

1. create a scripted module from existing benchmark case.
2. run mobile specific optimization pass on the scripted module
3. run the scripted module on AiBench by calling its Python API

A small change in the way of writing a benchmark case is introduced so that both local and mobile runs can share the same interface. The change is to have inputs as arguments of the `forward` function, so that the mobile optimization pass can run successfully (otherwise everything would be optimized away by constant propagation).

Test Plan:
## local op_bench run

buck run caffe2/benchmarks/operator_benchmark:benchmark_all_test --  --iterations 1 --warmup_iterations 1

buck run caffe2/benchmarks/operator_benchmark:benchmark_all_test --  --iterations 1 --warmup_iterations 1 --use_jit

Exceptions: `py_module` op in `FakeQuantizePerTensorBaseOpBenchmark` and `FakeQuantizePerChannelBaseOpBenchmark` under JIT mode. These tests also failed in the base version

```
RuntimeError:
Module 'FakeQuantizePerChannelOpBenchmark' has no attribute 'op_func' (This function exists as an attribute on the Python module, but we failed to compile it to a TorchScript function.
The error stack is reproduced here:

Python builtin <built-in method apply of FunctionMeta object at 0x619000c652a0> is currently not supported in Torchscript:
  File "/data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/pt/quantization_test#link-tree/quantization_test.py", line 260
    quant_min: int, quant_max: int
):
    return _LearnableFakeQuantizePerChannelOp.apply(input, scale, zero_point, axis, quant_min, quant_max, 1.0)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
:
  File "/data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/pt/quantization_test#link-tree/quantization_test.py", line 313
        axis: int, quant_min: int, quant_max: int
    ):
        return self.op_func(input, scale, zero_point, axis, quant_min, quant_max)
               ~~~~~~~~~~~~ <--- HERE
```

`_consume_op` typing mismatch: chunk, split, qobserver, sort in qunary. These will be fixed in D24774105

## OSS test

python3 -m benchmark_all_test --iterations 1 --warmup_iterations 1 --use_jit
python3 -m benchmark_all_test --iterations 1 --warmup_iterations 1

## saved module graph
```
module __torch__.mobile_benchmark_utils.OpBenchmarkMobile {
  parameters {
  }
  attributes {
    training = True
    num_iters = 1
    benchmark = <__torch__.pt.add_test.___torch_mangle_4.AddBenchmark object at 0x6070001b8b50>
  }
  methods {
    method forward {
      graph(%self : __torch__.mobile_benchmark_utils.OpBenchmarkMobile):
        %12 : None = prim::Constant() # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:9:4
        %4 : bool = prim::Constant[value=1]() # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:10:8
        %1 : int = prim::GetAttr[name="num_iters"](%self)
         = prim::Loop(%1, %4) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:10:8
          block0(%i : int):
            %6 : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark = prim::GetAttr[name="benchmark"](%self)
            %7 : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark = prim::GetAttr[name="benchmark"](%self)
            %self.inputs_tuple : (Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu), Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu)) = prim::Constant[value=({0.48884}, {0.809042})]()
            %9 : Tensor, %10 : Tensor = prim::TupleUnpack(%self.inputs_tuple)
            %23 : int = prim::Constant[value=1]()
            %24 : Tensor = aten::add(%9, %10, %23) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/pt/add_test.py:39:15
            -> (%4)
        return (%12)

    }
  }
  submodules {
    module __torch__.pt.add_test.___torch_mangle_4.AddBenchmark {
      parameters {
      }
      attributes {
        mobile_optimized = True
      }
      methods {
        method forward {
          graph(%self : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark,
                %input_one.1 : Tensor,
                %input_two.1 : Tensor):
            %3 : int = prim::Constant[value=1]()
            %4 : Tensor = aten::add(%input_one.1, %input_two.1, %3) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/pt/add_test.py:39:15
            return (%4)

        }
        method get_inputs {
          graph(%self : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark):
            %self.inputs_tuple : (Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu), Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu)) = prim::Constant[value=({0.48884}, {0.809042})]()
            return (%self.inputs_tuple)

        }
      }
      submodules {
      }
    }
  }
}

```

Reviewed By: kimishpatel

Differential Revision: D24322214

fbshipit-source-id: 335317eca4f40c4083883eb41dc47caf25cbdfd1
2020-11-12 17:15:05 -08:00
edf751ca2f Make empty c10-full (#46092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46092

Make empty c10-full without using hacky-wrapper, i.e. port the kernel to the new style signature.

This PR also changes the signature of some helpers called by empty to the new style.
ghstack-source-id: 116544203

(Note: this ignores all push blocking failures!)

Test Plan:
vs prev diff (outdated, before c10::optional fix): https://www.internalfb.com/intern/fblearner/details/224735103/

after c10::optional fix:
https://www.internalfb.com/intern/fblearner/details/231391773/

Also, after the c10::optional fix, the instruction counting benchmark shows a 2% regression for calling empty from Python. We decided this is acceptable and opted against landing D24425836, which would fix the regression.

Reviewed By: ezyang

Differential Revision: D24219944

fbshipit-source-id: e554096e90ce438c75b679131c3151ff8e5c5d50
2020-11-12 17:08:21 -08:00
3649a2c170 [numpy] torch.sqrt : promote integer inputs to float (#47293)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515
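
A hedged sketch of the post-change behavior:
```
import torch

x = torch.tensor([4, 9])            # int64 input
print(torch.sqrt(x))                # tensor([2., 3.])
print(torch.sqrt(x).dtype)          # torch.float32 (default float dtype)
```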

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47293

Reviewed By: malfet

Differential Revision: D24855994

Pulled By: mruberry

fbshipit-source-id: 1e6752f2eeba6d638dea0bdea0c650cf722718c9
2020-11-12 16:16:09 -08:00
7391edb591 [hotfix] fix misleadingly summary BLAS=MKL when there's no BLAS install (#47803)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47803

Reviewed By: samestep

Differential Revision: D24907453

Pulled By: walterddr

fbshipit-source-id: a3e41041f6aa506b054eb0ffc61f8525ba02cbf1
2020-11-12 16:05:14 -08:00
9734c042b8 [FX] Fix submodule naming for subgraph split (#47869)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47869

Test Plan: Imported from OSS

Reviewed By: scottxu0730

Differential Revision: D24925283

Pulled By: jamesr66a

fbshipit-source-id: a33bff20667405a3bbfc81e1e640c2649c0db03b
2020-11-12 15:58:45 -08:00
21f447ee2c Added serialization of parameters for leaf modules (#47729)
Summary:
This adds the serialization of parameters of leaf modules to the JSON serialization.
Specifically, `__constants__` of the leaf module is serialized as "parameters" in the JSON.
It also adds type/shape information for leaf modules.
```
{
            "shape": "[3, 3, 1, 1]",
            "dtype": "torch.float32",
            "parameters": {
                "name": "Conv2d",
                "stride": [
                    1,
                    1
                ],
                "padding": [
                    0,
                    0
                ],
                "dilation": [
                    1,
                    1
                ],
                "groups": 1,
                "padding_mode": "zeros",
                "output_padding": [
                    0,
                    0
                ],
                "in_channels": 3,
                "out_channels": 3,
                "kernel_size": [
                    2,
                    2
                ]
            },
            "target": "conv",
            "op_code": "call_module",
            "name": "conv",
            "args": [
                {
                    "is_node": true,
                    "name": "c"
                }
            ],
            "kwargs": {}
        },
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47729

Reviewed By: ailzhang

Differential Revision: D24901632

Pulled By: gcatron

fbshipit-source-id: 7f2d923937042b60819c58fd180b426a3733ff5f
2020-11-12 14:28:31 -08:00
8da7576303 Remove balance and devices parameter from Pipe. (#46804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46804

As per our design in https://github.com/pytorch/pytorch/issues/44827,
changing the API such that the user places modules on appropriate devices
instead of having a `balance` and `devices` parameter that decides this.

This design allows us to use RemoteModule in the future.
ghstack-source-id: 116479842

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D24524219

fbshipit-source-id: 9973172c2bb7636572cdc37ce06bf8368638a463
2020-11-12 14:20:23 -08:00
65d5004b09 Update, appease, and enable fail-on for shellcheck (#47786)
Summary:
Currently ([example](https://github.com/pytorch/pytorch/runs/1381883195)), ShellCheck is run on `*.sh` files in `.jenkins/pytorch`, but it uses a three-and-a-half-year-old version, and doesn't fail the lint job despite yielding many warnings. This PR does the following:

- update ShellCheck to v0.7.1 (and generally make it always use the latest `"stable"` release), to get more warnings and also enable the directory-wide directives that were introduced in v0.7.0 (see the next bullet)
- move the rule exclusions list from a variable in `.jenkins/run-shellcheck.sh` to a [declarative file](https://github.com/koalaman/shellcheck/issues/725#issuecomment-469102071) `.jenkins/pytorch/.shellcheckrc`, so now editor integrations such as [vscode-shellcheck](https://github.com/timonwong/vscode-shellcheck) give the same warnings as the CLI script
- fix all ShellCheck warnings in `.jenkins/pytorch`
- remove the suppression of ShellCheck's return value, so now it will fail the lint job if new warnings are introduced

 ---

While working on this, I was confused because I was getting fairly different results from running ShellCheck locally versus what I saw in the CI logs, and also different results among the laptop and devservers I was using. Part of this was due to different versions of ShellCheck, but there were even differences within the same version. For instance, this command should reproduce the results in CI by using (almost) exactly the same environment:
```bash
act -P ubuntu-latest=nektos/act-environments-ubuntu:18.04 -j quick-checks \
| sed '1,/Run Shellcheck Jenkins scripts/d;/Success - Shellcheck Jenkins scripts/,$d' \
| cut -c25-
```
But the various warnings were being displayed in different orders, so it was hard to tell at a glance whether I was getting the same result set or not. However, piping the results into this ShellCheck-output-sorting Python script showed that they were in fact the same:
```python
import fileinput
items = ''.join(fileinput.input()).split('\n\n')
print(''.join(sorted(f'\n{item.strip()}\n\n' for item in items)), end='')
```
Note that while the above little script worked for the old version (v0.4.6) that was previously being used in CI, it is a bit brittle, and will not give great results in more recent ShellCheck versions (since they give more different kinds of output besides just a list of warnings).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47786

Reviewed By: seemethere

Differential Revision: D24900522

Pulled By: samestep

fbshipit-source-id: 92d66e1d5d28a77de5a4274411598cdd28b7d436
2020-11-12 14:00:16 -08:00
8304c25c67 Give hash in commit messages in doc push scripts (#47694)
Summary:
This PR replaces the current auto-generated commit messages like pytorch/pytorch.github.io@fb217ab34a (currently includes no information) and pytorch/cppdocs@7efd67e8f1 (currently includes only a timestamp, which is redundant since it's a Git commit) with more descriptive ones that specify the pytorch/pytorch commit they originated from. This information would be useful for debugging issues such as https://github.com/pytorch/pytorch/issues/47462.

GitHub will also [autolink](https://docs.github.com/en/free-pro-team@latest/github/writing-on-github/autolinked-references-and-urls#commit-shas) these new messages (similar to ezyang/pytorch-ci-hud@bc25ae770d), and so they will now also mostly follow Git commit message conventions by starting with a capital letter, using the imperative voice, and (at least in the autolink-rendered form on GitHub, although not in the raw text) staying under 50 characters.

**Question for reviewers:** Will my `export CIRCLE_SHA1="$CIRCLE_SHA1"` work here? Is it necessary?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47694

Reviewed By: walterddr

Differential Revision: D24868240

Pulled By: samestep

fbshipit-source-id: 4907341e7b57ed6818ab550dc1ec423f2c2450c1
2020-11-12 13:36:01 -08:00
b1a4170ab3 [NNC] Fix lowering of aten::pow (#47795)
Summary:
NNC lowering of aten::pow assumes that the type of the exponent is either float or int cast to float, which doesn't work well with double (or half, for that matter).

Fixes https://github.com/pytorch/pytorch/issues/47304

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47795

Reviewed By: ZolotukhinM

Differential Revision: D24904201

Pulled By: nickgg

fbshipit-source-id: 43c3ea704399ebb36c33cd222db16c60e5b7ada5
2020-11-12 12:33:07 -08:00
149190c014 Added CUDA support for complex input for torch.solve (#47045)
Summary:
`torch.solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.
Differentiation also works correctly with complex inputs.
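
A hedged usage sketch (requires a CUDA device; `torch.solve` was the pre-`torch.linalg` API):
```
import torch

a = torch.randn(3, 3, dtype=torch.complex64, device='cuda')
b = torch.randn(3, 1, dtype=torch.complex64, device='cuda')
x, lu = torch.solve(b, a)                    # solves a @ x = b
print(torch.allclose(a @ x, b, atol=1e-4))   # expected: True
```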

Fixes https://github.com/pytorch/pytorch/issues/41084
Ref. https://github.com/pytorch/pytorch/issues/33152

anjali411 I hope you don't mind that I took over https://github.com/pytorch/pytorch/pull/42737

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47045

Reviewed By: nikithamalgifb

Differential Revision: D24921503

Pulled By: anjali411

fbshipit-source-id: 4c3fc4f193a84b6e28c43c08672d480715000923
2020-11-12 12:22:59 -08:00
275a89a7ee [Docs] Store Docs fixes about HashStore API (#47643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47643

Updating the docs to indicate the `num_keys` and `delete_key` APIs are now supported by the HashStore (not just TCPStore).
ghstack-source-id: 116459958

Test Plan: CI

Reviewed By: jiayisuse, mrshenli

Differential Revision: D24633570

fbshipit-source-id: 549479dd99f9ec6decbfffcb74b9792403d05ba2
2020-11-12 12:14:52 -08:00
6aaf04616b [Metal] Remove undefined tests
Summary: As title

Test Plan:
- Circle CI
- Sandcastle

Reviewed By: husthyc

Differential Revision: D24915370

fbshipit-source-id: fe05ac37a25c804695a13fb5a7eabbc60442a102
2020-11-12 11:54:43 -08:00
f51be328ae [FX] Fix __tensor_constants not scriptable (#47817)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47817

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D24908959

Pulled By: jamesr66a

fbshipit-source-id: c0cadae2091e917b72684262b8655f8813ac9d91
2020-11-12 11:39:07 -08:00
76ff557de7 [NNC] add hazard analysis to Bounds Inference (#47684)
Summary:
Adds a helper function to the Bounds Inference / Memory Analysis infrastructure which returns the kind of hazard found between two Stmts (e.g. Blocks or Loops). E.g.
```
for (int i = 0; i < 10; ++i) {
  A[x] = i * 2;
}
for (int j = 0; j < 10; ++j) {
 B[x] = A[x] / 2;
}
```
The two loops have a `ReadAfterWrite` hazard, while in this example:
```
for (int i = 0; i < 10; ++i) {
  A[x] = i * 2;
}
for (int j = 0; j < 10; ++j) {
 A[x] = B[x] / 2;
}
```
The loops have a `WriteAfterWrite` hazard.

This isn't 100% of what we need for loop fusion; for example, we don't check the strides of the loops to see if they match.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47684

Reviewed By: malfet

Differential Revision: D24873587

Pulled By: nickgg

fbshipit-source-id: 991149e5942e769612298ada855687469a219d62
2020-11-12 11:34:31 -08:00
664d2f48cf [NNC] Enable unary op cpu testing (#47374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47374

A few small fixes needed to enable unary op CPU testing. If reviewers would prefer I split them up, let me know.

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24805248

Pulled By: eellison

fbshipit-source-id: c2cfe2e3319a633e64da3366e68f5bf21d390cb7
2020-11-12 11:14:03 -08:00
dcca712d3c [NNC] refactor cuda half support to more general file (#47373)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47373

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24805246

Pulled By: eellison

fbshipit-source-id: 33b5c84c9212d51bac3968e02aae2434dde40cd8
2020-11-12 11:14:00 -08:00
346a71d29c [NNC] More cpu tests (#47372)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47372

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24805254

Pulled By: eellison

fbshipit-source-id: b7e5ee044ef816e024b6fc5c4041fff5f2049bb3
2020-11-12 11:13:57 -08:00
450738441b [NNC] Add more CPU Tests (#47371)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47371

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24805252

Pulled By: eellison

fbshipit-source-id: 16472960d09f6c981adca2a45b2a4efb75a09d4f
2020-11-12 11:13:54 -08:00
e618bd858e [NNC] Fix llvm min lowering for int inputs (#47370)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47370

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24805249

Pulled By: eellison

fbshipit-source-id: e13d956899e8651600fab94dab04aa39ca427769
2020-11-12 11:13:50 -08:00
fe81faee5f Add more CPU tests (#47369)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47369

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24805251

Pulled By: eellison

fbshipit-source-id: f1a8210ffdc3cc88354cb4896652151d83a0345a
2020-11-12 11:13:47 -08:00
b8a1070ec0 [TensorExpr][CPU] Fix bool -> int casting (#46951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46951

If e.g. we're casting from torch.int -> torch.bool, previously we would just truncate from int32 -> i8. Since torch.bool has 8 bits but only uses one of them, we need to make sure that one bit is set.
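
A hedged repro sketch: a value whose low 8 bits are all zero must still cast to True:
```
import torch

# Naive truncation to i8 would map 256 -> 0 (False); a correct cast
# treats any nonzero value as True.
x = torch.tensor([0, 1, 256], dtype=torch.int32)
print(x.to(torch.bool))  # tensor([False,  True,  True])
```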

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24805253

Pulled By: eellison

fbshipit-source-id: af3aa323f10820d189827eb51037adfa7d80fed9
2020-11-12 11:13:44 -08:00
ad5be26b2f Small changes/cleanup (#46950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46950

Make sure that we're fusing in a fuse tests, and refactor to more concise API to check if fusions have happened.

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24805250

Pulled By: eellison

fbshipit-source-id: f898008a64b74e761bb5fe85f91b3cdf2dbdf878
2020-11-12 11:13:38 -08:00
f221a19a7f Force LLVM Compilation for CPU Tests (#46949)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46949

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24805247

Pulled By: eellison

fbshipit-source-id: 4fcaf02d8a78cc5cbcbde36940d0a2c85fba3fc5
2020-11-12 11:12:08 -08:00
f42cdc2e43 [NNC] Fix printing of integral doubles (#47799)
Summary:
When printing doubles, we don't do anything to distinguish integral doubles (i.e., 1 or 2) from ints. Added decoration of these doubles with `.0` if they are integral (i.e., DoubleImm(1) will print as `1.0`).

This is an issue specifically on CUDA, where some intrinsics do not have type coercion. Added a test which covers this case (without the fix, it tries to look up `pow(double, int)`, which doesn't exist).

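A minimal Python sketch of the decoration rule (the actual change lives in the NNC IR printer, which is C++):

```python
import math

def print_double(v: float) -> str:
    # decorate integral doubles with ".0" so they don't print like ints
    if math.isfinite(v) and v == int(v):
        return f"{int(v)}.0"  # DoubleImm(1) prints as "1.0", not "1"
    return repr(v)

assert print_double(1.0) == "1.0"
assert print_double(1.5) == "1.5"
```
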
Fixes https://github.com/pytorch/pytorch/issues/47304

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47799

Reviewed By: ZolotukhinM

Differential Revision: D24904185

Pulled By: nickgg

fbshipit-source-id: baa38726966c94ee50473cc046b9ded5c4e748f7
2020-11-12 11:02:34 -08:00
1478e5ec2a [quant] Remove nn.quantized.ReLU module and nn.quantized.functional.relu (#47415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47415

nn.ReLU works for both float and quantized input, so we don't want to define an nn.quantized.ReLU
that does the same thing as nn.ReLU; similarly for nn.quantized.functional.relu.

This also removes the numerical inconsistency for models that quantize nn.ReLU independently in QAT mode.

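For illustration, the same nn.ReLU module already handles both paths:

```python
import torch

relu = torch.nn.ReLU()
x = torch.randn(4)
print(relu(x))  # float path
q = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
print(relu(q))  # quantized path, same module
```
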
Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D24747035

fbshipit-source-id: b8fdf13e513a0d5f0c4c6c9835635bdf9fdc2769
2020-11-12 10:56:30 -08:00
66f9b1de1b [NCCL] enable p2p tests (#47797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47797

NCCL p2p tests had hang issues before; the reason is that there were some unexpected context switches. For example, process 1, which is supposed to only use GPU1, could end up using GPU0 as a result of not explicitly setting the device.
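
A minimal sketch of the fix pattern, assuming one process per GPU on a single node with the usual env:// variables set:

```python
import torch
import torch.distributed as dist

dist.init_process_group("nccl")
# pin this process to its own GPU before any p2p/collective call;
# without this, work can silently land on GPU 0
torch.cuda.set_device(dist.get_rank())
```
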
ghstack-source-id: 116461969

Test Plan: waitforsandcastle

Reviewed By: jiayisuse

Differential Revision: D24863808

fbshipit-source-id: 92bd3a4874be8334210c7c8ee6363648893c963e
2020-11-12 10:44:50 -08:00
9ea7a6c7c5 [ONNX] Update ONNX doc for writing pytorch model (#46961)
Summary:
To trace successfully, we need to write the PyTorch model in a trace-friendly way, so we add instructions with examples here.

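A minimal sketch of the kind of guidance added (the class names here are illustrative):

```python
import torch

class NotTraceable(torch.nn.Module):
    def forward(self, x):
        if x.sum() > 0:  # Python branch is frozen to one side at trace time
            return x * 2
        return x

class Traceable(torch.nn.Module):
    def forward(self, x):
        return torch.where(x.sum() > 0, x * 2, x)  # branch stays in the graph

traced = torch.jit.trace(Traceable(), torch.randn(3))
```
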
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46961

Reviewed By: ailzhang

Differential Revision: D24900040

Pulled By: bzinodev

fbshipit-source-id: b375b533396b11dbc9656fa61e84a3f92f352e4b
2020-11-12 10:16:45 -08:00
d7c8d3cccb Remove references to typing module from setup.py (#47677)
Summary:
It is part of core Python 3.6.2+.

Fixes https://github.com/pytorch/pytorch/issues/47596

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47677

Reviewed By: walterddr

Differential Revision: D24860188

Pulled By: malfet

fbshipit-source-id: ad72b433a4493ebe5caca97c2e8a9d4b3c8172d4
2020-11-12 10:04:38 -08:00
809660ffa4 ATen DerivedType is dead, long live ATen RegisterDispatchKey (#47011)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47011

smessmer has complained about how it is difficult to find generated
code.  Well, hopefully this diff helps a bit with that.

There are three components to this refactor:

- Rename TypeDerived (CPUType) to RegisterDispatchKey (RegisterCPU).
  The 'Type' nomenclature is vestigial and I think Register says
  what these files do a lot more clearly.  I also got rid of
  the CPUType namespace; everything just goes in anonymous
  namespace now, less moving parts this way.
- Give Math and DefaultBackend their own files (RegisterMath and
  RegisterDefaultBackend)
- Restructure code generation so that schema definition is done
  completely separately from RegisterDispatchKey

I decided to name the files RegisterCPU rather than the old convention
BackendSelectRegister, because it seems better to me if these
files clump together in an alphabetical listing rather than being
spread out everywhere.  There are a few manual registration files
which should probably get similar renaming.

I also did a little garden cleaning about how we identify if a
dispatch key is a cuda key or a generic key (previously called
KEYWORD_ALL_BACKENDS but I like my naming better).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D24600806

Test Plan: Imported from OSS

Reviewed By: smessmer

Pulled By: ezyang

fbshipit-source-id: c1b510dd7515bd95e3ad25b8edf961b2fb30a25a
2020-11-12 09:53:48 -08:00
00a3add425 [TorchBind] Support using lambda function as TorchBind constructor (#47819)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47819

Reviewed By: wanchaol

Differential Revision: D24910065

Pulled By: gmagogsfm

fbshipit-source-id: ad5b4f67b0367e44fe486d31a060d9ad1e0cf568
2020-11-12 09:29:34 -08:00
b6cb2caa68 Revert "Fixed einsum compatibility/performance issues (#46398)" (#47821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47821

This reverts commit a5c65b86ce249f5f2d365169e6315593fbd47b61.

 Conflicts:
	test/test_linalg.py

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24909923

Pulled By: gchanan

fbshipit-source-id: 9dcf98e7c4a3c7e5aaffe475867fa086f3bb6ff2
2020-11-12 08:11:40 -08:00
cfe3defd88 [vulkan] Enable prepacked addmm/mm for linear layers (#47815)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47815

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D24908605

Pulled By: SS-JIA

fbshipit-source-id: e658bc2dbf23d5d911b979d3b8f467508f2fdf0c
2020-11-12 08:04:01 -08:00
e1ee3bfc0e Port bmm and baddbmm from TH to ATen (#42553)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42553

Ports `torch.bmm` and `torch.baddbmm` from TH to ATen, as well as adds support for complex dtypes. Also removes dead TH code for Level 2 functions.

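A short usage sketch of the newly supported complex dtypes:

```python
import torch

a = torch.randn(2, 3, 4, dtype=torch.complex64)
b = torch.randn(2, 4, 5, dtype=torch.complex64)
out = torch.bmm(a, b)          # batched matmul, shape (2, 3, 5)
c = torch.randn(2, 3, 5, dtype=torch.complex64)
out2 = torch.baddbmm(c, a, b)  # c + a @ b, batch-wise
```
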
Closes #24539

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24893511

Pulled By: anjali411

fbshipit-source-id: 0eba3f2aec99c48b3018a5264ee7789279cfab58
2020-11-12 07:57:42 -08:00
553ccccc54 [c10d] switch ProcessGroup to be managed by intrusive_ptr (#47343)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47343

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24723418

Pulled By: wanchaol

fbshipit-source-id: 0463819b96c53b12bdbb3905431110d7b21beb77
2020-11-12 07:36:23 -08:00
859e054314 skip test_all_reduce_sum_cuda_async test case for ROCM (#47630)
Summary:
Skip the following test case for rocm (When PYTORCH_TEST_WITH_ROCM=1):
- test_all_reduce_sum_cuda_async (__main__.TestDistBackendWithFork)

jeffdaily
pruthvistony

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47630

Reviewed By: seemethere, heitorschueroff

Differential Revision: D24849755

Pulled By: walterddr

fbshipit-source-id: b952c81677df2dfd35d459b94ce0f7a5b12c0d5c
2020-11-12 07:19:32 -08:00
2df5600155 [ROCm] add skipCUDAIfRocm to test_lingalg test_norm_fro_2_equivalence_old (#47809)
Summary:
This test started failing when ROCm CI moved to 3.9.  Skip until triage is complete.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47809

Reviewed By: seemethere

Differential Revision: D24906319

Pulled By: walterddr

fbshipit-source-id: 0c425f3b21190cfbc5e0d1c3f477d834af40f0ca
2020-11-12 07:12:43 -08:00
2907447c97 Spurious numpy writable warning (#47271)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47160

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47271

Reviewed By: ailzhang

Differential Revision: D24855889

Pulled By: mruberry

fbshipit-source-id: beaf232b115872f20fb0292e995a876cdc429868
2020-11-12 00:14:56 -08:00
4b25d83e9b torch.dropout: fix non-contiguous layout input (#47552)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47176

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47552

Reviewed By: ailzhang

Differential Revision: D24903435

Pulled By: ngimel

fbshipit-source-id: ef5398931dddf452f5f734b4aa40c11f4ee61664
2020-11-11 22:56:31 -08:00
a02baa0c7a [reland][c10d] switch ProcessGroupNCCL:Options to be managed by intrusive_ptr (#47807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47807

reland https://github.com/pytorch/pytorch/pull/47075

Test Plan: wait for ci

Reviewed By: gmagogsfm

Differential Revision: D24905247

fbshipit-source-id: abd9731d86b3bd48d60bbc90d534823e0c037b93
2020-11-11 22:53:22 -08:00
665ac2f7b0 [reland] [c10d] switch Store to be managed by intrusive_ptr (#47808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47808

reland https://github.com/pytorch/pytorch/pull/47074

Test Plan: wait for ci

Reviewed By: gmagogsfm

Differential Revision: D24905246

fbshipit-source-id: edeb7e6e486570ce889f12512e9dc02061d6cc03
2020-11-11 22:53:20 -08:00
70ae5685f9 [reland][c10d] switch ProcessGroup::Work to be managed by intrusive_ptr (#47806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47806

reland https://github.com/pytorch/pytorch/pull/44046

Test Plan: wait for ci

Reviewed By: gmagogsfm

Differential Revision: D24905245

fbshipit-source-id: ad75ace5432fcfd22d513878f5a73c4bb017324e
2020-11-11 22:51:03 -08:00
89b371bc28 [quant] Add support for 2D indices for quantized embedding operators (#47766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47766

The operator now supports accepting 2D indices as inputs.
For embedding operators, we set the default offsets in the op, since the FBGEMM kernel expects them to be set.
The output shape depends on the shape of the indices.

For the embedding_bag operator, if indices is 2D (B, N) then offsets should be set to None by the user. In this case
the input is interpreted as B bags, each of fixed length N. The output shape is still 2-D in this case.

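A small sketch of the 2-D convention:

```python
import torch

# (B, N) indices are treated as B bags of fixed length N, i.e. the same as
# flattened 1-D indices plus evenly spaced offsets
indices = torch.tensor([[1, 2, 3], [4, 5, 6]])               # B=2, N=3
flat = indices.reshape(-1)                                    # tensor([1, 2, 3, 4, 5, 6])
offsets = torch.arange(0, indices.numel(), indices.size(1))   # tensor([0, 3])
```
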
Test Plan:
python test/test_quantization.py TestQuantizedEmbeddingOps.test_embedding_bag_2d_indices
python test/test_quantization.py TestQuantizedEmbeddingOps.test_embedding_2d_indices

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D24895048

fbshipit-source-id: 2020910e1d85ed8673eedee2e504611ba260d801
2020-11-11 22:44:07 -08:00
47386722da [quant][graphmode][fx][refactor] insert_observer (#47782)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47782

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: supriyar

Differential Revision: D24900305

fbshipit-source-id: b00a90ab85badea7d18ae007cc68d0bcd58ab15c
2020-11-11 21:31:24 -08:00
dd77d5a1d4 [quant][refactor] factor out get_combined_dict function (#47781)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47781

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D24900303

fbshipit-source-id: 1a2cb0ec536384abcd140e0d073f0965ed2800cd
2020-11-11 21:01:31 -08:00
b46787d6d7 add cost_aware_partition (#47673)
Summary:
[WIP] This PR adds a cost_aware_partition method to the Partitioner class. The method partitions the FX graph module based on the latency of the whole graph.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47673

Reviewed By: gcatron

Differential Revision: D24896685

Pulled By: scottxu0730

fbshipit-source-id: 1b1651fe82ce56554f99d68da116e585c74099ed
2020-11-11 19:31:37 -08:00
c5834b6a23 Look in named-buffers of module for tensors (#47641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47641

ghstack-source-id: 116450114

Test Plan: Presubmit tests

Reviewed By: jamesr66a

Differential Revision: D24848318

fbshipit-source-id: f6ede3def9d6f1357c4fd3406f97721dea06b9f1
2020-11-11 19:08:16 -08:00
c9f6e70c09 Refactor DDP uneven inputs control flags (#47394)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47394

This is a preliminary refactor for the next diff that will add an
additional flag to control whether we throw a StopIteration or not. We
basically move the flags for ddp uneven inputs to a simple class.
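
A hypothetical sketch of the shape of that class (these names are illustrative, not the real ones):

```python
from dataclasses import dataclass

@dataclass
class _JoinConfig:
    enable: bool = True
    # the follow-up diff adds a flag like this one
    throw_on_early_termination: bool = False
```
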
ghstack-source-id: 116428177

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D24739509

fbshipit-source-id: 96bf41bd1c02dd27e68f6f37d08e22f33129b319
2020-11-11 16:51:56 -08:00
e8a73fbf34 Workaround PyTorch debug build crash using old GCC (#47805)
Summary:
gcc-7.4.x or older fails to compile XNNPACK in debug mode with an internal compiler error.
Work around this in the build script by passing the -O1 optimization flag to XNNPACK when compiling on older compilers.

Fixes https://github.com/pytorch/pytorch/issues/47292

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47805

Reviewed By: seemethere

Differential Revision: D24905758

Pulled By: malfet

fbshipit-source-id: 93f4e3b3b5c10b69734627c50e36b2eb544699c8
2020-11-11 16:33:47 -08:00
52ec8b9340 Added CUDA support for complex input for torch.triangular_solve (#46916)
Summary:
`torch.triangular_solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.

Ref. https://github.com/pytorch/pytorch/issues/33152

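A short usage sketch, assuming a CUDA device is available:

```python
import torch

A = torch.randn(3, 3, dtype=torch.complex64, device='cuda').triu()
b = torch.randn(3, 2, dtype=torch.complex64, device='cuda')
x, _ = torch.triangular_solve(b, A, upper=True)  # now works for complex on GPU
print(torch.allclose(A @ x, b, atol=1e-4))
```
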
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46916

Reviewed By: navahgar, agolynski

Differential Revision: D24706647

Pulled By: anjali411

fbshipit-source-id: fe780eac93d2ae1b2549539bb385e5fac25213b3
2020-11-11 16:08:11 -08:00
a0c4aae3d5 Free original weight after prepacking in XNNPACK based op (#46541)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46541

When weights are prepacked, XNNPACK packs them into separate memory. After that, the original weights are not needed for inference. Having those weights lying around increases the memory footprint, so we would like to remove the original weights once prepacking is done.

Test Plan: buck test //caffe2/aten:mobile_memory_cleanup

Reviewed By: kimishpatel

Differential Revision: D24280928

fbshipit-source-id: 90ffc53b1eabdc545a3ccffcd17fa3137d500cbb
2020-11-11 15:58:35 -08:00
545f624a4a Mark overriden Tensor method override (#47198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47198

Fixes:

```
xplat/caffe2/aten/src/ATen/native/xnnpack/OpContext.h:77:10: error: 'run' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
  Tensor run(const Tensor& input);
```

Test Plan: CI tests

Reviewed By: kimishpatel

Differential Revision: D24678573

fbshipit-source-id: 244769cc36d3c1126973a67441aa2d06d2b83b9c
2020-11-11 15:55:52 -08:00
d4fa84bf5f Properly serialize types that only appear at function input (#47775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47775

When serializing graphs, we check every node for named types referenced,
so that we can register them as dependencies. We were skipping this
check for the graph inputs themselves. Since types used at input are
almost always used somewhere in the graph, we never noticed this gap
until a user reported an issue with NamedTuples.

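A minimal sketch of the failure mode, with a hypothetical NamedTuple used only in the signature:

```python
import torch
from typing import NamedTuple

class Point(NamedTuple):
    x: torch.Tensor
    y: torch.Tensor

@torch.jit.script
def f(p: Point) -> torch.Tensor:
    return p.x + p.y  # Point appears only at the input

torch.jit.save(f, "f.pt")  # serialization must register Point as a dependency
```
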
Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D24896289

Pulled By: suo

fbshipit-source-id: 4ce76816cb7997a7b65e7cea152ea52ed8f27276
2020-11-11 15:27:00 -08:00
32b4b51254 [Docs] Minor doc fixes for init_process_group (#47644)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47644

Minor Update to the init_process_group docs.
ghstack-source-id: 116441798

Test Plan: CI

Reviewed By: jiayisuse, mrshenli

Differential Revision: D24633432

fbshipit-source-id: fbd38dab464ee156d119f9f0b22ffd0e416c4fd7
2020-11-11 15:21:30 -08:00
0c54ea50bd [PyTorch] Avoid atomic refcounting in intrusive_ptr::make (#47100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47100

Profiling with Linux `perf` shows that we spend at least 1% of our time doing this increment in our framework overhead benchmark. Here's the inline function breakdown for empty_cpu, which takes 6.91% of the total time:

```
   - at::native::empty_cpu
      - 1.91% at::detail::make_tensor<c10::TensorImpl, c10::intrusive_ptr<c10::StorageImpl, c10::detail::intrusive_target_default_null_type<c10::StorageImpl> >, c10::DispatchKey, caffe2::TypeMeta&> (inlined)
         - 0.98% c10::make_intrusive<c10::TensorImpl, c10::detail::intrusive_target_default_null_type<c10::TensorImpl>, c10::intrusive_ptr<c10::StorageImpl, c10::detail::intrusive_target_default_null_type<c10::StorageImpl> >, c10::DispatchKey, caffe2::TypeMeta&> (inlined
              0.97% c10::intrusive_ptr<c10::TensorImpl, c10::detail::intrusive_target_default_null_type<c10::TensorImpl> >::make<c10::intrusive_ptr<c10::StorageImpl, c10::detail::intrusive_target_default_null_type<c10::StorageImpl> >, c10::DispatchKey, caffe2::TypeMeta&>
           0.84% intrusive_ptr<c10::TensorImpl, c10::detail::intrusive_target_default_null_type<c10::TensorImpl> > (inlined)
      - 1.44% c10::make_intrusive<c10::StorageImpl, c10::detail::intrusive_target_default_null_type<c10::StorageImpl>, c10::StorageImpl::use_byte_size_t, long&, c10::DataPtr, c10::Allocator*&, bool> (inlined)
         - 1.44% c10::intrusive_ptr<c10::StorageImpl, c10::detail::intrusive_target_default_null_type<c10::StorageImpl> >::make<c10::StorageImpl::use_byte_size_t, long&, c10::DataPtr, c10::Allocator*&, bool> (inlined)
              1.02% std::__atomic_base<unsigned long>::operator++ (inlined)
      - 0.80% ~DataPtr (inlined)
           ~UniqueVoidPtr (inlined)
           ~unique_ptr (inlined)
      - 0.78% c10::TensorOptions::memory_format (inlined)
         - c10::TensorOptions::set_memory_format (inlined)
            - c10::optional<c10::MemoryFormat>::operator bool (inlined)
              c10::optional<c10::MemoryFormat>::initialized (inlined)
```

This change comes with a caveat: if we have constructors where `this` escapes to another thread before returning, we cannot make this assumption, because that other thread may have called `intrusive_ptr::make` already. I chose to just mandate that `intrusive_ptr_target`s' ctors hand back exclusive ownership of `this`, which seems like a reasonable requirement for a ctor anyway. If that turns out to be unacceptable, we could provide an opt-out from this optimization via a traits struct or similar template metaprogramming shenanigans.
ghstack-source-id: 116368592

Test Plan: Run framework overhead benchmark. Results look promising, ranging from a tiny regression (? presumably noise) on the InPlace benchmark, 2.5% - 4% on OutOfPlace, to 9% on the empty benchmarks and 10-12% on the view benchmarks.

Reviewed By: ezyang

Differential Revision: D24606531

fbshipit-source-id: 1cf022063dab71cd1538535c72c4844d8dd7bb25
2020-11-11 15:09:56 -08:00
f2b7c38735 Automated submodule update: FBGEMM (#47605)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: eb55572e55

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47605

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jianyuh

Differential Revision: D24833658

Pulled By: heitorschueroff

fbshipit-source-id: 7a577c75d244a58d94c249c0e50992078a3b62cb
2020-11-11 14:50:45 -08:00
fcd44ce698 Add instruction on how to handle the potential linker error on Linux (#47593)
Summary:
The original issue is https://github.com/pytorch/pytorch/issues/16683, which contains a comment (https://github.com/pytorch/pytorch/issues/16683#issuecomment-459982988) that suggests manually un-shadowing the `ld`.

A better approach can be found at https://github.com/ContinuumIO/anaconda-issues/issues/11152#issuecomment-573120962, which suggests that using a newer version can effectively fix this.

It took me quite some time to realize that this is in fact an issue caused by Anaconda. I think we should add it to the README.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47593

Reviewed By: ailzhang

Differential Revision: D24866092

Pulled By: heitorschueroff

fbshipit-source-id: c1f51864d23fd6f4f63a117496d8619053e35196
2020-11-11 14:24:33 -08:00
7864ae9f98 Improve error messages for operator registration API (#47636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47636

Previously:
```
terminate called after throwing an instance of 'c10::Error'
  what():  *cpp_signature == cpp_signature_->signature INTERNAL ASSERT FAILED at "caffe2/aten/src/ATen/core/dispatch/OperatorEntry.cpp":92, please report a bug to PyTorch. Tried to register a kernel (registered at buck-out/dev/gen/caffe2/generate-code/autograd/generated/TraceType_2.cpp:9847) for operator aten::div.out (registered at buck-out/dev/gen/caffe2/aten/gen_aten=TypeDefault.cpp/TypeDefault.cpp:3541) for dispatch key Tracer, but the C++ function signature at::Tensor& (at::Tensor const&, at::Tensor const&, at::Tensor&) mismatched with a previous kernel (registered at buck-out/dev/gen/caffe2/aten/gen_aten=CPUType.cpp/CPUType.cpp:2166) that had the signature at::Tensor& (at::Tensor&, at::Tensor const&, at::Tensor const&)
```
Now:
```
terminate called after throwing an instance of 'c10::Error'
  what():  *cpp_signature == cpp_signature_->signature INTERNAL ASSERT FAILED at "caffe2/aten/src/ATen/core/dispatch/OperatorEntry.cpp":96, please report a bug to PyTorch.
Mismatch in kernel C++ signatures
  operator: aten::div.out(Tensor self, Tensor other, *, Tensor(a!) out) -> (Tensor(a!))
    registered at buck-out/dev/gen/caffe2/aten/gen_aten=TypeDefault.cpp/TypeDefault.cpp:3541
  kernel 1: at::Tensor& (at::Tensor&, at::Tensor const&, at::Tensor const&)
    dispatch key: CPU
    registered at buck-out/dev/gen/caffe2/aten/gen_aten=CPUType.cpp/CPUType.cpp:2166
  kernel 2: at::Tensor& (at::Tensor const&, at::Tensor const&, at::Tensor&)
    dispatch key: Tracer
    registered at buck-out/dev/gen/caffe2/generate-code/autograd/generated/TraceType_2.cpp:9847
```

Previously:
```
W1109 13:38:52.464170 1644302 OperatorEntry.cpp:117] Warning: Registering a kernel (registered at caffe2/torch/csrc/autograd/VariableTypeManual.cpp:310) for operator aten::_backward (registered at buck-out/dev/gen/caffe2/aten/gen_aten=TypeDefault.cpp/TypeDefault.cpp:3549) for dispatch key Autograd that overwrote a previously registered kernel (registered at caffe2/torch/csrc/autograd/VariableTypeManual.cpp:310) with the same dispatch key for the same operator. (function registerKernel)
```
Now:
```
W1109 13:49:40.501817 1698959 OperatorEntry.cpp:118] Warning: Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::_backward(Tensor self, Tensor[] inputs, Tensor? gradient=None, bool? retain_graph=None, bool create_graph=False) -> ()
    registered at buck-out/dev/gen/caffe2/aten/gen_aten=TypeDefault.cpp/TypeDefault.cpp:3549
  dispatch key: Autograd
  previous kernel: registered at caffe2/torch/csrc/autograd/VariableTypeManual.cpp:310
       new kernel: registered at caffe2/torch/csrc/autograd/VariableTypeManual.cpp:310 (function registerKernel)
```

Previously:
```
terminate called after throwing an instance of 'c10::Error'
  what():  In registration for dummy_library::dummy_op: expected schema of operator to be "dummy_library::dummy_op(Tensor a) -> (Tensor)" (registered at caffe2/torch/csrc/autograd/VariableTypeManual.cpp:298), but got inferred schema "(Tensor _0) -> ()" (registered at caffe2/torch/csrc/autograd/VariableTypeManual.cpp:298). The number of returns is different. 1 vs 0
```
Now:
```
terminate called after throwing an instance of 'c10::Error'
  what():  Inferred operator schema for a C++ kernel function doesn't match the expected function schema.
  operator: dummy_library::dummy_op
  expected schema: dummy_library::dummy_op(Tensor a) -> (Tensor)
    registered at caffe2/torch/csrc/autograd/VariableTypeManual.cpp:298
  inferred schema: (Tensor _0) -> ()
    registered at caffe2/torch/csrc/autograd/VariableTypeManual.cpp:298
  reason: The number of returns is different. 1 vs 0
````

Previously:
```
terminate called after throwing an instance of 'c10::Error'
  what():  !cpp_signature_.has_value() || (CppSignature::make<FuncType>() == cpp_signature_->signature) INTERNAL ASSERT FAILED at "caffe2/aten/src/ATen/core/dispatch/OperatorEntry.h":170, please report a bug to PyTorch. Tried to access operator _test::dummy with a wrong signature. Accessed with void (at::Tensor, long) but the operator was registered with void (at::Tensor) (schema: registered by RegisterOperators, kernel: registered by RegisterOperators) This likely happened in a call to OperatorHandle::typed<Return (Args...)>(). Please make sure that the function signature matches the signature in the operator registration call.
```
Now:
```
terminate called after throwing an instance of 'c10::Error'
  what():  !cpp_signature_.has_value() || (CppSignature::make<FuncType>() == cpp_signature_->signature) INTERNAL ASSERT FAILED at "caffe2/aten/src/ATen/core/dispatch/OperatorEntry.h":169, please report a bug to PyTorch.
Tried to access or call an operator with a wrong signature.
  operator: _test::dummy(Tensor dummy) -> ()
    registered by RegisterOperators
  correct signature:  void (at::Tensor)
    registered by RegisterOperators
  accessed/called as: void (at::Tensor, long)
This likely happened in a call to OperatorHandle::typed<Return (Args...)>(). Please make sure that the function signature matches the signature in the operator registration call.
```
ghstack-source-id: 116359052

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D24846523

fbshipit-source-id: 0ce7d487b725bfbdf2261e36027cb34ef50c1fea
2020-11-11 14:19:38 -08:00
05a76ed705 Batching rule for torch.squeeze(tensor) (#47632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47632

This one is fun because we have to be careful not to squeeze out any of
the batch dims (it is the dims of the per-example tensor that are being squeezed).

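A minimal Python sketch of the rule (the real batching rule is implemented in C++, and this sketch keeps the batch dim at the front):

```python
import torch

def squeeze_batching_rule(batched: torch.Tensor, bdim: int) -> torch.Tensor:
    out = batched.movedim(bdim, 0)      # put the batch dim in front
    # walk per-example dims from the back so indices stay valid while squeezing
    for d in range(out.dim() - 1, 0, -1):
        if out.size(d) == 1:
            out = out.squeeze(d)
    return out                          # batch dim 0 is never squeezed

x = torch.randn(8, 1, 3, 1)             # batch of 8 per-example (1, 3, 1) tensors
assert squeeze_batching_rule(x, 0).shape == (8, 3)
```
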
Test Plan: - new tests

Reviewed By: anjali411

Differential Revision: D24859022

Pulled By: zou3519

fbshipit-source-id: 8adbd80963081efb683f62ea074a286a10da288f
2020-11-11 14:08:39 -08:00
df887936a4 Fix transpose batching rule (#47628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47628

PyTorch has a special case where scalar_tensor.transpose(0, 0) works and
returns the scalar tensor. If the following happens:
```py
>>> x = torch.randn(B0)  # the per-examples are all scalars
>>> vmap(lambda x: x.transpose(0, 0))(x)
```
then we replicate this behavior.

Test Plan: - new tests

Reviewed By: anjali411

Differential Revision: D24843658

Pulled By: zou3519

fbshipit-source-id: e33834122652473e34a18ca1cecf98e8a3b84bc1
2020-11-11 14:08:37 -08:00
f6ff6478cf Make kwargs argument optional in _batched_grad_test (#47625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47625

kwargs is {} most of the time, so this PR makes it optional. Note that it
is bad practice for {} to be a default argument; we work around this by
using None as the default and handling it accordingly.

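A minimal sketch of the pattern (the real helper has more logic):

```python
def _batched_grad_test(fn, args, kwargs=None):
    # None as the default avoids the classic shared-mutable-{} pitfall
    if kwargs is None:
        kwargs = {}
    return fn(*args, **kwargs)
```
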
Test Plan
- `pytest test/test_vmap.py -v`

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D24842571

Pulled By: zou3519

fbshipit-source-id: a46b0c6d5240addbe3b231b8268cdc67708fa9e0
2020-11-11 14:08:35 -08:00
fc24d0656a Tensor.contiguous, Tensor.is_contiguous batch rule (#47621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47621

Followup to #47365.

is_contiguous on BatchedTensorImpl is implemented as:
- Whenever one creates a BatchedTensorImpl, we cache the strides of the
per-examples, just like how we cache the sizes of the per-examples.
- With the cached strides, we use TensorImpl::refresh_contiguous() to
compute if the tensor is contiguous or not.
- is_contiguous checks the `is_contiguous_` flag that
refresh_contiguous() populates.

Both contiguous and is_contiguous only support torch.contiguous_format.
I'm not sure what the semantics should be for other memory formats; they
are also rank dependent (e.g., channels_last tensor must have 4
dimensions) which makes this a bit tricky.

Test Plan: - new tests

Reviewed By: Chillee, anjali411

Differential Revision: D24840975

Pulled By: zou3519

fbshipit-source-id: 4d86dbf11e2eec45f3f08300ae3f2d79615bb99d
2020-11-11 14:06:05 -08:00
6c815c71b3 Revert to use NCCL 2.7.8-1 (#47638)
Summary:
Only depend on stable NCCL releases

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47638

Reviewed By: mingzhe09088

Differential Revision: D24847765

Pulled By: mrshenli

fbshipit-source-id: 2c5f29602aa7403c110797cb07f8fb6151a1b60d
2020-11-11 13:05:09 -08:00
1abe6e5ad4 [ONNX] Bool inputs to index_put updated symbolic (#46866)
Summary:
Cases with bool inputs to index_put nodes were handled for tracing purposes. This PR adds support for similar situations in scripting.

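A minimal sketch of the pattern this enables, assuming masked assignment lowers to index_put with a bool index:

```python
import torch

@torch.jit.script
def zero_masked(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    x[mask] = 0.0  # index_put with a bool index, now handled in scripting export
    return x
```
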
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46866

Reviewed By: malfet

Differential Revision: D24870818

Pulled By: bzinodev

fbshipit-source-id: 2d75ca6f5f4b79d8c5ace337633c5aed3bdc4be7
2020-11-11 12:45:31 -08:00
da2e2336b6 [ONNX] Export and shape inference for prim uninitialized in If subblock (#46094)
Summary:
Enable export of prim::Uninitialized in If subblock outputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46094

Reviewed By: houseroad

Differential Revision: D24838537

Pulled By: bzinodev

fbshipit-source-id: d0719b140393595e6df114ef5cc1bb845e919c14
2020-11-11 12:10:49 -08:00
4078f44668 [TB][embedding supporting] Modify histogram to accept multiple types to skip CastOp and avoid OOMing in CastOp
Summary: To support min/max/mean/std, SummarizeOp needs to skip size checking (similar to the LpNorm error mentioned above) and accept multiple types.

Test Plan:
unit test:
`buck test //caffe2/caffe2/fb/tensorboard/tests:tensorboard_accumulate_histogram_op_test`

https://our.intern.facebook.com/intern/testinfra/testrun/1407375057859572

`buck test //caffe2/caffe2/fb/tensorboard/tests:tensorboard_accumulate_histogram_op_test --stress-runs 1000`

https://our.intern.facebook.com/intern/testinfra/testrun/2533274832166362

Reviewed By: cryptopic

Differential Revision: D24605507

fbshipit-source-id: fa08372d7c9970083c38abd432d4c86e84fb10e0
2020-11-11 12:03:54 -08:00
513f62b45b [hotfix] fix collect_env not working when torch compile/install fails (#47752)
Summary:
Fix collect_env not working when a PyTorch compile from source fails mid-way.
```
Traceback (most recent call last):
OSError: /home/rongr/local/pytorch/torch/lib/libtorch_global_deps.so: cannot open shared object file: No such file or directory
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47752

Reviewed By: janeyx99

Differential Revision: D24888576

Pulled By: walterddr

fbshipit-source-id: 3b20daeddbb4118491fb0cca9fb59d861f683da7
2020-11-11 11:47:49 -08:00
a1db5b0f2b Added CUDA support for complex input for torch.inverse #2 (#47595)
Summary:
`torch.inverse` now works for complex inputs on GPU.
Opening a new PR here. The previous PR was merged and reverted due to a bug in tests marked with `slowTest`.
Previous PR https://github.com/pytorch/pytorch/pull/45034

Ref. https://github.com/pytorch/pytorch/issues/33152

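A short usage sketch, assuming a CUDA device is available:

```python
import torch

a = torch.randn(4, 4, dtype=torch.complex64, device='cuda')
print(torch.inverse(a))  # now supported for complex inputs on GPU
```
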
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47595

Reviewed By: navahgar

Differential Revision: D24840955

Pulled By: anjali411

fbshipit-source-id: ec49fffdc4b3cb4ae7507270fa24e127be14f59b
2020-11-11 11:06:08 -08:00
dbfee42a7d [FX] Fix uses not updating when erasing a node (#47720)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47720

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D24875880

Pulled By: jamesr66a

fbshipit-source-id: aae9ffd10f8085b599e7923152287c6e6950ff49
2020-11-11 11:02:15 -08:00
d1351c66a8 [FX] Add a bunch of docstrings (#47719)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47719

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D24875400

Pulled By: jamesr66a

fbshipit-source-id: a1dd43d2eee914a441eff43c4f2efe61a399e8a5
2020-11-11 10:59:57 -08:00
dac0192148 Revert D23632280: [c10d] switch ProcessGroup::Work to be managed by intrusive_ptr
Test Plan: revert-hammer

Differential Revision:
D23632280 (0650a6166f)

Original commit changeset: 0a4642a8ffab

fbshipit-source-id: 2aa8ddb874fab11f773f4c08d740afcd865482e9
2020-11-11 10:54:08 -08:00
1f946e942d Revert D24667128: [c10d] switch Store to be managed by intrusive_ptr
Test Plan: revert-hammer

Differential Revision:
D24667128 (0cfe3451d4)

Original commit changeset: 9b6024c31c85

fbshipit-source-id: d8ddf9eb2fccef5023e05698e0c4662708fe4945
2020-11-11 10:49:58 -08:00
2204374fd4 Revert D24667127: [c10d] switch ProcessGroupNCCL:Options to be managed by intrusive_ptr
Test Plan: revert-hammer

Differential Revision:
D24667127 (ae5c2febb9)

Original commit changeset: 54986193ba1b

fbshipit-source-id: 12e1ebea1981c0b1b6dff4c8a2e2045878d44537
2020-11-11 10:42:33 -08:00
0c64f9f526 Convert from higher order functions to classes in tools.codegen.gen (#47008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47008

bhosmer has been complaining about how it is difficult to distinguish
between local variables and closed over variables in the higher order
functions.  Well, closures and objects do basically the same thing, so
just convert all these HOFs into objects.

The decoder ring:
- Higher order function => Constructor for object
- Access to closed over variable => Access to member variable on object
- with_native_function => method_with_native_function (because it's
  hard writing decorators that work for both functions and methods)

I didn't even have to change indentation (much).

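A toy illustration of the decoder ring (not the actual tools.codegen code):

```python
from dataclasses import dataclass

# before: a higher-order function closing over `dispatch_key`
def make_gen(dispatch_key):
    def gen(f):
        return f"register {f} for {dispatch_key}"
    return gen

# after: an object; the closed-over variable becomes a member variable
@dataclass
class Gen:
    dispatch_key: str
    def __call__(self, f):
        return f"register {f} for {self.dispatch_key}"

assert make_gen("CPU")("add") == Gen("CPU")("add")
```
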
When there is no need for closed over variables (a few functions), I
kept them as plain old functions, no need for an object with no
members.

While I was at it, I also deleted the kwargs, since the types are
enough to prevent mistakes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24600805

Pulled By: ezyang

fbshipit-source-id: 7e3ce8cb2446e3788f934ddcc17f7da6e9299511
2020-11-11 10:30:50 -08:00
d478605dec Fix classmethod override argument passing. (#47114)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47069.
Fixes https://github.com/pytorch/pytorch/issues/46824.
Fixes https://github.com/pytorch/pytorch/issues/47186

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47114

Reviewed By: ngimel

Differential Revision: D24649598

Pulled By: ezyang

fbshipit-source-id: af077affece7eceb1e4faf9c94d15484796b0f0e
2020-11-11 09:25:48 -08:00
1239d067ae [quant][graphmode][fx] Support standalone_module_class (#47705)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47705

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D24872380

fbshipit-source-id: db2ec7ba03da27203033fbebc11666be572622bb
2020-11-11 09:15:14 -08:00
4cb73f5a4c Allow for string literal return during symbolic tracing (#47618)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47618

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D24870422

Pulled By: ansley

fbshipit-source-id: 41c56c2f4f1f7bb360cea0fb346f6e4d495f5c2b
2020-11-11 08:54:39 -08:00
48ed577fbd Stop including TypeDefault.h from MPSCNNTests.mm (#46998)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46998

It's not using any TypeDefault symbols directly; running
CI to see if it was being included for other headers.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D24621920

Pulled By: ezyang

fbshipit-source-id: f868e5412ff3e5a616c3fc38110f203ca545eed5
2020-11-11 08:46:56 -08:00
88ec72e1c2 [fbcode][pytorch mobile] Create model reader utilities.
Summary:
For some of the end-to-end flow projects, we will need the capability to read module information during model validation or model publishing.
This diff creates model_reader.py with utilities for model content reading; it includes the following functionality:
1. read the model bytecode version;
2. check if a model is a lite PyTorch script module;
3. check if a model is a PyTorch script module.

This diff is recreated from the reverted diff: D24655999 (7f056e99dd).

Test Plan:
```
[xcheng16@devvm1099]/data/users/xcheng16/fbsource/fbcode% buck test //caffe2/torch/fb/mobile/tests:mobile_model_reader_tests
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 10.4 sec
Creating action graph: finished in 22.2 sec
Building: finished in 01:29.1 min (100%) 10619/10619 jobs, 1145 updated
  Total time: 02:01.8 min
More details at https://www.internalfb.com/intern/buck/build/f962dfad-76f9-457a-aca3-768ce20f0c31
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 172633f6-6b5b-49e9-a632-b4efa083a001
Trace available for this run at /tmp/tpx-20201109-165156.109798/trace.log
Started reporting to test run: https://our.intern.facebook.com/intern/testinfra/testrun/3940649712677511
    ✓ ListingSuccess: caffe2/torch/fb/mobile/tests:mobile_model_reader_tests - main (18.229)
    ✓ Pass: caffe2/torch/fb/mobile/tests:mobile_model_reader_tests - test_is_pytorch_lite_module (caffe2.torch.fb.mobile.tests.test_model_reader.TestModelLoader) (8.975)
    ✓ Pass: caffe2/torch/fb/mobile/tests:mobile_model_reader_tests - test_is_pytorch_script_module (caffe2.torch.fb.mobile.tests.test_model_reader.TestModelLoader) (9.136)
    ✓ Pass: caffe2/torch/fb/mobile/tests:mobile_model_reader_tests - test_read_module_bytecode_version (caffe2.torch.fb.mobile.tests.test_model_reader.TestModelLoader) (9.152)
Summary
  Pass: 3
  ListingSuccess: 1
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/3940649712677511
```

Reviewed By: husthyc

Differential Revision: D24848563

fbshipit-source-id: ab3371e111206a4bb4d07715c3314596cdc38d2c
2020-11-11 08:11:28 -08:00
5647f0ca7c Revert D24859919: [pytorch][PR] Grammatically updated the tech docs
Test Plan: revert-hammer

Differential Revision:
D24859919 (a843d48ead)

Original commit changeset: 5c6a8bc8e785

fbshipit-source-id: f757995fb64cfd4212c978618d572367e7296758
2020-11-11 07:43:17 -08:00
ae5c2febb9 [c10d] switch ProcessGroupNCCL:Options to be managed by intrusive_ptr (#47075)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47075

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D24667127

Pulled By: wanchaol

fbshipit-source-id: 54986193ba1b22480622a2e9d6d41d9472d201f3
2020-11-10 23:36:47 -08:00
0cfe3451d4 [c10d] switch Store to be managed by intrusive_ptr (#47074)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47074

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D24667128

Pulled By: wanchaol

fbshipit-source-id: 9b6024c31c851b7c3243540f460ae57323da523b
2020-11-10 23:36:44 -08:00
0650a6166f [c10d] switch ProcessGroup::Work to be managed by intrusive_ptr (#44046)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44046

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23632280

Pulled By: wanchaol

fbshipit-source-id: 0a4642a8ffabdd26c52c1baabfa30c0f446c3c85
2020-11-10 23:30:22 -08:00
cbf439caf1 Unbreak backward compatibility tests (#47726)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47726

Reviewed By: Chillee

Differential Revision: D24880651

Pulled By: ngimel

fbshipit-source-id: 1e70f42d98c7a14265aed743669592b4fc08c8d4
2020-11-10 21:37:39 -08:00
bfec376e9f [vulkan] Apply new changes to vulkan api v1 (#47721)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47721

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D24877403

Pulled By: IvanKobzarev

fbshipit-source-id: acfa8217c10d14bf38472abfc1e6f6216557c359
2020-11-10 20:10:29 -08:00
d73a8db2d2 Use local env for building CUDA extensions on Windows (#47150)
Summary:
Fixes https://github.com/pytorch/vision/pull/2818#issuecomment-719167504
After activating the VC env multiple times, the following error will be raised when building a CUDA extension.
```
FAILED: C:/tools/MINICO~1/CONDA-~2/TORCHV~1/work/build/temp.win-amd64-3.8/Release/tools/MINICO~1/CONDA-~2/TORCHV~1/work/torchvision/csrc/cuda/PSROIAlign_cuda.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\nvcc -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -DWITH_CUDA -Dtorchvision_EXPORTS -IC:\tools\MINICO~1\CONDA-~2\TORCHV~1\work\torchvision\csrc -I%PREFIX%\lib\site-packages\torch\include -I%PREFIX%\lib\site-packages\torch\include\torch\csrc\api\include -I%PREFIX%\lib\site-packages\torch\include\TH -I%PREFIX%\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include" -I%PREFIX%\include -I%PREFIX%\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -I%PREFIX%\Library\include -c C:\tools\MINICO~1\CONDA-~2\TORCHV~1\work\torchvision\csrc\cuda\PSROIAlign_cuda.cu -o C:\tools\MINICO~1\CONDA-~2\TORCHV~1\work\build\temp.win-amd64-3.8\Release\tools\MINICO~1\CONDA-~2\TORCHV~1\work\torchvision\csrc\cuda\PSROIAlign_cuda.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_50,code=compute_50 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
'cl.exe' is not recognized as an internal or external command,
operable program or batch file.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47150

Reviewed By: agolynski

Differential Revision: D24706019

Pulled By: ezyang

fbshipit-source-id: c13dc29f62d2d12d6a56f33dd450b467a1bf193b
2020-11-10 20:02:06 -08:00
7908bf27d5 Fix output type of torch.max for Tensor subclasses. (#47110)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47090

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47110

Reviewed By: ngimel

Differential Revision: D24649568

Pulled By: ezyang

fbshipit-source-id: 9374cf0c562de78e520bcb03415db273c1dd76a3
2020-11-10 19:45:36 -08:00
a5c65b86ce Fixed einsum compatibility/performance issues (#46398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46398

This PR makes torch.einsum compatible with numpy.einsum, except for the sublist input option, as requested here: https://github.com/pytorch/pytorch/issues/21412. It also fixes 2 performance issues linked below and adds a check for reducing to torch.dot instead of torch.bmm, which is faster in some cases.

fixes #45854, #37628, #30194, #15671

fixes #41467 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer

a = torch.randn(10000, 100, 101, device='cuda')
b = torch.randn(10000, 101, 3, device='cuda')

c = torch.randn(10000, 100, 1, device='cuda')
d = torch.randn(10000, 100, 1, 3, device='cuda')

print(Timer(
    stmt='torch.einsum("bij,bjf->bif", a, b)',
    globals={'a': a, 'b': b}
).blocked_autorange())

print()

print(Timer(
    stmt='torch.einsum("bic,bicf->bif", c, d)',
    globals={'c': c, 'd': d}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413850>
torch.einsum("bij,bjf->bif", a, b)
  Median: 4.53 ms
  IQR:    0.00 ms (4.53 to 4.53)
  45 measurements, 1 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413700>
torch.einsum("bic,bicf->bif", c, d)
  Median: 63.86 us
  IQR:    1.52 us (63.22 to 64.73)
  4 measurements, 1000 runs per measurement, 1 thread
```

fixes #32591 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer

a = torch.rand(1, 1, 16, 2, 16, 2, 16, 2, 2, 2, 2, device="cuda")
b = torch.rand(729, 1, 1, 2, 1, 2, 1, 2, 2, 2, 2, device="cuda")

print(Timer(
    stmt='(a * b).sum(dim = (-3, -2, -1))',
    globals={'a': a, 'b': b}
).blocked_autorange())

print()

print(Timer(
    stmt='torch.einsum("...ijk, ...ijk -> ...", a, b)',
    globals={'a': a, 'b': b}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de28850>
(a * b).sum(dim = (-3, -2, -1))
  Median: 17.86 ms
  2 measurements, 10 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de286a0>
torch.einsum("...ijk, ...ijk -> ...", a, b)
  Median: 296.11 us
  IQR:    1.38 us (295.42 to 296.81)
  662 measurements, 1 runs per measurement, 1 thread
```

TODO

- [x] add support for ellipsis broadcasting
- [x] fix corner case issues with sumproduct_pair
- [x] update docs and add more comments
- [x] add tests for error cases

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24860367

Pulled By: heitorschueroff

fbshipit-source-id: 31110ee598fd598a43acccf07929b67daee160f9
2020-11-10 19:38:43 -08:00
51a661c027 [vulkan] tentative fix for conv2d_pw, and fix checks for addmm (#47723)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47723

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D24878293

Pulled By: SS-JIA

fbshipit-source-id: 04abb544b87bd047ffe8af7ed52ec2569c61add4
2020-11-10 19:24:33 -08:00
e914a1b976 Support default args in symbolic tracing (#47615)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47615

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D24865060

Pulled By: ansley

fbshipit-source-id: 32ff105a1fa9c4a8f00adc20e8d40d1b6bd7157f
2020-11-10 18:57:00 -08:00
a5e9fa1b0d Add max_src_column_width to autograd profiler (#46257)
Summary:
Currently the max `src_column_width` is hardcoded to 75, which might not be sufficient for modules with long file names. This PR exposes `max_src_column_width` as a changeable parameter.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46257

Reviewed By: malfet

Differential Revision: D24280834

Pulled By: yf225

fbshipit-source-id: 8a90a433c6257ff2d2d79f67a944450fdf5dd494
2020-11-10 18:51:39 -08:00
1b954749d0 Disable test_distributed_for for multigpu test env (#47703)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47703

Differential Revision: D24871454

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Pulled By: mrshenli

fbshipit-source-id: 2112867c2aa551392fab16b984c59bcb59ae16ad
2020-11-10 18:46:31 -08:00
4de40dad5d [ONNX] Improve stability of gemm export (#46570)
Summary:
Export as `onnx::MatMul` if possible, since it has fewer constraints. Resolves an issue with exporting `weight_norm` in scripting that fails ONNX shape inference with `onnx::Gemm` in an unreachable `if` subgraph.

Updates the skipped tests list.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46570

Reviewed By: ngimel

Differential Revision: D24657480

Pulled By: bzinodev

fbshipit-source-id: 08d47cc9fc01c4a73a9d78c964fef102d12cc21c
2020-11-10 18:32:33 -08:00
69532c4227 Vulkan MobileNetv2 unit test. (#47616)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47616

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D24848639

Pulled By: AshkanAliabadi

fbshipit-source-id: 81a432a14cdca444ec0f70a4f8692a3abf4d2ea9
2020-11-10 17:28:39 -08:00
bf6a156f64 Fix kthvalue error for scalar input (#47600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47600

fixes https://github.com/pytorch/pytorch/issues/30818

Note that the median case was already fixed by https://github.com/pytorch/pytorch/pull/45847

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24860337

Pulled By: heitorschueroff

fbshipit-source-id: 69ccbbb6c7c86671e5712b1c2056c012d898b4f2
2020-11-10 17:21:52 -08:00
6575e674ce [numpy] torch.{all, any} : Extend Dtype Support (#44790)
Summary:
Reference https://github.com/pytorch/pytorch/issues/44779

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44790

Reviewed By: bdhirsh

Differential Revision: D24393119

Pulled By: heitorschueroff

fbshipit-source-id: a9b88e9d06b3c282f2e5360b6eaea4ae8ef77c1d
2020-11-10 17:11:39 -08:00
c9d37675b2 Back out "[pytorch][PR] The dimension being reduced should not be coalesced by TensorIterator" (#47642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47642

Original commit changeset: 02bb2b15694c

Test Plan: Covered by CI tests

Reviewed By: anjali411

Differential Revision: D24849072

fbshipit-source-id: a8790cbf46936aee7a6f504dac8595997175fc65
2020-11-10 16:31:33 -08:00
f692af209d add unittest for operator benchmark (#47678)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47678

Add unit tests for operator benchmark.
They cover the cases below:
```
generate_c2_test
generate_c2_gradient_test
generate_pt_test
generate_pt_gradient_test
generate_pt_tests_from_op_list
```
Also fixed two issues (incorrect fn signatures) found by the unit test in `benchmark_caffe2.py`.

Test Plan:
arc lint
buck run caffe2/benchmarks/operator_benchmark:operator_benchmark_unittest
```
test_c2_single_op (operator_benchmark_unittest.BenchmarkTest) ... # ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking Caffe2: add
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1109 23:08:39.932207 639464 init.h:137] Caffe2 GlobalInit should be run before any other API calls.
# Name: add_M8
# Input: M: 8
Forward Execution Time (us) : 36.474

# Benchmarking Caffe2: add
# Name: add_M8
# Input: M: 8
Backward Execution Time (us) : 42.281

ok
test_pt_list_of_ops (operator_benchmark_unittest.BenchmarkTest) ... # ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking Caffe2: add
# Name: add_M8
# Input: M: 8
Forward Execution Time (us) : 36.579

# Benchmarking Caffe2: add
# Name: add_M8
# Input: M: 8
Backward Execution Time (us) : 42.734

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M8
# Input: M: 8
Forward Execution Time (us) : 148.929

# Benchmarking PyTorch: abs_
# Mode: Eager
# Name: abs__M8
# Input: M: 8
Forward Execution Time (us) : 71.909

ok
test_pt_single_op (operator_benchmark_unittest.BenchmarkTest) ... # ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking Caffe2: add
# Name: add_M8
# Input: M: 8
Forward Execution Time (us) : 36.860

# Benchmarking Caffe2: add
# Name: add_M8
# Input: M: 8
Backward Execution Time (us) : 42.293

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M8
# Input: M: 8
Forward Execution Time (us) : 148.999

# Benchmarking PyTorch: abs_
# Mode: Eager
# Name: abs__M8
# Input: M: 8
Forward Execution Time (us) : 71.941

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8
# Input: M: 8
Forward Execution Time (us) : 179.108

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8
# Input: M: 8
Backward Execution Time (us) : 1205.902

ok
```
buck run caffe2/benchmarks/operator_benchmark/c2:add_test
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking Caffe2: add
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1109 23:20:11.551795 654290 init.h:137] Caffe2 GlobalInit should be run before any other API calls.
# Name: add_M8_N16_K32_dtypeint
# Input: M: 8, N: 16, K: 32, dtype: int
Forward Execution Time (us) : 984.510

# Benchmarking Caffe2: add
# Name: add_M16_N16_K64_dtypefloat
# Input: M: 16, N: 16, K: 64, dtype: float
Forward Execution Time (us) : 68.526

# Benchmarking Caffe2: add
# Name: add_M64_N64_K128_dtypeint
# Input: M: 64, N: 64, K: 128, dtype: int
Forward Execution Time (us) : 101617.076
```

Reviewed By: mingzhe09088

Differential Revision: D24854414

fbshipit-source-id: 6676549909da6700b42f322c4ad6e8e2ef5b86b5
2020-11-10 15:45:36 -08:00
a843d48ead Grammatically updated the tech docs (#47345)
Summary:
<img width="1440" alt="Screenshot 2020-11-04 at 1 07 21 PM" src="https://user-images.githubusercontent.com/72745540/98082455-c5f89200-1e9e-11eb-97e3-ae0eb62355f6.png">

Small grammatical update to the torch tech docs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47345

Reviewed By: malfet

Differential Revision: D24859919

Pulled By: ejguan

fbshipit-source-id: 5c6a8bc8e785c5295bf6f2f5b583dd6054b96fec
2020-11-10 15:33:26 -08:00
febc76a5c6 fix assert_allclose doesn't check shape (#47580)
Summary:
Fix assert_allclose not checking shape.

Should fix https://github.com/pytorch/pytorch/issues/47449.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47580

Reviewed By: samestep

Differential Revision: D24836399

Pulled By: walterddr

fbshipit-source-id: 943f8c83864bc01e1a782048c234e9592d2f1a25
2020-11-10 15:03:25 -08:00
8e3af9faa8 [pytorch] fix debug symbol flag for android clang (#46331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46331

Fix the android build size issue #46246.

Test Plan: Imported from OSS

Reviewed By: dhruvbird

Differential Revision: D24390061

Pulled By: ljk53

fbshipit-source-id: b4a6f297e89b9c08dff4297c6a41aabd41d9fff5
2020-11-10 14:55:43 -08:00
baa2f777c8 [complex] torch.sqrt: fix edge values (#47424)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47358

Replace the optimized path with a slower but correct `map(std::sqrt)`

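Illustrative inputs on the negative real axis (the exact failing edge values are in the linked issue):

```python
import torch

x = torch.tensor([-1 + 0j, -4 + 0j], dtype=torch.complex64)
print(torch.sqrt(x))  # tensor([0.+1.j, 0.+2.j])
```
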
Benchmark posted below in comments.

cc: dylanbespalko (original author of fast-path)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47424

Reviewed By: walterddr

Differential Revision: D24855914

Pulled By: mruberry

fbshipit-source-id: c21a38f365d996645db70be96ff1216776bedd3a
2020-11-10 14:51:04 -08:00
7691cf175c [ROCm] set ROCM_ARCH to gfx900 and gfx906 for CI builds (#47683)
Summary:
This change adds the arch settings for caffe2 builds, fixes some typos,
and clarifies that this setting applies to both CircleCI and Jenkins.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47683

Reviewed By: zou3519

Differential Revision: D24864034

Pulled By: malfet

fbshipit-source-id: 304b8a8e5c929ddaeb9c399f6219783a1369d842
2020-11-10 14:44:48 -08:00
ef5f54b2c6 added rocm 3.9 docker image (#47473)
Summary:
Added a bionic ROCm 3.9 Docker image.

jeffdaily

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47473

Reviewed By: zou3519

Differential Revision: D24860549

Pulled By: malfet

fbshipit-source-id: d12c39970432ed5fc5051cac10a068fd7bb8f7f9
2020-11-10 14:42:10 -08:00
14f0675903 [ONNX] Fix dtype for log_softmax export (#46627)
Summary:
Previously, the dtype was not properly converted from a constant node to a Python number.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46627

Reviewed By: houseroad

Differential Revision: D24657535

Pulled By: bzinodev

fbshipit-source-id: 33b0b9087d969f2cb0a2fa608fcf6e10956c06bf
2020-11-10 14:34:46 -08:00
0fb1356a98 [ONNX] Fix eye export (#47016)
Summary:
Previously the export did not consider the case where the optional `m` is not provided in `torch.eye(n, m)`.
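
For context, the two call forms in question (illustrative, not taken from the PR's tests):

```python
import torch

torch.eye(3)     # optional `m` omitted, defaults to `n`; this case previously broke the export
torch.eye(3, 4)  # explicit `m`; already handled
```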

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47016

Reviewed By: ejguan

Differential Revision: D24735916

Pulled By: bzinodev

fbshipit-source-id: ec9b410fc59f27d77d4ae40cb38a67537abb3cd8
2020-11-10 14:24:33 -08:00
5ce9c70631 Revert D24735802: [pytorch][PR] [ONNX] Update batch_norm symbolic to handle track_running_stats=False
Test Plan: revert-hammer

Differential Revision:
D24735802 (1a55f5b3ea)

Original commit changeset: bbb29d92d46a

fbshipit-source-id: dcd7af6d50e2776e63ee4bfcb9e4baf08a4771b4
2020-11-10 14:04:06 -08:00
6b94830cdc faithful signature support in BoxedKernelWrapper (#47267)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47267

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24701488

Pulled By: bhosmer

fbshipit-source-id: dbce246319670f9590c5762ad20c26cb24575fe8
2020-11-10 13:58:36 -08:00
0a7ebf00f8 [Reland] Add tests for DDP control flow models. (#47470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47470

Reland of https://github.com/pytorch/pytorch/pull/47206, which was reverted due to failing multigpu tests.

The fix to make multigpu tests work is to compare against `torch.tensor([world_size, 0])`, not hardcode `torch.tensor([2, 0])`, which assumes a world size of 2.

Original commit description:

As discussed offline with pritamdamania87, add testing to ensure per-iteration and rank-dependent control flow works as expected in DDP with find_unused_parameters=True.
ghstack-source-id: 115993934

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D24767893

fbshipit-source-id: 7d7a2449270eb3e72b5061694e897166e16f9bbc
2020-11-10 12:22:59 -08:00
17c58720fe Revert D24346771: [caffe2][memonger] Add support for distributed inference predict nets in DAG memonger
Test Plan: revert-hammer

Differential Revision:
D24346771 (5882f2e540)

Original commit changeset: ad2dd2e63f3e

fbshipit-source-id: 90346f08c890eebe71f068748a8e24e4db88c250
2020-11-10 12:11:22 -08:00
163adb9fa7 Add HalfToFloat + FloatToHalf operators to PyTorch (#45092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45092

Adding two operators
1. at::float_to_half -> Converts FP32 tensor to FP16 tensor
2. at::half_to_float -> Converts FP16 tensor to FP32 tensor.

These operators internally use the kernel provided by FBGeMM. Both C2 and PT will use the same FBGeMM kernel underneath.
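
At the Python level the conversions look like the following sketch; whether `.half()`/`.float()` reach these exact kernels on a given build is an assumption:

```python
import torch

x = torch.randn(512, 512)  # FP32
h = x.half()               # FP32 -> FP16
y = h.float()              # FP16 -> FP32
assert h.dtype == torch.float16 and y.dtype == torch.float32
```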

Test Plan:
buck test //caffe2/test:torch -- .*test_half_tensor.*

Run benchmark locally using

```
buck run //caffe2/benchmarks/operator_benchmark/pt:tensor_to_test
```

AI Bench results are pending; I don't expect them to finish soon, as we have a large queue with jobs pending for 2+ days.

Benchmark for a 512x512 tensor with the FBGeMM implementation:

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: FloatToHalfTensorConversionBenchmark
# Mode: Eager
# Name: FloatToHalfTensorConversionBenchmark_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 1246.332

# Benchmarking PyTorch: HalfToFloatTensorConversionBenchmark
# Mode: Eager
# Name: HalfToFloatTensorConversionBenchmark_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 1734.304
```

Benchmark for a 512x512 tensor on trunk, with no FBGeMM integration:

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: FloatToHalfTensorConversionBenchmark
# Mode: Eager
# Name: FloatToHalfTensorConversionBenchmark_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 169045.724

# Benchmarking PyTorch: HalfToFloatTensorConversionBenchmark
# Mode: Eager
# Name: HalfToFloatTensorConversionBenchmark_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 152382.494
```

Reviewed By: ngimel

Differential Revision: D23824869

fbshipit-source-id: ef044459b6c8c6e5ddded72080204c6a0ab4582c
2020-11-10 12:00:53 -08:00
497cd2506f Add serialize GraphModule to JSON support (#47612)
Summary:
Re-opening this PR; the previously missed mypy issues are now addressed.
Example:

```
class TestModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)
        self.e = torch.rand(4)

    def forward(self, a, b):
        add_1 = a + b
        linear = self.linear(add_1)
        add_2 = linear + self.e
        return add_2
```
JSON:

```
{
    "modules": {},
    "weights": {
        "linear.weight": {
            "dtype": "torch.float32",
            "is_quantized": false,
            "shape": "[4, 4]"
        },
        "linear.bias": {
            "dtype": "torch.float32",
            "is_quantized": false,
            "shape": "[4]"
        },
        "e": {
            "dtype": "torch.float32",
            "is_quantized": false,
            "shape": "[4]"
        }
    },
    "nodes": [
        {
            "shape": "[4]",
            "dtype": "torch.float32",
            "target": "a",
            "op_code": "placeholder",
            "name": "a",
            "args": [],
            "kwargs": {}
        },
        {
            "shape": "[4]",
            "dtype": "torch.float32",
            "target": "b",
            "op_code": "placeholder",
            "name": "b",
            "args": [],
            "kwargs": {}
        },
        {
            "shape": "[4]",
            "dtype": "torch.float32",
            "target": "_operator.add",
            "op_code": "call_function",
            "name": "add_1",
            "args": [
                {
                    "is_node": true,
                    "name": "a"
                },
                {
                    "is_node": true,
                    "name": "b"
                }
            ],
            "kwargs": {}
        },
        {
            "target": "linear",
            "op_code": "call_module",
            "name": "linear_1",
            "args": [
                {
                    "is_node": true,
                    "name": "add_1"
                }
            ],
            "kwargs": {}
        },
        {
            "shape": "[4]",
            "dtype": "torch.float32",
            "target": "e",
            "op_code": "get_attr",
            "name": "e",
            "args": [],
            "kwargs": {}
        },
        {
            "shape": "[4]",
            "dtype": "torch.float32",
            "target": "_operator.add",
            "op_code": "call_function",
            "name": "add_2",
            "args": [
                {
                    "is_node": true,
                    "name": "linear_1"
                },
                {
                    "is_node": true,
                    "name": "e"
                }
            ],
            "kwargs": {}
        },
        {
            "shape": "[4]",
            "dtype": "torch.float32",
            "target": "output",
            "op_code": "output",
            "name": "output",
            "args": [
                {
                    "is_node": true,
                    "name": "add_2"
                }
            ],
            "kwargs": {}
        }
    ]
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47612

Reviewed By: scottxu0730

Differential Revision: D24836223

Pulled By: gcatron

fbshipit-source-id: d3da2b5f90d143beba3b7f1f67462fb7430df906
2020-11-10 11:54:02 -08:00
5cba3cec5a fix extensions build flags on newer GPUs (#47585)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47352

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47585

Reviewed By: heitorschueroff

Differential Revision: D24833654

Pulled By: ezyang

fbshipit-source-id: eaec5b8db5f35cac0a74d2858cb054a3853b0990
2020-11-10 11:38:18 -08:00
1a55f5b3ea [ONNX] Update batch_norm symbolic to handle track_running_stats=False (#47135)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45333

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47135

Reviewed By: ejguan

Differential Revision: D24735802

Pulled By: bzinodev

fbshipit-source-id: bbb29d92d46a8a74dac0cb01639ddd4ec121a54c
2020-11-10 11:31:33 -08:00
ccc53901bd Update CONTRIBUTING and gitignore for docs build (#47539)
Summary:
This PR tries to make building the docs less confusing for new contributors:

- `npm` is discouraged on devservers for Facebook employees, so I added another way to install `katex`
- the path to `check-doxygen.sh` was wrong, so I fixed it
- while generating the CPP docs, it created two new folders that weren't ignored by Git, so I added those to `.gitignore`
- I wasn't able to get the SSH tunnel to work, so I added instructions to use `scp` as an alternative

I'm not entirely sure how the `docs/cpp/source/{html,latex}/` directories were created since I haven't been able to reproduce them.

I also think that it would be better to use the SSH tunnel since `scp` is so much slower, but I just wasn't able to figure it out; I followed the instructions from `CONTRIBUTING.md` and then ran a [Python `http.server`](https://docs.python.org/3/library/http.server.html) on my devserver:
```bash
python -m http.server 8000 --bind 127.0.0.1 --directory build/html
```
but my browser failed to connect and my (local) terminal printed error messages (presumably from the SSH command).

If anyone knows how to properly set up the SSH tunnel and HTTP server, I can add those more detailed instructions to `CONTRIBUTING.md` and remove the `scp` instructions from this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47539

Reviewed By: malfet

Differential Revision: D24806833

Pulled By: samestep

fbshipit-source-id: 456691018a76efadde28fa5eb783b0895582e72d
2020-11-10 11:04:34 -08:00
cc337069e0 .circleci: Add python 3.9 to linux binary build matrix (#47235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47235

Depends on https://github.com/pytorch/builder/pull/565

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24863739

Pulled By: seemethere

fbshipit-source-id: ed78087bb7aae118af7a808d7b5620d6c9b8cb26
2020-11-10 10:56:50 -08:00
22d56319ee Moving hypothesis and other installations to Docker (#47451)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31136

This PR:
1. moves several installations to Docker from `test.sh` for both PyTorch and Caffe2
2. removes version fixing for numba and llvmlite as the issue linked has been resolved

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47451

Reviewed By: walterddr

Differential Revision: D24791350

Pulled By: janeyx99

fbshipit-source-id: bf36cd419e30d9e02622ad7c7049fbc724c89579
2020-11-10 10:42:16 -08:00
fa560ceb9c [reland] make intrusive_ptr as a pybind holder type (#47586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47586

relanding PR of https://github.com/pytorch/pytorch/pull/44492, and add
additional Capsule related wrapping to ensure we still have the correct
type in pybind11 to resolve Capsule as torch._C.CapsuleType

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24822519

Pulled By: wanchaol

fbshipit-source-id: eaaea446fb54b56ed3b0d04c31481c64096e9459
2020-11-10 10:09:08 -08:00
780f854135 Clear Shape info in frozen modules (#47511)
Summary:
To ensure that frozen models produced from traced models are not over-optimized, clear the shape info in the frozen model.

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47511

Reviewed By: eellison

Differential Revision: D24792849

Pulled By: bzinodev

fbshipit-source-id: 5dc7c4d713a113c23d59cabf5541b3c58b075b43
2020-11-10 09:49:58 -08:00
1c45631f10 Revert D24737050: [WIP] Adding bunch of unary foreach APIs
Test Plan: revert-hammer

Differential Revision:
D24737050 (b6a2444eff)

Original commit changeset: deb59b41ad1c

fbshipit-source-id: 76cd85028114cfc8fc5b7bb49cd27efc2e315aa5
2020-11-10 09:41:41 -08:00
5882f2e540 [caffe2][memonger] Add support for distributed inference predict nets in DAG memonger
Summary:
Distributed Inference splits a predict net into multiple parts, part0 being the main part which contains ops to make remote calls to other parts. part0 predict net may contain AsyncIf ops to optimize rpc call usage. AsyncIf ops have internal nets which may refer to memongered blobs. This change handles AsyncIf ops to update internal nets to refer to memongered blobs. Here is one reference part0 predict net with AsyncIf ops: https://www.internalfb.com/intern/paste/P145812115/

As part of this change, I am also updating dag memonger traversal to always start from root op, i.e. ops with 0 in degree. Earlier logic will start traversing ops based on input head blobs and if one of the head inputs is getting used in a non-root op which gets visited before its parent, the traversal will throwing assertion error here: https://fburl.com/diffusion/ob110s9z . Almost for all the distributed inference part0 nets, it was throwing this assertion error.

Reviewed By: hlu1

Differential Revision: D24346771

fbshipit-source-id: ad2dd2e63f3e822ad172682f6d63f8474492255d
2020-11-10 09:35:28 -08:00
1bf3dc51ae [JIT] Add __prepare_scriptable__ duck typing to allow replacing nn.modules with scriptable preparations (#45645)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45072

As discussed with zdevito gchanan cpuhrsch and suo, this change allows developers to create custom preparations for their modules before scripting. This is done by adding a `__prepare_scriptable__` method to a module which returns the prepared scriptable module out-of-place. It does not expand the API surface for end users.

Prior art by jamesr66a: https://github.com/pytorch/pytorch/pull/42244

cc: zhangguanheng66

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45645

Reviewed By: dongreenberg, ngimel

Differential Revision: D24039990

Pulled By: zhangguanheng66

fbshipit-source-id: 4ddff2d353124af9c2ef22db037df7e3d26efe65
2020-11-10 08:59:45 -08:00
6bb18b24fb [quant][qat] Ensure observer respects device affinity (#47514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47514

Previously, the scale and zero_point were returned on the CPU even if the input tensor was on the GPU. This is because `copy_()` doesn't respect the device when copying over the tensor.

Also fixed a bug where we were always setting the device to 'cuda' (irrespective of the device id) in the calculate_qparams function.
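
A minimal repro sketch (`MinMaxObserver` is used for illustration and a second GPU is assumed; the test's actual observer may differ):

```python
import torch
from torch.quantization import MinMaxObserver

obs = MinMaxObserver().to("cuda:1")    # observer on a non-default GPU
obs(torch.randn(16, device="cuda:1"))  # record min/max on that device
scale, zero_point = obs.calculate_qparams()
# After the fix, scale/zero_point live on cuda:1 rather than CPU or cuda:0.
```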

Test Plan:
python test/test_quantization.py TestObserver.test_observer_qparams_respects_device_affinity

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24800495

fbshipit-source-id: d7a76c59569842ed69029d0eb4fa9df63f87e28c
2020-11-10 08:43:52 -08:00
abae12ba41 only set ccbin flag if not provided by user (#47404)
Summary:
Avoid an nvcc error if the user specifies a C compiler (as pointed out in https://github.com/pytorch/pytorch/issues/47377)

Fixes https://github.com/pytorch/pytorch/issues/47377

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47404

Reviewed By: ejguan

Differential Revision: D24748833

Pulled By: malfet

fbshipit-source-id: 1a4ad1f851c8854795f7f98e28f479a0ff458a00
2020-11-10 07:55:57 -08:00
65a72cae2c Fix type promotion for trace on CPU. (#47305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47305

Fixes https://github.com/pytorch/pytorch/issues/47127.

Ideally this would just use diag and sum (as the CUDA implementation does), but that seems to have performance problems, which I'll link in the github PR.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D24729627

Pulled By: gchanan

fbshipit-source-id: 151b786b53e7b958f0929c803dbf8e95981c6884
2020-11-10 07:46:03 -08:00
57dcb04239 Batched gradient support for view+inplace operations (#47227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47227

Motivation
----------
We would like to compute batched gradients for view+inplace operations.
This most notably shows up in internal implementation of operations.
For example, many view backward functions (SelectBackward, DiagonalBackward)
are implemented with view+inplace, so to support vectorized hessian
computation for e.g. torch.select and torch.diagonal we would need a
way to handle or workaround view+inplace.

Approach
--------
view+inplace creates a CopySlices node and transmute view backward nodes
into an AsStrided node. For example,

```
leaf = torch.randn(4, 5, requires_grad=True)
base = leaf * leaf
view = base[0]
view.cos_()
```

base.grad_fn is CopySlices and view.grad_fn is AsStridedBackward.

To support vmap over CopySlices and AsStridedBackward:
- We use `new_empty_strided` instead of `empty_strided` in CopySlices
so that the batch dims get propagated
- We use `new_zeros` inside AsStridedBackward so that the batch dims get
propagated.

Test Plan
---------
- New tests. When we get closer to having most operations support batched
grad computation via vmap, I'd like to add it as an option to gradcheck
and turn it on for our tests.

Test Plan: Imported from OSS

Reviewed By: kwanmacher, glaringlee

Differential Revision: D24741687

Pulled By: zou3519

fbshipit-source-id: 8210064f782a0a7a193752029a4340e505ffb5d8
2020-11-10 07:38:02 -08:00
22d21414d7 Revert D24574649: [pytorch][PR] Utility that loads a DP/DDP model state dict into a non-DDP model with the same architecture.
Test Plan: revert-hammer

Differential Revision:
D24574649 (b631c872c9)

Original commit changeset: 17d29ab16ae2

fbshipit-source-id: 6766c6b21b82c9463143da0370192d9c68dbce6c
2020-11-10 06:55:45 -08:00
f2eac5df18 [NNC] Fix lowering of aten::remainder (#47611)
Summary:
Fix an issue with the TensorExpr lowering of aten::remainder with integral inputs. We were always lowering to fmod and never to Mod.
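
The semantic difference being restored, for illustration:

```python
import torch

a = torch.tensor([-7, 7])
b = torch.tensor([3, -3])
print(torch.remainder(a, b))  # tensor([ 2, -2]): sign follows the divisor (Mod)
print(torch.fmod(a, b))       # tensor([-1,  1]): sign follows the dividend (fmod)
```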

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47611

Reviewed By: bertmaher, heitorschueroff

Differential Revision: D24846929

Pulled By: nickgg

fbshipit-source-id: adac4322ced5761a11a8e914debc9abe09cf5637
2020-11-09 21:45:42 -08:00
0b30a8d007 [NNC] Simplify and fix some bugs in Bounds Inference (#47450)
Summary:
Refactors NNC bounds inference to use the dependency analysis added in https://github.com/pytorch/pytorch/issues/46952. This ends up being a pretty good simplification because we no longer need the complicated bound merging code that we used to determine contiguous ranges. There were no usages of that code and the memory dependency analyzer is closer to what we want for those use cases anyway.

Added tests for a few cases uncovered by the existing bounds inference test - much of the coverage for this feature is in tests of its uses: rfactor, computeAt and cacheAccesses.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47450

Reviewed By: heitorschueroff

Differential Revision: D24834458

Pulled By: nickgg

fbshipit-source-id: f93e40b09c0745dcc46c7e34359db594436d04f0
2020-11-09 21:37:04 -08:00
c8a42c32a1 Allow large inputs to svd_lowrank. Fix inaccuracy in torch.svd docs. (#47440)
Summary:
As in title.

Fixes https://github.com/pytorch/pytorch/issues/42062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47440

Reviewed By: bdhirsh

Differential Revision: D24790628

Pulled By: mruberry

fbshipit-source-id: 1442eb884fbe4ffe6d9c78a4d0186dd0b1482c9c
2020-11-09 21:04:48 -08:00
52fe73a39e Enable Python code coverage for onnx runs (#47387)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44120

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47387

Reviewed By: heitorschueroff

Differential Revision: D24737378

Pulled By: janeyx99

fbshipit-source-id: 79e3d0b62f7da0617330f312fb1ed548c6be2a3b
2020-11-09 20:52:14 -08:00
b631c872c9 Utility that loads a DP/DDP model state dict into a non-DDP model with the same architecture. (#45643)
Summary:
Added a convenience function that allows users to load models without DP/DDP from a DP/DDP state dict.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45643

Reviewed By: rohan-varma

Differential Revision: D24574649

fbshipit-source-id: 17d29ab16ae24a30890168fa84da6c63650e61e9
2020-11-09 20:49:29 -08:00
49d5b4d1e1 move helper functions out of Partitioner class (#47515)
Summary:
This PR moves some helper functions out of the Partitioner class. This makes the Partitioner class cleaner and the helper functions easier to reuse in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47515

Reviewed By: gcatron, heitorschueroff

Differential Revision: D24844751

Pulled By: scottxu0730

fbshipit-source-id: 04397d0ce995cf96943df0a2b9265a521177b4de
2020-11-09 20:42:10 -08:00
4841e9ef33 Add Vulkan op Conv2D. (#46900)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46900

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D24568211

Pulled By: AshkanAliabadi

fbshipit-source-id: 2819c8308292055aa4e8130109d8764d885c1340
2020-11-09 20:39:20 -08:00
ce11dbbb48 Vulkan tweaks (#47261)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47261

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D24837714

Pulled By: AshkanAliabadi

fbshipit-source-id: 221258c03a7f2304a3b34ad550c458c49a108cd0
2020-11-09 20:34:20 -08:00
8aca85dbcd Add diagflat complex support (#47564)
Summary:
Adds complex number support for `torch.diagflat`:
``` python
>>> import torch
>>> a = torch.ones(2, dtype=torch.complex128)
>>> torch.diagflat(a)
tensor([[1.+0.j, 0.+0.j],
        [0.+0.j, 1.+0.j]], dtype=torch.complex128)
>>> b = a.cuda()
>>> torch.diagflat(b)
tensor([[1.+0.j, 0.+0.j],
        [0.+0.j, 1.+0.j]], device='cuda:0', dtype=torch.complex128)
```

Note that automatic differentiation isn't implemented:
``` python
>>> d = torch.ones(1, dtype=torch.complex128, requires_grad=True)
>>> torch.diagflat(d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: diag does not support automatic differentiation for outputs with complex dtype.
```

Fixes https://github.com/pytorch/pytorch/issues/47499

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47564

Reviewed By: heitorschueroff

Differential Revision: D24844467

Pulled By: anjali411

fbshipit-source-id: 9c8cb795d52880b7dcffab0c059b0f6c2e5ef151
2020-11-09 20:28:23 -08:00
79f8582289 [ONNX] Add export of aten::is_floating point (#46442)
Summary:
Add export of aten::is_floating point

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46442

Reviewed By: mrshenli

Differential Revision: D24566156

Pulled By: bzinodev

fbshipit-source-id: 91ea95e2c4d4866e2ef51bffe07461de2e31c110
2020-11-09 18:02:47 -08:00
3dd266304c Fix inaccurate note in DistributedDataParallel (#47156)
Summary:
Sorry for my previous inaccurate [PR](https://github.com/pytorch/pytorch/pull/42471#issue-462329192).

Here is some toy code to illustrate my point:

* non-DistributedDataParallel version

```python
import torch

if __name__ == "__main__":
    torch.manual_seed(0)
    inp = torch.randn(1,16)
    inp = torch.cat([inp, inp], dim=0)
    model = torch.nn.Linear(16, 2)
    loss_func = torch.nn.CrossEntropyLoss()
    opti = torch.optim.SGD(model.parameters(), lr=0.001)
    opti.zero_grad()
    loss = loss_func(model(inp), torch.tensor([0, 0]))
    loss.backward()
    opti.step()

    print("grad:", model.weight.grad)
    print("updated weight:\n", model.weight)
```

* DistributedDataParallel version

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.multiprocessing import Process

def run(rank, size):
    torch.manual_seed(0)
    x = torch.randn(1,16)

    model = torch.nn.Linear(16, 2)
    model = torch.nn.parallel.DistributedDataParallel(model)
    loss_func = torch.nn.CrossEntropyLoss()
    opti = torch.optim.SGD(model.parameters(), lr=0.001)
    opti.zero_grad()

    y = model(x)

    label = torch.tensor([0])
    loss = loss_func(y, label)

    loss.backward()
    opti.step()

    if rank == 0:
        print("grad:", model.module.weight.grad)
        print("updated weight:\n", model.module.weight)

def init_process(rank, size, fn, backend="gloo"):
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '29500'
    dist.init_process_group(backend, rank=rank, world_size=size)
    fn(rank, size)

if __name__ == "__main__":
    size = 2
    process = []
    for rank in range(size):
        p = Process(target=init_process, args=(rank, size, run))
        p.start()
        process.append(p)

    for p in process:
        p.join()
```

Both pieces of code produce the same output.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47156

Reviewed By: mruberry

Differential Revision: D24675199

Pulled By: mrshenli

fbshipit-source-id: 1238a63350a32a824b4b8c0018dc80454ea502bb
2020-11-09 17:42:57 -08:00
8b3f1d1288 [caffe2] Add __slots__ to all classes in schema.py (#47541)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47541

The profiler has guided us to `schema.py`. Since these `Field`s are used everywhere and in huge quantities, we can easily make some optimizations system wide by adding `__slots__`.

From StackOverflow, benefits include:

* faster attribute access.
* space savings in memory.

Read more: https://stackoverflow.com/a/28059785/
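
A generic illustration of the pattern (not the actual `schema.py` code):

```python
class SlottedField:
    # A fixed attribute set removes the per-instance __dict__,
    # which saves memory and speeds up attribute access.
    __slots__ = ("name", "metadata")

    def __init__(self, name, metadata):
        self.name = name
        self.metadata = metadata
```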

Reviewed By: dzhulgakov

Differential Revision: D24771078

fbshipit-source-id: 13f6064d367440069767131a433c820eabfe931b
2020-11-09 16:16:28 -08:00
2f617c5104 skip GPU test on sandcastle if sanitizer is enabled (#47626)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47626

`caffe2/test:cuda` was safeguarded by a GPU availability check; however, most of the mixed CPU/GPU tests weren't.

Use `TEST_WITH_*SAN` flags to safeguard test discovery for CUDA tests.

Test Plan: sandcastle

Reviewed By: janeyx99

Differential Revision: D24842333

fbshipit-source-id: 5e264344a0b7b98cd229e5bf73c17433751598ad
2020-11-09 16:06:58 -08:00
86bb413600 Optimize backward for torch.repeat (#46726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46726

Fixes #43192

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D24739840

Pulled By: ejguan

fbshipit-source-id: ddf21fc52c4676de25ad7bfb0b5c1c23daa77ee6
2020-11-09 15:12:40 -08:00
4c52a56c40 [caffe2] Properly call super init in schema.py (#47542)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47542

The previous way of doing `Field.__init__(self, [])` is just wrong. Switching to the Python 2 compatible way: `super(ObjectName, self).__init__(...)`.
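
Schematically (`Field`/`Scalar` here are illustrative stand-ins):

```python
class Field(object):
    def __init__(self, children):
        self.children = children

class Scalar(Field):
    def __init__(self):
        # Before: Field.__init__(self, [])
        # After, Python 2 and 3 compatible:
        super(Scalar, self).__init__([])
```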

Reviewed By: dzhulgakov

Differential Revision: D24771077

fbshipit-source-id: d6798c72090c0264b6c583602cae441a1b14587c
2020-11-09 15:02:22 -08:00
b6a2444eff [WIP] Adding bunch of unary foreach APIs (#47383)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47383

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D24737050

Pulled By: izdeby

fbshipit-source-id: deb59b41ad1c79b66cafbd9a9d3d6b069794e743
2020-11-09 14:14:28 -08:00
5686d2428c [ONNX] Slightly improve indexing with ellipsis under scripting (#46571)
Summary:
This still depends on the rank of the original tensor being known at export time, i.e.
```python
x[i, j, k] = y  # rank of x must be known at export time
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46571

Reviewed By: mrshenli

Differential Revision: D24657502

Pulled By: bzinodev

fbshipit-source-id: 6ec87edb67be06e34526225e701954fcfc5606c8
2020-11-09 14:05:56 -08:00
a49367e9c9 Update the docs of torch.eig about derivative (#47598)
Summary:
Related: https://github.com/pytorch/pytorch/issues/33090
I just realized that I haven't updated the docs of `torch.eig` when implementing the backward.
Here's the PR updating the docs about the grad of `torch.eig`.

cc albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47598

Reviewed By: heitorschueroff

Differential Revision: D24829373

Pulled By: albanD

fbshipit-source-id: 89963ce66b2933e6c34e2efc93ad0f2c3dd28c68
2020-11-09 13:28:27 -08:00
4159191f0e [pytorch] split out trace type generator and migrate to new codegen model (#47438)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47438

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24808211

Pulled By: ljk53

fbshipit-source-id: 44dfadf550a255c05aa201e54b48101aaf722885
2020-11-09 12:39:39 -08:00
499d2fad98 [pytorch] factor out return_names api (#47437)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47437

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24808213

Pulled By: ljk53

fbshipit-source-id: 8ec6d58952fd677ab2d97e63b060cafda052411a
2020-11-09 12:39:37 -08:00
8d1a6ae51d [pytorch] TraceType codegen tweak - newline before redispatch call (#47436)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47436

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24808212

Pulled By: ljk53

fbshipit-source-id: a78c27ff76e1f6324eb2ae25467dec72b6b09b87
2020-11-09 12:39:34 -08:00
e26c1726cf [ONNX] Fix scripting rand/randn/where (#45793)
Summary:
- rand/randn: the type signature of int[] is different in scripting, thus failing the check.
- where: scripting produces dynamic cases which are supported by the `unbind` export in higher opsets.
- test_list_pass: this test fails when using the new scripting API; should be fixed by https://github.com/pytorch/pytorch/issues/45369

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45793

Reviewed By: mrshenli

Differential Revision: D24566096

Pulled By: bzinodev

fbshipit-source-id: 6fe0925c66dee342106d71c9cbc3c95cabe639f7
2020-11-09 12:39:31 -08:00
a08e8dd70c Fix python 3.9 builds on Windows (#47602)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47460.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47602

Reviewed By: heitorschueroff

Differential Revision: D24832487

Pulled By: malfet

fbshipit-source-id: 8846caeac5e767e8066470d5c981218f147c88dc
2020-11-09 12:39:28 -08:00
6214d0ad88 Update nccl commit tag to head of v2.8 branch (#47603)
Summary:
Previous head of v2.8 branch was force-updated from `cd5a9b73c3028d2496666201588111a8c8d84878` to `31b5bb6f6447da98b9110c605465f9c09621074e`

Fixes https://github.com/pytorch/pytorch/issues/47529

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47603

Reviewed By: seemethere, janeyx99

Differential Revision: D24832450

Pulled By: malfet

fbshipit-source-id: ea141b207d7d8e92300ba286cde3cda3773adf51
2020-11-09 12:36:27 -08:00
ead86b2419 Add batching rule for torch.clone(tensor, torch.contiguous_format) (#47365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47365

I wanted to avoid defining vmap behavior over contiguous_format for as
long as possible. This is potentially ambiguous; consider the following:
```
>>> x = torch.randn(3, B0, 5)
>>> y = vmap(lambda x: x.clone(torch.contiguous_format), in_dims=1,
out_dims=1)(x)
>>> y[:,0].is_contiguous()  # ??
```
There are two possible ways to interpret this operation (if we choose to
allow it to succeed):
1. Each per-sample becomes contiguous, so y[:,0] is contiguous.
2. The output of vmap is contiguous (so y is contiguous, but y[:,0] is
not)

(1) makes more sense because vmap operates on a per-sample level.
This makes sense when combined with the vmap fallback:
- there are places in the codebase where we perform .contiguous() and
then pass the result to an operator `op` that only accepts contiguous
inputs.
- If we vmap over such code and don't have a batching rule implemented for
`op`, then we want the per-samples to be contiguous so that
when `op` goes through the vmap fallback, it receives contiguous
per-samples.

(1) is the approach we've selected for this PR.

Motivation
----------
To vmap over CopySlices, we have to vmap over a clone(contiguous_format)
call:
e4bc785dd5/torch/csrc/autograd/functions/tensor.cpp (L93)

Alternatives
------------
- Implementing (2) is difficult in the current design because vmap is
allowed to move batch dimensions to the front of the tensor. We would
need some global information about the in_dims and out_dims passed to
vmap.
- We could also error out if someone calls clone(contiguous_format) and
the batch dims are not at the front. This would resolve the ambiguity at
the cost of limiting what vmap can do.

Future Work
-----------
- Add to a "vmap gotchas" page the behavior of contiguous_format.
- Implement is_contiguous, Tensor.contiguous() with the same semantics.
Those currently error out.

Test Plan
---------
- new tests

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D24741683

Pulled By: zou3519

fbshipit-source-id: 3ef5ded1b646855f41d39dcefe81129176de8a70
2020-11-09 11:36:48 -08:00
7bc8fdb6d7 as_strided batching rule (#47364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47364

This PR adds a batching rule for as_strided. `as_strided` is a really weird
operation and I hope that users don't use it very much.

Motivation
----------
The motivation for adding a batching rule for as_strided is for
batched gradient computation.

AsStridedBackward appears in PyTorch when handling view+in-place
operations and calls `as_strided`. AsStridedBackward calls as_strided on
a fresh tensor with storage_offset equal to 0. We would like to be able
to vmap through the backward graph of view+in-place operations to
for batched gradient computation, especially because internally we have
a number of functions that are implemented as a view+in-place.

Alternatives
------------
If we think that as_strided is too crazy to have a batching rule, we
could either:
- have a flag that controls the autograd view+in-place
behavior
- require that the input tensor's storage offset must be equal to 0
to make it easier to reason about.

I think the batching rule makes sense, so I didn't pursue the
alternatives.

The batching rule
-----------------
```
y = vmap(lambda x: x.as_strided(sizes, strides, offset))(xs)
```
The result of the above should be "equivalent" to:
- Assume that each x has storage offset equal to xs.storage_offset()
(call that S).
- Calling as_strided with (sizes, sizes, offset + x[i].storage_offset() - S) on each x.

More concretely,
this returns a view on `xs`, such that each y[i] has:
- sizes: `sizes`
- strides: `strides`
- storage_offset: offset + i * x.stride(batch_dim)

Why the behavior can be weird
-----------------------------
The behavior of the batching rule may be different from actually running
as_strided in a for-loop because `as_strided` takes in `offset` as a
"absolute offset". As an example, consider

```
>>> x = torch.tensor([0., 1., 2., 3., 4.])
>>> z = [x[i].as_strided([1], [1], 0) for i in range(5)]
```
Each z[i] is actually the same view on x (z[i] == torch.tensor([0.]))!
However, we consider the above for-loop comprehension to be a user error:
a user should have written the following if they wanted to use as_strided
in a per-sample way:
```
>>> z = [x[i].as_strided([1], [1], 0 + x[i].storage_offset()) for i in range(5)]
```

Test Plan
---------
- Added some tests that compare vmap+as_strided to vmap+(the equivalent operator)

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D24741685

Pulled By: zou3519

fbshipit-source-id: c1429caff43bfa33661a80bffc0daf2c0eea5564
2020-11-09 11:36:44 -08:00
77c49e65d5 [tensorexpr] Fix registration of intrinsics on llvm-fb (#47540)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47540

In FB's hybrid llvm 7/8 flavor, we (read: I) forgot to register
intrinsics.  It was... a bit annoying to figure out how to do this, and I'm
sure it could be done more efficiently by someone who isn't just cargo-culting
the API from KaleidoscopeJIT.  Anyways.

There are kind of 3 independent changes here but they're a bit annoying to separate out, so:

0. (trivial) Add the correct #defines to the internal build to run test_llvm.
1. (easy) add an assertSuccess function to convert llvm::Errors into
   `TORCH_INTERNAL_ASSERT`s, for better/easier debugging.
2. (medium) Factor out the gigantic register-all-the-things function into a
   helper so we can call it from both the LLVM and LLVM-FB constructors.
3. (hard) Fix the symbol resolver in llvm-fb to do a lookup using the
   ExecutionSession.  This is the bit I don't really understand; it feels like the
   CompileLayer lookup should find these symbols but it doesn't.  Whatever.

Test Plan: `buck test //caffe2/test/cpp/tensorexpr:tensorexpr`

Reviewed By: asuhan

Differential Revision: D24807361

fbshipit-source-id: 8bb0d632dff6a065963ed14a600614cd21fbb095
2020-11-09 11:36:40 -08:00
70d34718b8 [fx] add missing modules for type annoations (#47537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47537

When a module only appears in a type constructor List[torch.Tensor],
it previously didn't get added to the list of used modules. This fixes it
by introspecting on the type constructor.
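
An illustrative shape of the bug (assumed, not taken from the PR's tests): `torch` appears only inside the annotation's type constructor, so code generation previously failed to record it as a used module.

```python
from typing import List

import torch
from torch.fx import symbolic_trace

class M(torch.nn.Module):
    def forward(self, xs: List[torch.Tensor]) -> torch.Tensor:
        return xs[0] + 1

gm = symbolic_trace(M())  # generated code must import torch for the annotation
```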

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D24806317

Pulled By: zdevito

fbshipit-source-id: 263391af71e1f2156cbefaab95b9818c6b9aaae1
2020-11-09 11:36:36 -08:00
fbffd959ca Fix compiler warning variable "num_ivalue_args" was declared but never referenced detected during: (#47494)
Summary:
```
/home/gaoxiang/.local/lib/python3.8/site-packages/torch/include/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h(326): warning: variable "num_ivalue_args" was declared but never referenced
          detected during:
            instantiation of "std::decay_t<c10::guts::infer_function_traits<Functor>::type::return_type> c10::impl::call_functor_with_args_from_stack_<Functor,AllowDeprecatedTypes,ivalue_arg_indices...>(Functor *, c10::Stack *, std::index_sequence<ivalue_arg_indices...>) [with Functor=c10::impl::WrapFunctionIntoRuntimeFunctor<std::decay_t<__nv_bool ()>>, AllowDeprecatedTypes=false, ivalue_arg_indices=<>]"
(346): here
            instantiation of "std::decay_t<c10::guts::infer_function_traits<Functor>::type::return_type> c10::impl::call_functor_with_args_from_stack<Functor,AllowDeprecatedTypes>(Functor *, c10::Stack *) [with Functor=c10::impl::WrapFunctionIntoRuntimeFunctor<std::decay_t<__nv_bool ()>>, AllowDeprecatedTypes=false]"
(396): here
            instantiation of "void c10::impl::make_boxed_from_unboxed_functor<KernelFunctor, AllowDeprecatedTypes>::call(c10::OperatorKernel *, const c10::OperatorHandle &, c10::Stack *) [with KernelFunctor=c10::impl::WrapFunctionIntoRuntimeFunctor<std::decay_t<__nv_bool ()>>, AllowDeprecatedTypes=false]"
/home/gaoxiang/.local/lib/python3.8/site-packages/torch/include/ATen/core/boxing/KernelFunction_impl.h(109): here
            instantiation of "c10::KernelFunction c10::KernelFunction::makeFromUnboxedFunctor<AllowLegacyTypes,KernelFunctor>(std::unique_ptr<c10::OperatorKernel, std::default_delete<c10::OperatorKernel>>) [with AllowLegacyTypes=false, KernelFunctor=c10::impl::WrapFunctionIntoRuntimeFunctor<std::decay_t<__nv_bool ()>>]"
/home/gaoxiang/.local/lib/python3.8/site-packages/torch/include/ATen/core/boxing/KernelFunction_impl.h(175): here
            instantiation of "c10::KernelFunction c10::KernelFunction::makeFromUnboxedRuntimeFunction(FuncType *) [with AllowLegacyTypes=false, FuncType=__nv_bool ()]"
/home/gaoxiang/.local/lib/python3.8/site-packages/torch/include/torch/library.h(92): here
            instantiation of "torch::CppFunction::CppFunction(Func *, std::enable_if_t<c10::guts::is_function_type<Func>::value, std::nullptr_t>) [with Func=__nv_bool ()]"
/home/gaoxiang/.local/lib/python3.8/site-packages/torch/include/torch/library.h(457): here
            instantiation of "torch::Library &torch::Library::def(NameOrSchema &&, Func &&) & [with NameOrSchema=const char (&)[23], Func=__nv_bool (*)()]"
/home/gaoxiang/extension-jit/test.cu(6): here
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47494

Reviewed By: bdhirsh

Differential Revision: D24796223

Pulled By: ezyang

fbshipit-source-id: 598b94b4012beaa74c6bde0b96a9136a8a6bc4f2
2020-11-09 11:32:07 -08:00
4a2fb34042 check sparse sizes (#47148)
Summary:
Checks the sizes of sparse tensors when comparing them in assertEqual.
Removes the additional checks from safeCoalesce; safeCoalesce should not be a test of the `.coalesce()` function.
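
For illustration, two sparse tensors that agree on indices and values but differ in declared size:

```python
import torch

i = torch.tensor([[0, 1]])
v = torch.tensor([1.0, 2.0])
a = torch.sparse_coo_tensor(i, v, (2,))
b = torch.sparse_coo_tensor(i, v, (3,))
# Same indices and values, different sizes: assertEqual should now flag these.
```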

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47148

Reviewed By: mruberry

Differential Revision: D24823127

Pulled By: ngimel

fbshipit-source-id: 9303a6ff74aa3c9d9207803d05c0be2325fe392a
2020-11-09 10:33:24 -08:00
65e5bd23d8 [quant] Add _FusedModule type to capture all fused modules for quantization (#47484)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47484

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D24774703

fbshipit-source-id: f0efc5d77035b9854ec3e31a1d34f05d5680bc22
2020-11-09 10:28:45 -08:00
8339f88353 Add complex autograd support for torch.mean (#47566)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47566

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D24817013

Pulled By: anjali411

fbshipit-source-id: f2b8411fb9abdc3e2d07c8e4fef3071b76605b12
2020-11-09 08:31:10 -08:00
3d962430a9 Make gen_op_registration flake8 compliant (#47604)
Summary:
Fixes regression introduced by D24686838 (8182558c22)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47604

Reviewed By: walterddr

Differential Revision: D24832687

Pulled By: malfet

fbshipit-source-id: e9f7a35561c2b1705e11fd11abe402e3c83cf5cc
2020-11-09 08:31:07 -08:00
b80da89891 Batching rule for Tensor.new_empty_strided (#47226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47226

The batching rule is a little weird because it's not immediately obvious
what the strides of the result should be. If
tensor.new_empty_strided(size, stride) is called inside vmap and
`tensor` is being vmapped over, the result is a physical tensor with:
- size `[batch_shape] + size`
- strides `[S0, S1, ..., Sn] + stride` such that the
S0...Sn are part of a contiguous subspace and Sn is equal to the size of
the storage of `torch.empty_strided(size, stride)`.

I refactored some of the logic that computes the storage size for
`torch.empty_strided(size, stride)` into a helper function
`native::storage_size_for` and use it in the batching rule.
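
A Python sketch of that storage-size computation, assuming all sizes are positive (the real helper is the C++ `native::storage_size_for`):

```python
def storage_size_for(size, stride):
    # Largest reachable element offset plus one, i.e. the storage needed
    # by torch.empty_strided(size, stride).
    return 1 + sum((s - 1) * st for s, st in zip(size, stride))

print(storage_size_for([2, 3], [3, 1]))  # 6, contiguous layout
print(storage_size_for([2, 3], [1, 2]))  # 6, transposed layout
```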

Test Plan: - New tests in test/test_vmap.py

Reviewed By: ejguan

Differential Revision: D24741690

Pulled By: zou3519

fbshipit-source-id: f09b5578e923470d456d50348d86687a03b598d2
2020-11-09 08:31:04 -08:00
59aca02224 Implement Tensor.new_empty_strided(sizes, strides, *, dtype, device, requires_grad) (#47225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47225

Summary
-------
This PR implements Tensor.new_empty_strided. Many of our torch.* factory
functions have a corresponding new_* method (e.g., torch.empty and
Tensor.new_empty), but there is no corresponding method for
torch.empty_strided. This PR adds one.
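
A quick usage sketch:

```python
import torch

x = torch.randn(2, 3, dtype=torch.float64)
# Like torch.empty_strided, but inherits dtype and device from `x`:
y = x.new_empty_strided((2, 2), (2, 1))
assert y.dtype == x.dtype and y.device == x.device
```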

Motivation
----------
The real motivation behind this is for vmap to be able to work through
CopySlices. CopySlices shows up a lot in double backwards because a lot
of view functions have backward formulas that perform view+inplace.

e0fd590ec9/torch/csrc/autograd/functions/tensor.cpp (L78-L106)

To support vmap through CopySlices, the approach in this stack is to:
- add `Tensor.new_empty_strided` and replace `empty_strided` in
CopySlices with that so that we can propagate batch information.
- Make some slight modifications to AsStridedBackward (and add
as_strided batching rule)

Please let me know if it would be better if I squashed everything related to
supporting vmap over CopySlices together into a single big PR.

Test Plan
---------
- New tests.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D24741688

Pulled By: zou3519

fbshipit-source-id: b688047d2eb3f92998896373b2e9d87caf2c4c39
2020-11-09 08:31:01 -08:00
4a58f35bef [caffe2] Fix duplicate name bug in Net.AddExternalInput (#47530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47530

`Net.AddExternalInput` should raise if there are duplicate names. The previous code would only raise if the addition of duplicates was in separate calls, but not if it was in the same call.

Test Plan:
Added two new regression tests

```
    ✓ Pass: caffe2/caffe2/python:core_test - testSetInputRecordWithBlobs (caffe2.caffe2.python.core_test.TestExternalInputs) (9.622)
    ✓ Pass: caffe2/caffe2/python:core_test - testAddExternalInputShouldRaiseIfDuplicate (caffe2.caffe2.python.core_test.TestExternalInputs) (9.639)
    ✓ Pass: caffe2/caffe2/python:core_test - testSetInputRecordWithoutBlobs (caffe2.caffe2.python.core_test.TestExternalInputs) (9.883)
    ✓ Pass: caffe2/caffe2/python:core_test - testAddExternalInputShouldRaiseIfDuplicateInSameCall (caffe2.caffe2.python.core_test.TestExternalInputs) (10.153)
```

Test trained 2 models. No issues

f230755456
f230754926

Reviewed By: dzhulgakov

Differential Revision: D24763586

fbshipit-source-id: c87088441d76f7198f8b07508b2607aec13521ed
2020-11-09 08:30:58 -08:00
6248e0621c Revert D24801481: [pytorch][PR] Add AcceleratedGraphModule and serialzie GraphModule to JSON
Test Plan: revert-hammer

Differential Revision:
D24801481 (9e0102c10f)

Original commit changeset: 6b3fe69b51f7

fbshipit-source-id: f8287ef88b302e0f08d58090dc61603a4ef5cb3c
2020-11-09 08:28:22 -08:00
9e0102c10f Add AcceleratedGraphModule and serialzie GraphModule to JSON (#47233)
Summary:
Example:
```
class TestModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)
        self.e = torch.rand(4)

    def forward(self, a, b):
        add_1 = a + b
        linear = self.linear(add_1)
        add_2 = linear + self.e
        return add_2
```
JSON:
```
{
    "modules": {},
    "weights": {
        "linear.weight": {
            "dtype": "torch.float32",
            "is_quantized": false,
            "shape": "[4, 4]"
        },
        "linear.bias": {
            "dtype": "torch.float32",
            "is_quantized": false,
            "shape": "[4]"
        },
        "e": {
            "dtype": "torch.float32",
            "is_quantized": false,
            "shape": "[4]"
        }
    },
    "nodes": [
        {
            "shape": "[4]",
            "dtype": "torch.float32",
            "target": "a",
            "op_code": "placeholder",
            "name": "a",
            "args": [],
            "kwargs": {}
        },
        {
            "shape": "[4]",
            "dtype": "torch.float32",
            "target": "b",
            "op_code": "placeholder",
            "name": "b",
            "args": [],
            "kwargs": {}
        },
        {
            "shape": "[4]",
            "dtype": "torch.float32",
            "target": "_operator.add",
            "op_code": "call_function",
            "name": "add_1",
            "args": [
                {
                    "is_node": true,
                    "name": "a"
                },
                {
                    "is_node": true,
                    "name": "b"
                }
            ],
            "kwargs": {}
        },
        {
            "target": "linear",
            "op_code": "call_module",
            "name": "linear_1",
            "args": [
                {
                    "is_node": true,
                    "name": "add_1"
                }
            ],
            "kwargs": {}
        },
        {
            "shape": "[4]",
            "dtype": "torch.float32",
            "target": "e",
            "op_code": "get_attr",
            "name": "e",
            "args": [],
            "kwargs": {}
        },
        {
            "shape": "[4]",
            "dtype": "torch.float32",
            "target": "_operator.add",
            "op_code": "call_function",
            "name": "add_2",
            "args": [
                {
                    "is_node": true,
                    "name": "linear_1"
                },
                {
                    "is_node": true,
                    "name": "e"
                }
            ],
            "kwargs": {}
        },
        {
            "shape": "[4]",
            "dtype": "torch.float32",
            "target": "output",
            "op_code": "output",
            "name": "output",
            "args": [
                {
                    "is_node": true,
                    "name": "add_2"
                }
            ],
            "kwargs": {}
        }
    ]
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47233

Reviewed By: jackm321, yinghai

Differential Revision: D24801481

Pulled By: gcatron

fbshipit-source-id: 6b3fe69b51f7ac57f445675acdac36b0e563f73d
2020-11-08 19:26:02 -08:00
8182558c22 [PyTorch Mobile] Don't use __ROOT__ for inference only ops
Summary:
`__ROOT__` ops are only used in full-jit. To keep the binary size compact, disable them for inference. Since FL is still on full-jit, keep them for training only.

This saves ~17 KB for fbios.

TODO: when FL is migrated to lite_trainer, remove `__ROOT__` to save size in training too.

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D24686838

fbshipit-source-id: 15214cebb9d8defa3fdac3aa0d73884b352aa753
2020-11-08 15:27:47 -08:00
16c72a5a6b [pytorch] continue to rewrite gen_python_functions.py with typed models (#46978)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46978

Refactored and added type annotations to the most part of the file.

Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24589210

Pulled By: ljk53

fbshipit-source-id: e0c7e5b3672b41983f321400c2e2330d1462e76e
2020-11-08 01:34:12 -08:00
4a7de2746f Add docs on how to toggle TF32 flags on C++ (#47331)
Summary:
I have been asked several times how to toggle this flag on libtorch. I think it would be good to mention it in the docs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47331

Reviewed By: glaringlee

Differential Revision: D24777576

Pulled By: mruberry

fbshipit-source-id: cc2a338c477bb57e0bb74b8960c47fde99665e41
2020-11-08 01:29:24 -08:00
781e0ed835 Support RRef.backward() for Owner RRefs. (#46641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46641

Second part of https://github.com/pytorch/pytorch/pull/46568, allows
RRef.backward() to work for owner RRefs.
ghstack-source-id: 115440252

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D24441300

fbshipit-source-id: 64af28e6b6ae47ea27e611a148f217bc344a4c5b
2020-11-07 21:25:32 -08:00
5a5258cb0d Support the strided tensor on input for torch.cat (#46859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46859

In the current implementation, non-contiguous inputs take the slow path. This change enables the fast path for non-contiguous inputs (up to 4 dimensions).
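
An illustrative input that previously took the slow path (a sketch; sizes are made up):

```python
import torch

x = torch.randn(4, 8, 2, device="cuda").transpose(0, 1)  # non-contiguous, 3-dim
y = torch.randn(4, 8, 2, device="cuda").transpose(0, 1)
out = torch.cat([x, y], dim=0)  # now eligible for the fast path (up to 4 dims)
```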

Test Plan:
#benchmark

before
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1,1,1)_N2_dim0_cuda
# Input: sizes: (1, 1, 1), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 17.126

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 20.652

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(128,1024,2)_N2_dim1_cuda
# Input: sizes: (128, 1024, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 20.412

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim0_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 48.265

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1025,1023,2)_N2_dim1_cuda
# Input: sizes: (1025, 1023, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 52.964

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim2_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 71.111

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f8a3cdc2440>,111,65]_N5_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f8a3cdc2440>, 111, 65], N: 5, dim: 0, device: cuda
Forward Execution Time (us) : 39.492

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[96,<function<lambda>at0x7f8a3cdc2b90>,64]_N5_dim1_cuda
# Input: sizes: [96, <function <lambda> at 0x7f8a3cdc2b90>, 64], N: 5, dim: 1, device: cuda
Forward Execution Time (us) : 31.596

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[128,64,<function<lambda>at0x7f880e7db3b0>]_N5_dim2_cuda
# Input: sizes: [128, 64, <function <lambda> at 0x7f880e7db3b0>], N: 5, dim: 2, device: cuda
Forward Execution Time (us) : 66.668

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f880e7db5f0>,32,64]_N50_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f880e7db5f0>, 32, 64], N: 50, dim: 0, device: cuda
Forward Execution Time (us) : 54.562

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[32,<function<lambda>at0x7f880e7db680>,64]_N50_dim1_cuda
# Input: sizes: [32, <function <lambda> at 0x7f880e7db680>, 64], N: 50, dim: 1, device: cuda
Forward Execution Time (us) : 53.255

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[33,65,<function<lambda>at0x7f880e7db710>]_N50_dim2_cuda
# Input: sizes: [33, 65, <function <lambda> at 0x7f880e7db710>], N: 50, dim: 2, device: cuda
Forward Execution Time (us) : 69.771

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(64,32,4,16,32)_N2_dim2_cuda
# Input: sizes: (64, 32, 4, 16, 32), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 98.438

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(16,32,4,16,32)_N8_dim2_cuda
# Input: sizes: (16, 32, 4, 16, 32), N: 8, dim: 2, device: cuda
Forward Execution Time (us) : 115.045

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(9,31,5,15,33)_N17_dim4_cuda
# Input: sizes: (9, 31, 5, 15, 33), N: 17, dim: 4, device: cuda
Forward Execution Time (us) : 476.497

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f880e7db7a0>]_N100_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f880e7db7a0>], N: 100, dim: 0, device: cuda
Forward Execution Time (us) : 86.307

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f880e7db830>]_N1000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f880e7db830>], N: 1000, dim: 0, device: cuda
Forward Execution Time (us) : 453.269

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f880e7db8c0>]_N2000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f880e7db8c0>], N: 2000, dim: 0, device: cuda
Forward Execution Time (us) : 935.365

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f880e7db950>]_N3000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f880e7db950>], N: 3000, dim: 0, device: cuda
Forward Execution Time (us) : 1355.937
```
after
```
WARNING:2020-11-01 21:14:23 3332963:3336757 EventProfilerController.cpp:143] (x1) Lost sample due to delays (ms): 488, 11, 4121, 0
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1,1,1)_N2_dim0_cuda
# Input: sizes: (1, 1, 1), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 17.174

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 20.399

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(128,1024,2)_N2_dim1_cuda
# Input: sizes: (128, 1024, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 23.349

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim0_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 47.847

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1025,1023,2)_N2_dim1_cuda
# Input: sizes: (1025, 1023, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 53.463

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim2_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 72.789

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fd5b5567710>,111,65]_N5_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fd5b5567710>, 111, 65], N: 5, dim: 0, device: cuda
Forward Execution Time (us) : 39.747

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[96,<function<lambda>at0x7fd5b56b1320>,64]_N5_dim1_cuda
# Input: sizes: [96, <function <lambda> at 0x7fd5b56b1320>, 64], N: 5, dim: 1, device: cuda
Forward Execution Time (us) : 31.814

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[128,64,<function<lambda>at0x7fd3a2289680>]_N5_dim2_cuda
# Input: sizes: [128, 64, <function <lambda> at 0x7fd3a2289680>], N: 5, dim: 2, device: cuda
Forward Execution Time (us) : 67.202

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fd3a2289710>,32,64]_N50_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fd3a2289710>, 32, 64], N: 50, dim: 0, device: cuda
Forward Execution Time (us) : 65.229

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[32,<function<lambda>at0x7fd3a22897a0>,64]_N50_dim1_cuda
# Input: sizes: [32, <function <lambda> at 0x7fd3a22897a0>, 64], N: 50, dim: 1, device: cuda
Forward Execution Time (us) : 60.843

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[33,65,<function<lambda>at0x7fd3a2289830>]_N50_dim2_cuda
# Input: sizes: [33, 65, <function <lambda> at 0x7fd3a2289830>], N: 50, dim: 2, device: cuda
Forward Execution Time (us) : 69.756

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(64,32,4,16,32)_N2_dim2_cuda
# Input: sizes: (64, 32, 4, 16, 32), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 98.222

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(16,32,4,16,32)_N8_dim2_cuda
# Input: sizes: (16, 32, 4, 16, 32), N: 8, dim: 2, device: cuda
Forward Execution Time (us) : 112.521

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(9,31,5,15,33)_N17_dim4_cuda
# Input: sizes: (9, 31, 5, 15, 33), N: 17, dim: 4, device: cuda
Forward Execution Time (us) : 477.736

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fd3a22898c0>]_N100_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fd3a22898c0>], N: 100, dim: 0, device: cuda
Forward Execution Time (us) : 50.617

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fd3a2289950>]_N1000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fd3a2289950>], N: 1000, dim: 0, device: cuda
Forward Execution Time (us) : 461.631

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fd3a22899e0>]_N2000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fd3a22899e0>], N: 2000, dim: 0, device: cuda
Forward Execution Time (us) : 840.469

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fd3a2289a70>]_N3000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fd3a2289a70>], N: 3000, dim: 0, device: cuda
Forward Execution Time (us) : 1317.866
```

Reviewed By: ngimel

Differential Revision: D24527676

fbshipit-source-id: 83d6431e59fa7e1748292b37f5d1fa4ab6242299
2020-11-07 17:24:44 -08:00
6e69a24a1d [ONNX] Reimplement _var_mean to ensure non-negative (#47240)
Summary:
The current `_var_mean` implementation cannot guarantee a non-negative variance, because it computes `E(X^2) - (E(X))^2`: numerically, when the number of elements is large and X is close to 0, the difference can come out negative (as our UT shows). The new implementation computes `E((X - E(X))^2)`, which is guaranteed non-negative because the expectation of a square is always non-negative.

The UT passes for the new implementation (but fails for the existing one). So it is good to go.
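
For intuition, a minimal numerical sketch (not part of this PR; the values are illustrative):
```python
import torch

# The one-pass formula E(X^2) - (E(X))^2 subtracts two nearly equal numbers,
# so rounding error can push the result below zero; the two-pass formula
# averages squares and therefore can never be negative.
x = torch.full((10_000_000,), 1e-4) + torch.randn(10_000_000) * 1e-8
one_pass = (x * x).mean() - x.mean() ** 2
two_pass = ((x - x.mean()) ** 2).mean()
print(one_pass.item(), two_pass.item())  # one_pass may come out slightly negative
```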

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47240

Reviewed By: ejguan

Differential Revision: D24735729

Pulled By: bzinodev

fbshipit-source-id: 136f448dd16622b2b46f40cdf6cb2fccf357c48d
2020-11-07 12:27:09 -08:00
f23a2a1115 The dimension being reduced should not be coalesced by TensorIterator (#47237)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37583#issuecomment-720172838

Also add an overload of `<<` for convenience when debugging.

This PR is tested by `test_reduction_split_cuda` which was added in https://github.com/pytorch/pytorch/pull/37788.

Reproduce
```python
import torch

a = torch.zeros(8, 1, 128, 1024, 1024)
a.cuda().sum(1)
```

Before

```
TensorIterator @ 0x7ffd05b10ba0 {
  ntensors() = 2
  noutputs() = 1
  shape() = [1073741824]
  strides(*) = {
    (0) = [4]
    (1) = [4]
  }
  dtype(*) = {
    (0) = Float
    (1) = Float
  }
  is_reduction_ = 1
}
```

After

```
TensorIterator @ 0x7fffc9051010 {
  ntensors() = 2
  noutputs() = 1
  shape() = [1, 1073741824]
  strides(*) = {
    (0) = [0, 4]
    (1) = [536870912, 4]
  }
  dtype(*) = {
    (0) = Float
    (1) = Float
  }
  is_reduction_ = 1
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47237

Reviewed By: ejguan

Differential Revision: D24734763

Pulled By: ngimel

fbshipit-source-id: 02bb2b15694c68f96434f55033b63b6e5ff7085b
2020-11-07 01:30:24 -08:00
29184f86b0 Correctly print out sign of near-zero double values (#47081)
Summary:
Inside IValue.h, we previously printed -0.0 as 0.0, which caused inconsistencies when using -0.0.
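
A small sketch (not from this PR) of why the sign of negative zero is observable:
```python
import torch

# Division by negative zero yields -inf, so printing -0.0 as "0.0" silently
# flips the sign of results computed from the round-tripped value.
print(torch.tensor([1.0]) / -0.0)  # tensor([-inf])
print(torch.tensor([1.0]) / 0.0)   # tensor([inf])
```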

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47081

Test Plan:
A new test case inside test_jit that divides a tensor by -0. and checks if it outputs -inf for all modes.

Fixes https://github.com/pytorch/pytorch/issues/46848

Reviewed By: mrshenli

Differential Revision: D24688572

Pulled By: gmagogsfm

fbshipit-source-id: 01a9d3f782e0711dd10bf24e6f3aa62eee72c895
2020-11-07 01:25:47 -08:00
c19eb4ad73 BoxWithNMSLimit support int batch_splits input (#47504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47504

Allow int-type input for `batch_splits`.

Test Plan:
```
buck test caffe2/caffe2/python/operator_test:torch_integration_test -- test_box_with_nms_limits
```

Reviewed By: jackm321

Differential Revision: D24629522

fbshipit-source-id: 61cb132e792bddd8f9f1bca5b808f1a9131808f0
2020-11-07 00:27:51 -08:00
9d0c6e9469 Implement Complex tensor support in all reduce and all gather (#47523)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47523

Reviewed By: bdhirsh

Differential Revision: D24806743

Pulled By: gmagogsfm

fbshipit-source-id: 627a5a0654c603bc82b90e4cb3d924b4ca416fbe
2020-11-06 22:26:48 -08:00
f90da88d8f Add complex support for torch.mean [CUDA] (#47048)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46982

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47048

Reviewed By: heitorschueroff

Differential Revision: D24729895

Pulled By: anjali411

fbshipit-source-id: 8e948480eb87c37de810207edf909375c0380772
2020-11-06 21:29:19 -08:00
451e7d3db4 Enable diag for bool Tensors (#47455)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47455

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D24772483

Pulled By: H-Huang

fbshipit-source-id: 08ea4af4352972617db3c6475943b326f36b3049
2020-11-06 21:29:17 -08:00
3253ccbd9f Add bool tensor support for where (#47454)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47454

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D24772482

Pulled By: H-Huang

fbshipit-source-id: ea488aae5bf64ac20f7a5d001e8edf55eed16eaf
2020-11-06 21:26:24 -08:00
a1fef453b6 Support extra files in _load_for_mobile (#47425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47425

Extra files can be exported in a lite interpreter model, but they could not be loaded back. This PR adds the capability to load extra files from a lite interpreter model. Because extra_files is a default argument, it should not affect existing usage of _load_for_mobile. The payload is simply assembled into a generic unordered_map, so no additional dependency is introduced and the size overhead should be small (to be tested).
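
For context, the eager TorchScript loader already exposes the analogous `_extra_files` hook; a usage sketch of that existing Python API (the file name and contents are illustrative):
```python
import torch

m = torch.jit.script(torch.nn.Linear(2, 2))
torch.jit.save(m, "model.pt", _extra_files={"metadata.json": '{"v": 1}'})

# Keys name the files to fetch; their contents are filled in during load.
extra = {"metadata.json": ""}
torch.jit.load("model.pt", _extra_files=extra)
print(extra["metadata.json"])
```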

Test Plan: Imported from OSS

Reviewed By: kwanmacher

Differential Revision: D24770266

Pulled By: iseeyuan

fbshipit-source-id: 7e8bd301ce734dbbf36ae56c9decb045aeb801ce
2020-11-06 20:26:54 -08:00
3f9697b10e Correctly compare Stream IValues (#47303)
Summary:
Stream IValue equality comparison was comparing the wrong object.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47303

Test Plan:
Added a new C++ test

Reviewed By: bdhirsh

Differential Revision: D24752434

Pulled By: gmagogsfm

fbshipit-source-id: 78bc7a812740485ebbc7cf0c06c2e671a7ccd26f
2020-11-06 17:29:09 -08:00
25d1fb519d Build nightly binaries only for the latest ROCM (#47503)
Summary:
ROCM 3.7 is no longer supported, so build nightly binaries only for the latest ROCM.
Also, delete the ROCM 3.5.1 docker image generation job.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47503

Reviewed By: seemethere

Differential Revision: D24789230

Pulled By: malfet

fbshipit-source-id: 36964f8e1096964f0ee2112e6ee67f29bcbd4373
2020-11-06 16:34:03 -08:00
e09ec8eefa Update the error message for retain_grad (#47084)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46588

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47084

Reviewed By: albanD

Differential Revision: D24632403

Pulled By: iramazanli

fbshipit-source-id: 8dfd50fcbb6ef585ea4f903e3755b5a807312235
2020-11-06 16:34:00 -08:00
7af9752fdc Fix rounding error flakiness in quantized_test (#47468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47468

**Summary:** QuantizePerChannel4d and QuantizePerChannel4dChannelsLast have issues with flakiness on both ARM and x86 builds.

The flakiness stems from two sources:
1. The rounding strategy used by quantization for half values is to round the number to the nearest even integer (e.g. `4.5->4`, `5.5->6`, `6.5->6`); however, the above tests incorrectly expect the values to be rounded away from zero (see the sketch after item 2).

2. On ARM devices, `quantize_val_arm` calculates `zero_point + round(val / scale)`, which behaves differently from `quantize_val`, which calculates `zero_point + round(val * (1.0f/scale))`. This small distinction leaves enough room for floating-point arithmetic error to change the rounding behavior (e.g. `3 / .24 = 12.5` whereas `3 * (1.0f / .24) = 12.500001`).
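
A quick demonstration of the round-half-to-even behavior (illustrative; not part of the fix):
```python
import torch

# torch.round, like the quantization path, rounds ties to the nearest even
# integer rather than away from zero.
print(torch.round(torch.tensor([4.5, 5.5, 6.5])))  # tensor([4., 6., 6.])
```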

**Test Plan:**
For local builds:
```
python setup.py develop
./build/bin/quantized_test --gtest_filter='TestQTensor.QuantizePerChannel4d*' --gtest_repeat=10000 | grep FAILURE
```

For ARM Neon:
```
BUILD_MOBILE_BENCHMARK=1 BUILD_MOBILE_TEST=1 ANDROID_DEBUG_SYMBOLS=1 BUILD_PYTORCH_MOBILE=1 ANDROID_ABI="armeabi-v7a with NEON" ./scripts/build_android.sh  -DANDROID_CCACHE=$(which ccache) -DBUILD_BINARY=ON
adb push ./build/bin/quantized_test /data/local/tmp
adb shell "/data/local/tmp/quantized_test --gtest_filter='TestQTensor.QuantizePerChannel4d*' --gtest_repeat=1000 | grep FAILURE"
```

For ARM64:
```
BUILD_MOBILE_BENCHMARK=1 BUILD_MOBILE_TEST=1 ANDROID_DEBUG_SYMBOLS=1 BUILD_PYTORCH_MOBILE=1 ANDROID_ABI=arm64-v8a ./scripts/build_android.sh  -DANDROID_CCACHE=$(which ccache) -DBUILD_BINARY=ON
adb push ./build/bin/quantized_test /data/local/tmp
adb shell "/data/local/tmp/quantized_test --gtest_filter='TestQTensor.QuantizePerChannel4d*' --gtest_repeat=1000 | grep FAILURE"
```

**Tasks:** T79019469

Test Plan: Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D24769889

Pulled By: AJLiu

fbshipit-source-id: 417e7339bac70df5b9f630a1e286fad435e49240
2020-11-06 16:31:04 -08:00
637787797b [JIT] add support for torch.jit.Final in python 3.6 (#47393)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47393

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D24739402

Pulled By: Lilyjjo

fbshipit-source-id: 46f003f0a4b1a36894050b72b8f2334c30268e54
2020-11-06 14:30:44 -08:00
31d041c946 Back out "[c10] make intrusive_ptr available as a pybind holder type"
Summary:
Original commit changeset: b9796e15074d

We are seeing a weird issue with custom class + recursive scripting; unland this first to figure out more details.

Test Plan: wait for sandcastle

Reviewed By: zhangguanheng66

Differential Revision: D24780498

fbshipit-source-id: 99a937a26908897556d3bd9f1b2b39f494836fe6
2020-11-06 14:27:48 -08:00
8eb228a7f3 Add support for log_softmax (#47409)
Summary:
This diff adds support for `log_softmax` op in NNC.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47409

Reviewed By: ejguan

Differential Revision: D24750203

Pulled By: navahgar

fbshipit-source-id: c4dacc7f62f9df65ae467f0d578ea03d3698273d
2020-11-06 13:29:27 -08:00
582e852fba [caffe2] Add unittests for schema.Field init (#47512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47512

I deleted the last line of `__init__` -- `self._field_offsets.append(offset)` -- and the unittests didn't fail.

So this diff is to improve test coverage.

Test Plan:
```
    ✓ Pass: caffe2/caffe2/python:schema_test - testInitShouldSetEmptyParent (caffe2.caffe2.python.schema_test.TestField) (8.225)
    ✓ Pass: caffe2/caffe2/python:schema_test - testInitShouldSetFieldOffsetsIfNoChildren (caffe2.caffe2.python.schema_test.TestField) (8.339)
    ✓ Pass: caffe2/caffe2/python:schema_test - testInitShouldSetFieldOffsets (caffe2.caffe2.python.schema_test.TestField) (8.381)
```

Reviewed By: dzhulgakov

Differential Revision: D24767188

fbshipit-source-id: b6ce8cc96ecc61768b55360e0238f7317a2f18ea
2020-11-06 13:27:58 -08:00
2572d7a671 [quant][eagermode][qat][test] Add numerical test for qat convert (#47376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47376

For sigmoid, hardsimoid, tanh, leaky_relu

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24734754

fbshipit-source-id: f42ff9410629fa344be97494ffdbe453a7943f65
2020-11-06 12:36:16 -08:00
24b549ba84 [jit] better message for bad type annotation (#47464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47464

```
ValueError: Unknown type annotation: 'typing.Sequence[torch.Tensor]' at  File "xxx.py", line 223
        images = [x["image"].to(self.device) for x in batched_inputs]
        images = [(x - self.pixel_mean) / self.pixel_std for x in images]
        images = ImageList.from_tensors(images, self.backbone.size_divisibility)
                 ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        return images
```

Otherwise we have no clue where the error is.

Test Plan: sandcastle

Reviewed By: glaringlee

Differential Revision: D24764886

fbshipit-source-id: abd5734394e53b20baa6473134896e3a2b178662
2020-11-06 12:36:14 -08:00
c26c4690fe Add sub operator
Summary: Add sub operator for caffe2

Test Plan:
```
buck test //caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test
```

Reviewed By: houseroad

Differential Revision: D24685090

fbshipit-source-id: 60d745065d01b634ebd3087e533d8b9ddab77a1f
2020-11-06 12:31:17 -08:00
47198e3208 [caffe2] improve core.Net cloning/init performance (24x for large models!) (#47475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47475

This improves core.Net cloning/init performance by quite a bit. It makes set_input_record run in constant time instead of O(n) by checking the external_input map instead of regenerating the external inputs on every call and iterating over them.
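
The general shape of the fix, as an illustrative Python sketch (not the actual caffe2 code):
```python
# Membership checks against a prebuilt map are O(1), versus re-deriving the
# external-input list and scanning it on every call (the old behavior).
external_input_map = {"data": 0, "label": 1}  # built once; contents hypothetical

def has_external_input(name):
    return name in external_input_map  # O(1) lookup

print(has_external_input("data"))  # True
```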

Test Plan: unit tests + canary runs

Reviewed By: dzhulgakov

Differential Revision: D24765346

fbshipit-source-id: 92d9f6dec158512bd50513b78675174686f0f411
2020-11-06 11:34:12 -08:00
90a90ab1d6 Add type informations to torch/storage.py (#46876)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46875

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46876

Reviewed By: glaringlee

Differential Revision: D24758448

Pulled By: ezyang

fbshipit-source-id: afbc19637fbfaa1b0276cdd707043111aee3abc3
2020-11-06 11:34:10 -08:00
d0d673b043 Improve reciprocal() and rsqrt() accuracy on arm64 (#47478)
Summary:
Neither `vrecpeq_f32` nor `vrsqrteq_f32` yields an accurate result; each performs only the first of the two steps in an iteration of the Newton-Raphson method, as documented at
https://developer.arm.com/documentation/dui0472/j/using-neon-support/neon-intrinsics-for-reciprocal-and-sqrt

Use the appropriate NEON instructions to run two more steps of Newton's method and improve the results; a sketch of the iteration follows.
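
A minimal sketch of the refinement in plain Python, standing in for the NEON intrinsics:
```python
# Newton-Raphson refinement for 1/x; on NEON, vrecpsq_f32 computes the
# (2 - x*e) correction factor used below.
def refine_reciprocal(x, e, steps=2):
    for _ in range(steps):
        e = e * (2.0 - x * e)  # each step roughly doubles the correct bits
    return e

print(refine_reciprocal(3.0, 0.333))  # converges toward 1/3 = 0.3333...
```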

Before:
```
$ python -c "import torch;print(torch.arange(1.0, 17.0, 1.0, dtype=torch.float32).reciprocal())"
tensor([0.9980, 0.4990, 0.3330, 0.2495, 0.1997, 0.1665, 0.1426, 0.1248, 0.1108,
        0.0999, 0.0908, 0.0833, 0.0769, 0.0713, 0.0667, 0.0624])
$ python -c "import torch;print(torch.arange(1.0, 17.0, 1.0, dtype=torch.float32).rsqrt())"
tensor([0.9980, 0.7051, 0.5762, 0.4990, 0.4463, 0.4082, 0.3779, 0.3525, 0.3330,
        0.3154, 0.3008, 0.2881, 0.2773, 0.2666, 0.2578, 0.2495])
```
After:
```
$ python -c "import torch;print(torch.arange(1.0, 17.0, 1.0, dtype=torch.float32).reciprocal())"
tensor([1.0000, 0.5000, 0.3333, 0.2500, 0.2000, 0.1667, 0.1429, 0.1250, 0.1111,
        0.1000, 0.0909, 0.0833, 0.0769, 0.0714, 0.0667, 0.0625])
$ python -c "import torch;print(torch.arange(1.0, 17.0, 1.0, dtype=torch.float32).rsqrt())"
tensor([1.0000, 0.7071, 0.5774, 0.5000, 0.4472, 0.4082, 0.3780, 0.3536, 0.3333,
        0.3162, 0.3015, 0.2887, 0.2774, 0.2673, 0.2582, 0.2500])
```

Partially addresses https://github.com/pytorch/pytorch/issues/47476

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47478

Reviewed By: walterddr

Differential Revision: D24773443

Pulled By: malfet

fbshipit-source-id: 224dca9725601d29fb229f8d71d968a30f25c829
2020-11-06 11:31:05 -08:00
5614f72534 Suppress test issues in test_torch running in sandcastle (#47474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47474

After enabling GPU runs on RE (remote execution), some test issues turned out to be specific to those runs; this suppresses them.

Test Plan:
```
buck test -c test.external_runner=tpx mode/opt //caffe2/test:torch_cuda -- --use-remote-execution --force-tpx --run-disabled
```

Reviewed By: malfet, janeyx99

Differential Revision: D24771578

fbshipit-source-id: 1ada79dae12c8cb6f795a0d261c60f038eee2dfb
2020-11-06 10:34:28 -08:00
611080a118 [hot fix] cuda 11.0.x doesn't support sm86. (#47408)
Summary:
Bump condition check from >11.0 to >11.0.3

CMake 3.5 doesn't support VERSION_GREATER_EQUAL (see [here](https://github.com/Dav1dde/glad/issues/134)), so we might need to bump this again if 11.0.4+ releases.

should fix https://github.com/pytorch/pytorch/issues/47352

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47408

Reviewed By: glaringlee

Differential Revision: D24759949

Pulled By: walterddr

fbshipit-source-id: de384c7b150babaf799cce53ed198e5e931899da
2020-11-06 10:34:25 -08:00
160db3db4f Adding profiling capability to c++ ddp collective functions (#46471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46471

ghstack-source-id: 116018837

Test Plan:
Added unit tests:

 buck test mode/dev-nosan caffe2/test/distributed:distributed_gloo_fork
 buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork

Reviewed By: rohan-varma

Differential Revision: D23948397

fbshipit-source-id: 6d93a370aff26bf96c39e5d78a2492c5142a9156
2020-11-06 10:29:58 -08:00
1aeefcdaa6 Revert D24730264: [pytorch][PR] Added CUDA support for complex input for torch.inverse
Test Plan: revert-hammer

Differential Revision:
D24730264 (33acbedace)

Original commit changeset: b9c94ec46301

fbshipit-source-id: beb9263700e9bc92685f74c37c46aa33f3b595b9
2020-11-06 07:28:14 -08:00
f3ad7b2919 [JIT][Reland] add list() support (#42382)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40869

Resubmit of https://github.com/pytorch/pytorch/pull/33818.

Adds support for `list()` by desugaring  it to a list comprehension.

Last time I landed this it made one of the tests slow, and it got unlanded. I think that's because the previous PR changed the emission of `list()` on a list or str input to a list comprehension, which is the more general way of emitting `list()` but also a little slower. I updated this version to emit the builtin operators for these two cases (see the sketch below). Hopefully it can land without being reverted this time...
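
In plain Python terms, the desugaring amounts to the following sketch (per the note above, list and str inputs now go through builtin operators instead):
```python
def desugared_list(x):
    return [elem for elem in x]  # what list(x) lowers to in the general case

print(desugared_list((1, 2, 3)))  # [1, 2, 3]
print(desugared_list(range(3)))   # [0, 1, 2]
```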

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42382

Reviewed By: navahgar

Differential Revision: D24767674

Pulled By: eellison

fbshipit-source-id: a1aa3d104499226b28f47c3698386d365809c23c
2020-11-06 01:28:54 -08:00
eaa993a2e0 Add type annotations to torch._C._distributed_rpc module. (#46624)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46624

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D24761656

Pulled By: xuzhao9

fbshipit-source-id: b55aee5dd2b97f573a50e5bbfddde7d984943fec
2020-11-06 01:28:51 -08:00
73a3e70b24 Add type annotations for torch._C._distributed_c10d module. (#46623)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46623

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24761606

Pulled By: xuzhao9

fbshipit-source-id: 827eaf2502e381ee24d36741c1613b4c08208569
2020-11-06 01:28:48 -08:00
fe77ded48a Add Python declaration of torch._C and torch._C._autograd modules. (#46622)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46622

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24761503

Pulled By: xuzhao9

fbshipit-source-id: c7ff9a9e46480a83bf6961e09972b5d20bdeb67b
2020-11-06 01:25:47 -08:00
fccfe7bd1a [Gradient Compression] Add unit tests that test default Python comm hook implementations (#47158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47158

1. Test the default Python comm hook implementations ALLREDUCE and FP16_COMPRESS, besides an ad-hoc all-reduce implementation.
2. Typo fix.
3. Reformat default_hooks.py.
4. Publish register_comm_hook API for DDP module (This should be done in a separate diff, but got merged unintentionally.)

The new style can be used for testing any new comm hook like PowerSGD easily.
Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202

ghstack-source-id: 116012600

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl

Reviewed By: rohan-varma

Differential Revision: D24669639

fbshipit-source-id: 048c87084234edc2398f0ea6f01f2f083a707939
2020-11-06 00:28:09 -08:00
873652d9ac [TensorExpr] Fix LLVM 12 build after LLVM API changes (#47480)
Summary:
PolySize was removed: https://reviews.llvm.org/D88982

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47480

Test Plan: Build against LLVM 12.

Reviewed By: glaringlee

Differential Revision: D24773973

Pulled By: asuhan

fbshipit-source-id: d09566675c043d8b63032c52bdadd09e09ccfc39
2020-11-05 22:30:37 -08:00
fd72ec53d4 [JIT] Optimize hot path in ProfilingGraphExecutorImpl::getPlanFor. (#47465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47465

This results in a ~10% improvement on a DeepAndWide model:

10 runs of a benchmark before the change:
```
1.480785621330142
1.430812582373619
1.3845220785588026
1.4510653037577868
1.4827174227684736
1.3679781593382359
1.4239587392657995
1.5069784726947546
1.3988622818142176
1.4533461946994066
```

10 runs of the same benchmark after the change:
```
1.3221493270248175
1.3624659553170204
1.3415213637053967
1.3560577500611544
1.3064174111932516
1.2934542261064053
1.379274770617485
1.3850531745702028
1.26725466363132
1.3738237638026476
```

Link to benchmark: https://gist.github.com/ZolotukhinM/2308732eabb47685c6f7786e5a13b3d1

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D24767247

Pulled By: ZolotukhinM

fbshipit-source-id: a77e89fdfb54286e6463533c86b3a4ba606ca1c7
2020-11-05 22:27:24 -08:00
9a9383ef2e PyTorch NNAPI integration prototype (#46780)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46780

This is in prototype status, but pretty functional.  There are two major
parts.

- Model converter.  This is a pure Python component that consumes a
  model in TorchScript format, converts the operations into NNAPI
  semantics, and serializes the model in a custom format.  It then wraps
  the result in a new TorchScript model that can invoke NNAPI under the
  hood.
- Runtime.  This is a TorchBind object that deserializes the model and
  sends the result to NNAPI.  This is fairly simple since the serialized
  format is basically just a list of NNAPI calls to make, so most of the
  code is spent on bounds checking.

A few notes on the design.
- Currently, all tensor sizes need to be fixed, and those fixed sizes
  are burned directly into the serialized model.  This will probably
  need to change.  NNAPI supports variable-sized tensors, but the
  important hardware backends do not.  However, we're seeing use cases
  crop up where the input size is not known until around the time that
  the model is loaded (for example, it might depend on the camera aspect
  ratio).  I think the proper fix here is to remove the code in the
  converter that eagerly calculates the sizes of the intermediate
  tensors and replace it with a code generator that will generate some
  TorchScript code that will perform those calculations at model load
  time.  This way, we will be able to support models that have
  variable-sized inputs while still only showing fixed-sized operands to
  NNAPI.
- The important hardware backends want operands to be in NHWC order, but
  PyTorch natively represents all tensors as NCHW.  The strategy for
  this is to keep NCHW during most of the conversion process, but track
  an additional value per operand representing the "dimension order".
  The dimension order gets propagated through convolutions and pointwise
  ops.  When we're ready to serialize the model, we reorder the
  dimensions for "channels last" operands to NHWC (see the sketch below).
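
The reordering described in the last point, in plain PyTorch terms (illustrative only):
```python
import torch

t = torch.randn(1, 3, 224, 224)            # NCHW, PyTorch's native layout
nhwc = t.permute(0, 2, 3, 1).contiguous()  # NHWC for the serialized operands
print(nhwc.shape)                          # torch.Size([1, 224, 224, 3])
```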

Test Plan:
Some local testing with FB prod models.  I'll need to add some examples
and automated tests.

Reviewed By: iseeyuan

Differential Revision: D24574040

Pulled By: dreiss

fbshipit-source-id: 6adc8571b234877ee3666ec0c0de24da35c38a1f
2020-11-05 21:31:01 -08:00
ad8c0e57ef Add a command-line flag for overriding pthreadpool size (#46781)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46781

Test Plan: Passed it to speed_benchmark_torch and saw perf change.

Reviewed By: iseeyuan

Differential Revision: D24752889

Pulled By: dreiss

fbshipit-source-id: 762981510f271d20f76e33b6e6f361c4a6f48e6c
2020-11-05 21:30:54 -08:00
a63f391c6f [JIT] fix documentation typo (#46926)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46816

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46926

Reviewed By: glaringlee

Differential Revision: D24762897

Pulled By: eellison

fbshipit-source-id: f58c4db5f4dd037141c18ec1121816eba33f87b7
2020-11-05 21:26:27 -08:00
ceb16d8836 [Bootcamp] add CUDA kernel checks to ATen/native/cuda (#47466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47466

- Add kernel launch check `TORCH_CUDA_KERNEL_LAUNCH_CHECK()` (D24309971 (353e7f940f)) to several files in aten/src/ATen/native/cuda
- Get rid of old check `AT_CUDA_CHECK(cudaGetLastError())` in these same files

Test Plan:
Test build:
```
buck build //caffe2/aten:ATen-cu
```
To check for launches without checks:
```
python3 caffe2/torch/testing/check_kernel_launches.py
```
Make sure none of the updated files are in the returned list.

Reviewed By: r-barnes

Differential Revision: D24724947

fbshipit-source-id: a7c7d3c70ed8fb5dfd69997b50f9c838f8651791
2020-11-05 20:27:56 -08:00
e985503d80 [NNC] Fix an issue with half-scalar vars coerced to float (Take 2) (#47448)
Summary:
Take 2 of this fix: I removed the repro from the issue, which is a bit flaky due to parallelism. It broke on Windows, but that isn't specific to Windows or to this fix, I think. I'll make sure all the tests pass this time (cc zou3519).

Fixes an issue where fp16 scalars created by the registerizer could be referenced as floats, causing invalid conversions which would crash the NVRTC compile. I also noticed that we were inserting patterns like float(half(float(X))), and added a pass to collapse those down inside the CudaHalfScalarRewriter.

Fixes https://github.com/pytorch/pytorch/issues/47138

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47448

Reviewed By: glaringlee

Differential Revision: D24765070

Pulled By: nickgg

fbshipit-source-id: 5297e647534d53657bef81f4798e8aa6a93d1fbd
2020-11-05 19:31:52 -08:00
9c8f40516f Batched grad for advanced indexing (index) (#47223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47223

This PR enables batched gradient computation for advanced indexing.
Previously, the backward formula was writing parts of the grad tensor
in-place to zeros_like(self). Since grad is a BatchedTensor and self is
not a BatchedTensor, this is not possible.

To solve the problem, we instead create a new tensor with
`grad.new_zeros` and then write to that in-place. This new tensor will
have the same batchedness as the `grad` tensor.

To prevent regressions (the autograd codegen special cases zeros_like
to avoid saving the `self` tensor for backward), we teach the autograd
codegen how to save `self.options()`.
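
A runnable sketch of the new-zeros-from-grad pattern (simplified; `index_put_` stands in for the real backward formula):
```python
import torch

x = torch.randn(3, requires_grad=True)
y = x[[0, 2]]                      # advanced indexing
g = torch.ones_like(y)             # incoming grad (could be a BatchedTensor)

gx = g.new_zeros(x.shape)          # inherits g's properties, incl. batchedness
gx.index_put_((torch.tensor([0, 2]),), g, accumulate=True)
print(gx)                          # tensor([1., 0., 1.])
```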

Test Plan:
- new tests
- run old indexing tests

Reviewed By: ejguan

Differential Revision: D24741684

Pulled By: zou3519

fbshipit-source-id: e267999dc079f4fe58c3f0bdf5c263f1879dca92
2020-11-05 18:25:33 -08:00
65241e3681 add remove_node in Partition class (#47452)
Summary:
Add a remove_node method to the Partition class for future use.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47452

Reviewed By: glaringlee, gcatron

Differential Revision: D24762770

Pulled By: scottxu0730

fbshipit-source-id: 35473ab7322d8e6ecab1c2624b668342bfec4cca
2020-11-05 17:27:18 -08:00
b4b0fa6371 add get_device_to_partitions_mapping (#47361)
Summary:
Add a get_device_to_partitions_mapping function to the Partitioner class to make size_based_partition more modular and organized. This function will also be used by the future cost_aware_partition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47361

Reviewed By: gcatron

Differential Revision: D24760911

Pulled By: scottxu0730

fbshipit-source-id: 8cdda51b9a1145f9d13ebabbb98b4d9df5ebb6cd
2020-11-05 16:33:02 -08:00
33acbedace Added CUDA support for complex input for torch.inverse (#45034)
Summary:
`torch.inverse` now works for complex inputs on GPU.
Test cases with complex matrices are xfailed for now. For example, batched matmul does not work with complex yet.

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45034

Reviewed By: zou3519

Differential Revision: D24730264

Pulled By: anjali411

fbshipit-source-id: b9c94ec463012913c117278a884adeee96ea02aa
2020-11-05 16:30:11 -08:00
c2d4a5b137 Disable unused docker-pytorch-linux-xenial-py3.6-gcc4.8 job (#47446)
Summary:
The `docker-pytorch-linux-xenial-py3.6-gcc4.8` job is not used for any builds anymore. This PR removes it from CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47446

Reviewed By: seemethere, samestep

Differential Revision: D24759876

Pulled By: janeyx99

fbshipit-source-id: e7d420fc2c6c7ffa43001d83b449e9ef3070e902
2020-11-05 12:13:37 -08:00
373246733d [FX] get the correct error message (#47108)
Summary:
Currently, code like
```
class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        self.W = torch.nn.Parameter(torch.randn(5))

    def forward(self, x):
        return torch.dot(self.W, x)

mod = Test()
print(fx.symbolic_trace(Test())(5))
```
gives an error like the below, which does not show the actual code that throws the error.
```
Traceback (most recent call last):
  File "t.py", line 20, in <module>
    print(fx.symbolic_trace(Test())(5))
  File "/home/chilli/fb/pytorch/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chilli/fb/pytorch/torch/fx/graph_module.py", line 191, in debug_forward
    return src_forward(self, *args, **kwargs)
  File "<eval_with_key_0>", line 5, in forward
TypeError: dot(): argument 'tensor' (position 2) must be Tensor, not int
```

This is particularly annoying when your function has already been transformed several times.

So, the really annoying thing is that the error clearly has the requisite information in `exception.__traceback__` - it just isn't printing it.

I think the right way of doing this is simply replacing `sys.excepthook`. This appears to be the standard way to modify exception messages.
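
A bare-bones illustration of that mechanism (hypothetical; not the code this PR lands):
```python
import sys

def annotated_excepthook(exc_type, exc, tb):
    # Add context, then delegate to the default printer.
    print("error while running FX-generated forward:", file=sys.stderr)
    sys.__excepthook__(exc_type, exc, tb)

sys.excepthook = annotated_excepthook
```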

**Scratch the below**

The 2 methods in the PR right now are:
1. Just prepend the final part of the traceback to the beginning of your error message. Looks like
```
Traceback (most recent call last):
  File "t.py", line 20, in <module>
    print(fx.symbolic_trace(Test())(5))
  File "/home/chilli/fb/pytorch/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chilli/fb/pytorch/torch/fx/graph_module.py", line 197, in debug_forward
    raise e
  File "/home/chilli/fb/pytorch/torch/fx/graph_module.py", line 192, in debug_forward
    return src_forward(self, *args, **kwargs)
  File "<eval_with_key_0>", line 5, in forward
TypeError:   File "<eval_with_key_0>", line 5, in forward
    dot_1 = torch.dot(w, x)
dot(): argument 'tensor' (position 2) must be Tensor, not int
```

2. Use the `from exception` feature in Python. Looks like
```
Traceback (most recent call last):
  File "/home/chilli/fb/pytorch/torch/fx/graph_module.py", line 192, in debug_forward
    return src_forward(self, *args, **kwargs)
  File "<eval_with_key_0>", line 5, in forward
TypeError:   File "<eval_with_key_0>", line 5, in forward
    dot_1 = torch.dot(w, x)
dot(): argument 'tensor' (position 2) must be Tensor, not int

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "t.py", line 20, in <module>
    print(fx.symbolic_trace(Test())(5))
  File "/home/chilli/fb/pytorch/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chilli/fb/pytorch/torch/fx/graph_module.py", line 197, in debug_forward
    raise Exception(last_tb) from e
Exception:   File "<eval_with_key_0>", line 5, in forward
    dot_1 = torch.dot(w, x)
```

I think the first one looks better, but it's pretty hacky since we're shoving the traceback in the message.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47108

Reviewed By: jamesr66a

Differential Revision: D24751019

Pulled By: Chillee

fbshipit-source-id: 83e6ed0165f98632a77c73de75504fd6263fff40
2020-11-05 10:59:01 -08:00
eed4a57d54 Speedup copysign for half and bfloat16 types (#47413)
Summary:
This also avoids internal compiler error exceptions on aarch64 platforms and transitively fixes https://github.com/pytorch/pytorch/issues/47395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47413

Reviewed By: walterddr

Differential Revision: D24745921

Pulled By: malfet

fbshipit-source-id: 790e5b91d9116670c882d838b3862d5b47178d68
2020-11-05 10:31:32 -08:00
35491412d1 Revert D24649817: [pytorch][PR] Fix pickling for Tensor subclasses.
Test Plan: revert-hammer

Differential Revision:
D24649817 (c4209f1115)

Original commit changeset: 1872faa36030

fbshipit-source-id: b9832cea45552bd8776909118c4324fbd61fd414
2020-11-05 10:25:48 -08:00
7a599870b0 [ONNX] Update peephole pass for prim::ListUnpack (#46264)
Summary:
Update pass that handles prim::ListUnpack in peephole file, so that it also covers the case when input to the node is of ListType.

Fixes https://github.com/pytorch/pytorch/issues/45816

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46264

Reviewed By: mrshenli

Differential Revision: D24566070

Pulled By: bzinodev

fbshipit-source-id: 32555487054f6a7fe02cc17c66bcbe81ddf9623e
2020-11-05 09:42:24 -08:00
5977d1d864 FixedQParamsFakeQuantize: adjust default quant_min and quant_max (#47423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47423

Since the dtype of this fake_quant is `quint8`, the output range should be
from 0 to 255.  Fixing.  This should address the numerical inaccuracies with
sigmoid and hardsigmoid with `FixedQParamsFakeQuantize` attached compared
to their quantized counterparts.

In a future PR, might be safer to also make the activation functions
using `FixedQParamsFakeQuantize` to explicitly specify their expected
output range and zero_point.  Leaving that for later, as this bugfix
should be landed urgently.

Test Plan:
Manual script which gives low SQNR before this PR and high SQNR after
this PR: https://gist.github.com/vkuzo/9906bae29223da72b10d6b6aafadba42

https://github.com/pytorch/pytorch/pull/47376, which can be landed after
this, adds a proper test.

Imported from OSS

Reviewed By: ayush29feb, jerryzh168

Differential Revision: D24751497

fbshipit-source-id: 4c32e22a30116caaceeedb4cd47146d066054a89
2020-11-05 09:06:55 -08:00
745899f926 Revert D24706475: [pytorch][PR] [NNC] Fix an issue in Cuda fusion with fp16 scalar vars coerced to float
Test Plan: revert-hammer

Differential Revision:
D24706475 (33cf7fddd2)

Original commit changeset: 9df72bbbf203

fbshipit-source-id: f16ff04818de4294713d5b97eab5b298c1a75a6b
2020-11-05 08:25:48 -08:00
9c8078cdfb Revert D24659901: Add tests for DDP control flow models.
Test Plan: revert-hammer

Differential Revision:
D24659901 (31c9d2efcd)

Original commit changeset: 17fc2b3ebba9

fbshipit-source-id: 26b0bdbe83cba54da4f363cfa7fc85c503aa05ab
2020-11-05 08:08:59 -08:00
1519c7145c __noinline__ the top level igamma cuda kernel. (#47414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47414

This improves the build time of this file by 10x on my machine (~12 minutes to ~1 minute).

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24746458

Pulled By: gchanan

fbshipit-source-id: cdef801199d4fdc2bbd740fe1b771285b1d71319
2020-11-05 07:50:59 -08:00
e40a563050 Fix sum batching rule, add simple clone batching rule (#47189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47189

PyTorch has a special case where sum(scalar_tensor, dim=0) does not fail
and instead returns a new copy of the original scalar_tensor. If we
end up vmapping over per-example scalar tensors, e.g.,
```
>>> x = torch.randn(B0)  # the per-examples are all scalars
>>> vmap(partial(torch.sum, dim=0))(x)
```
then we should replicate the behavior of sum(scalar_tensor, dim=0) by
returning a clone of the input tensor.
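
The special case being replicated, shown on its own (runnable):
```python
import torch

s = torch.randn(())          # 0-dim (scalar) tensor
out = torch.sum(s, dim=0)    # special-cased: returns a copy rather than erroring
assert out.item() == s.item()
```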

This PR also adds a batching rule for clone(Tensor, MemoryFormat). The
batching rule:
- unwraps the BatchedTensor, calls clone(), and rewraps the
BatchedTensor if MemoryFormat is torch.preserve_format (which is the
default).
- errors out with an NYI for all other memory formats, including
torch.contiguous_format. There are some weird semantics for memory
layouts with vmap that I need to go and figure out. Those are noted in
the comments for `clone_batching_rule`

Test Plan: - new tests

Reviewed By: ejguan

Differential Revision: D24741689

Pulled By: zou3519

fbshipit-source-id: e640344b4e4aa8c0d2dbacc5c49901f4c33c6613
2020-11-05 07:38:43 -08:00
9a9529aa84 Batching rules for complex view functions (#47188)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47188

Includes batching rules for:
- torch.real, torch.imag, torch.view_as_real, and torch.view_as_complex

Test Plan: - new tests

Reviewed By: ejguan

Differential Revision: D24741686

Pulled By: zou3519

fbshipit-source-id: c143bab9bb5ebbcd8529e12af7c117cbebd4447e
2020-11-05 07:37:15 -08:00
ae374dc690 Move igamma cuda specific code to kernel file. (#47410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47410

This is a copy-paste except for:
1) The code is put in an anonymous namespace
1) The static declarations on functions (in the now-anonymous namespace) are removed

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24745597

Pulled By: gchanan

fbshipit-source-id: 049b6bb10845cd8d7961b533782f582b3db25248
2020-11-05 07:21:39 -08:00
220b3bd667 Add op benchmark for batch box cox as baseline (#47275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47275

```
# Benchmarking Caffe2: batch_box_cox
# Name: batch_box_cox_M64_N64_dtypedouble
# Input: M: 64, N: 64, dtype: double
Forward Execution Time (us) : 49.005
```

Test Plan: `buck run mode/opt caffe2/benchmarks/operator_benchmark/c2:batch_box_cox_test -- --iterations=1000  --warmup 100`

Reviewed By: houseroad

Differential Revision: D24675426

fbshipit-source-id: 8bb1f3076dc6b01e7b63468136ddf3d9b6d7e5d2
2020-11-05 07:16:32 -08:00
68954fe897 Add release note scripts (#47360)
Summary:
First commit contains the initial code from Richard's branch.
Second commit are the changes that I made during the writing process
Third commit is the update to support category/topic pair for each commit

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47360

Reviewed By: ejguan

Differential Revision: D24741003

Pulled By: albanD

fbshipit-source-id: d0fcc6765968dc1732d8a515688d11372c7e653d
2020-11-05 06:43:24 -08:00
a4ba018e57 Updated docs/test for dot and vdot (#47242)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47242

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D24733771

Pulled By: heitorschueroff

fbshipit-source-id: 92e3b0e28e0565918335fa85d52abe5db9eeff57
2020-11-05 06:27:50 -08:00
d8c3b2b10c [quant][pyper] Add support for pruned weights in embedding_bag_byte lookup (#47329)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47329

Supports pruned weights along with mapping for the compressed indices

Test Plan:
python test/test_quantization.py TestQuantizedEmbeddingOps

Imported from OSS

Reviewed By: qizzzh

Differential Revision: D24719909

fbshipit-source-id: f998f4039e84bbe1886e492a3bff6aa5f56b6b0f
2020-11-04 22:33:33 -08:00
433b55bc7c [quant] Add testing coverage for 4-bit embedding_bag sparse lookup op (#47328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47328

Extend tests to cover case for pruned weights with mapping table.
Support for 8-bits sparse lookup to follow

Test Plan:
python test/test_quantization.py TestQuantizedEmbeddingOps

Imported from OSS

Reviewed By: qizzzh

Differential Revision: D24719910

fbshipit-source-id: d31db6304f446104ee8c7b10b902accd2919a513
2020-11-04 22:29:12 -08:00
f19637e6ee Expand the test of torch.addbmm and torch.baddbmm (#47079)
Summary:
This is to satisfy the request at https://github.com/pytorch/pytorch/pull/42553#issuecomment-673673914. See also https://github.com/pytorch/pytorch/pull/47124

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47079

Reviewed By: ejguan

Differential Revision: D24735356

Pulled By: ngimel

fbshipit-source-id: 122fceb4902658f350c2fd6f92455adadd0ec2a4
2020-11-04 21:11:26 -08:00
df5b4696cf [Pytorch] Specialize guts of c10::optional for 32-bit scalars (#47015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47015

c10::optional always has non-trivial copy and move operations. This change specializes it for 32-bit scalars so that it has trivial copy and move operations in that case. Ideally, we would instead rely on P0602 ("variant and optional should propagate copy/move triviality") and use `std::optional` (or implement that functionality ourselves). We can't use `std::optional` because we are stuck with C++14, and implementing the full P0602 ourselves would add even more complexity. We could do it, but this should be a helpful first step.
ghstack-source-id: 115886743

Test Plan:
Collect Callgrind instruction counts for `torch.empty(())`. Data:

Make empty c10-ful (https://github.com/pytorch/pytorch/pull/46092):

```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7ffaed1128e0>
torch.empty(())
                           All          Noisy symbols removed
    Instructions:       648005                     632899
    Baseline:             4144                       3736
100 runs per measurement, 1 thread
```

This diff atop #46092:

```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f943f1dc8e0>
torch.empty(())
                           All          Noisy symbols removed
    Instructions:       602347                     591005
    Baseline:             4106                       3736
100 runs per measurement, 1 thread
```

(6.6% improvement vs #46092)

Pass optionals by const reference (https://github.com/pytorch/pytorch/pull/46598)

```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f1abb3988e0>
torch.empty(())
                           All          Noisy symbols removed
    Instructions:       601349                     590005
    Baseline:             4162                       3736
100 runs per measurement, 1 thread
```
(6.8% improvement vs #46092)

This diff atop #46598 (i.e., both together)

```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f9577c22850>
torch.empty(())
                           All          Noisy symbols removed
    Instructions:       596095                     582451
    Baseline:             4162                       3736
100 runs per measurement, 1 thread
Warning: PyTorch was not built with debug symbols.
         Source information may be limited. Rebuild with
         REL_WITH_DEB_INFO=1 for more detailed results.
```

(another 1.3% savings!)

#46598 outperformed this change slightly, and combining the two leads to further benefits. I guess we should do both! (Though I still don't understand why passing optionals that should fit in a register by const reference would help...)

Reviewed By: smessmer

Differential Revision: D24552280

fbshipit-source-id: 4d93bfcffafebd8c01559398513fa6b9db959d11
2020-11-04 21:08:50 -08:00
0edc6a39c8 [NNC] Read/Write Dependency analysis (#46952)
Summary:
Adds a new piece of infrastructure to the NNC fused-kernel generation compiler, which builds a dependency graph of the reads and writes to memory regions in a kernel.

It can be used to generate graphs like this from the GEMM benchmark (not this only represents memory hierarchy not compute hierarchy):

![image](https://user-images.githubusercontent.com/701287/97368797-e99d5600-1868-11eb-9a7e-ceeb91ce72b8.png)

Or to answer questions like this:
```
Tensor* c = Compute(...);
Tensor* d = Compute(...);
LoopNest loop({d});
MemDependencyChecker analyzer;
loop.root_stmt()->accept(analyzer);
if (analyzer.dependsDirectly(loop.getLoopStmtsFor(d)[0], loop.getLoopStmtsFor(c)[0]) {
  // do something, maybe computeInline
}
```

Or this:
```
Tensor* d = Compute(...);
LoopNest loop({d});
MemDependencyChecker analyzer(loop.getInputs(), loop.getOutputs());
const Buf* output = d->buf();
for (const Buf* input : inputs) {
  if (!analyzer.dependsIndirectly(output, input)) {
    // signal that this input is unused
  }
}
```

This is a monster of a diff, and I apologize. I've tested it as well as possible for now, but it's not hooked up to anything yet so should not affect any current usages of the NNC fuser.

**How it works:**

Similar to the registerizer, the MemDependencyChecker walks the IR aggregating memory accesses into scopes, then merges those scopes into their parent scope and tracks which writes are responsible for the last write to a particular region of memory, adding dependency links where that region is used.

This relies on a bunch of math on symbolic contiguous regions, which I've pulled out into its own file (bounds_overlap.h/cpp). Sometimes this won't be able to infer dependence with 100% accuracy, but it should always be conservative: it may occasionally add false positives, but I'm aware of no false negatives.

The hardest part of the analysis is determining when a Load inside a For loop depends on a Store that is lower in the IR from a previous iteration of the loop. This depends on a whole bunch of factors, including whether or not we should consider loop iteration order. The analyzer comes with configuration of this setting. For example this loop:
```
for (int x = 0; x < 10; ++x) {
 A[x] = B[x] + 1;
}
```

has no inter loop dependence, since each iteration uses a distinct slice of both A and B. But this one:

```
for (int x = 0; x < 10; ++x) {
 A[0] = A[0] + B[x];
}
```

Has a self loop dependence between the Load and the Store of A. This applies to many cases that are not reductions as well. In this example:

```
for (int x = 0; x < 10; ++x) {
  A[x] = A[x+1] + x;
}
```

Whether or not it has self-loop dependence depends on whether we assume the execution order is fixed (or whether this loop could later be parallelized). If the read from `A[x+1]` always comes before the write to that same region, then it has no dependence.

The analyzer can correctly handle dynamic shapes, but we may need more test coverage of real world usages of dynamic shapes. I unit test some simple and pathological cases, but coverage could be better.

**Next Steps:**

Since the PR was already so big I didn't actually hook it up anywhere, but I had planned on rewriting bounds inference based on the dependency graph. Will do that next.

There are a few gaps in this code which could be filled in later if needed:
* Upgrading the bound math to work with write strides, which will reduce false positive dependencies.
* Better handling of Conditions, reducing false positive dependencies when a range is written in both branches of a Cond.
* Support for AtomicAdd node added in Cuda codegen.

**Testing:**

See new unit tests, I've tried to be verbose about what is being tested. I ran the python tests but there shouldn't be any way for this work to affect them yet.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46952

Reviewed By: ejguan

Differential Revision: D24730346

Pulled By: nickgg

fbshipit-source-id: 654c67c71e9880495afd3ae0efc142e95d5190df
2020-11-04 19:52:20 -08:00
c4209f1115 Fix pickling for Tensor subclasses. (#47115)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47051

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47115

Reviewed By: ejguan

Differential Revision: D24649817

Pulled By: ezyang

fbshipit-source-id: 1872faa3603085f07c0a8a026404161d0715720d
2020-11-04 19:25:32 -08:00
60ae84754e Add torch.overrides checks for submodules. (#47285)
Summary:
Partially addresses the override component of https://github.com/pytorch/pytorch/issues/42666 and https://github.com/pytorch/pytorch/issues/42175.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47285

Reviewed By: agolynski

Differential Revision: D24706493

Pulled By: ezyang

fbshipit-source-id: bf5a742ac7002dce5a9a454a945f1994b4c8b93e
2020-11-04 19:14:04 -08:00
6c5a1c50bf Benchmark combining Distributed Data Parallel and Distributed RPC (#46993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46993

Introducing a benchmark that combines Distributed Data Parallelism with Distributed Model Parallelism. The benchmark measures distributed training iteration time. The numbers of trainer nodes and parameter servers are configurable. The default setup has 8 trainers, 1 master node, and 8 parameter servers.

The training process is executed as follows:

1) The master creates embedding tables on each of the 8 Parameter Servers and holds an RRef to it.
2) The master then kicks off the training loop on the 8 trainers and passes the embedding table RRef to the trainers.
3) The trainers create a `HybridModel` which performs embedding lookups in all 8 Parameter Servers using the embedding table RRef provided by the master and then executes the FC layer which is wrapped and replicated via DDP (DistributedDataParallel).
4) The trainer executes the forward pass of the model and uses the loss to
   execute the backward pass using Distributed Autograd.
5) As part of the backward pass, the gradients for the FC layer are computed
   first and synced to all trainers via allreduce in DDP.
6) Next, Distributed Autograd propagates the gradients to the parameter servers,
   where the gradients for the embedding table are updated.
7) Finally, the Distributed Optimizer is used to update all parameters.

Test Plan:
waitforbuildbot

Benchmark output:

---------- Info ---------

* PyTorch version: 1.7.0
* CUDA version: 9.2.0

---------- nvidia-smi topo -m ---------

    GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    CPU     Affinity
    GPU0     X      NV2     NV1     NV2     NV1     NODE    NODE    NODE    0-19,40-59
    GPU1    NV2      X      NV2     NV1     NODE    NV1     NODE    NODE    0-19,40-59
    GPU2    NV1     NV2      X      NV1     NODE    NODE    NV2     NODE    0-19,40-59
    GPU3    NV2     NV1     NV1      X      NODE    NODE    NODE    NV2     0-19,40-59
    GPU4    NV1     NODE    NODE    NODE     X      NV2     NV1     NV2     0-19,40-59
    GPU5    NODE    NV1     NODE    NODE    NV2      X      NV2     NV1     0-19,40-59
    GPU6    NODE    NODE    NV2     NODE    NV1     NV2      X      NV1     0-19,40-59
    GPU7    NODE    NODE    NODE    NV2     NV2     NV1     NV1      X      0-19,40-59

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing a single PCIe switch
  NV#  = Connection traversing a bonded set of # NVLinks

------------------  PyTorch Distributed Benchmark (DDP and RPC) ---------------------

                    sec/iter    ex/sec      sec/iter    ex/sec      sec/iter    ex/sec      sec/iter    ex/sec
    Trainer0:  p50:  0.376s     185/s  p75:  0.384s     182/s  p90:  0.390s     179/s  p95:  0.396s     176/s
    Trainer1:  p50:  0.377s     204/s  p75:  0.384s     200/s  p90:  0.389s     197/s  p95:  0.393s     195/s
    Trainer2:  p50:  0.377s     175/s  p75:  0.384s     172/s  p90:  0.390s     169/s  p95:  0.395s     166/s
    Trainer3:  p50:  0.377s     161/s  p75:  0.384s     158/s  p90:  0.390s     156/s  p95:  0.393s     155/s
    Trainer4:  p50:  0.377s     172/s  p75:  0.383s     169/s  p90:  0.389s     166/s  p95:  0.395s     164/s
    Trainer5:  p50:  0.377s     180/s  p75:  0.383s     177/s  p90:  0.389s     174/s  p95:  0.395s     172/s
    Trainer6:  p50:  0.377s     204/s  p75:  0.384s     200/s  p90:  0.390s     197/s  p95:  0.394s     195/s
    Trainer7:  p50:  0.377s     185/s  p75:  0.384s     182/s  p90:  0.389s     179/s  p95:  0.394s     177/s
         All:  p50:  0.377s    1470/s  p75:  0.384s    1443/s  p90:  0.390s    1421/s  p95:  0.396s    1398/s

Reviewed By: pritamdamania87

Differential Revision: D24409230

fbshipit-source-id: 61de31dd4b69914198cb4becc2e616b17d47ef1a
2020-11-04 18:53:19 -08:00
ca293ec4e7 [TensorExpr] Run constant pooling in fusion groups to dedupe constants. (#47402)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47402

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D24740957

Pulled By: ZolotukhinM

fbshipit-source-id: 741cbddc4bf2decd95d444235c424a4ae003d0de
2020-11-04 18:44:12 -08:00
5107a411cd add partition_by_partition_cost (#47280)
Summary:
This PR adds support for calculating the cost of a partitioned graph, partition by partition, based on the node cost. In a partitioned graph, the top partitions (partitions without parents) are collected as starting points; DFS is then used to find the critical path among all partitions in the graph (an illustrative sketch follows).
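
An illustrative sketch of that critical-path computation; the `cost` and `children` fields on partitions are assumptions for illustration, not the PR's actual API:

```
def critical_path_cost(top_partitions):
    memo = {}

    def longest_from(p):
        # Latency of the most expensive path that starts at partition p.
        if p not in memo:
            memo[p] = p.cost + max(
                (longest_from(c) for c in p.children), default=0.0
            )
        return memo[p]

    return max(longest_from(p) for p in top_partitions)
```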

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47280

Reviewed By: gcatron

Differential Revision: D24735932

Pulled By: scottxu0730

fbshipit-source-id: 96653a8208554d2c3624e6c8718628f7c13e320b
2020-11-04 18:21:18 -08:00
878032d387 [ONNX] Add export of prim::data (#45747)
Summary:
Add export of prim::data

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45747

Reviewed By: bdhirsh

Differential Revision: D24280334

Pulled By: bzinodev

fbshipit-source-id: d21eda84eaba9e690852a72c0e63cbb40eae89bc
2020-11-04 18:15:28 -08:00
192b2967a5 [quant][graphmode][fx][test] Add test for nn.Sequential (#47411)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47411

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D24745678

fbshipit-source-id: f8a4858748402db6e72a21bf051f5542b9215ffa
2020-11-04 18:04:19 -08:00
c8872051e6 Validate number of GPUs in distributed_test. (#47259)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47259

As described in https://github.com/pytorch/pytorch/issues/47257, not
using enough GPUs results in an error.

As a result, before we call `init_process_group` in distributed_test, we
validate that we have enough GPUs.
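
A minimal sketch of such a guard, with illustrative names (this is not the PR's exact code):

```
import torch
import torch.distributed as dist

def init_process_group_checked(backend, rank, world_size):
    # Fail fast with a clear message instead of a confusing runtime error.
    if backend == "nccl" and torch.cuda.device_count() < world_size:
        raise RuntimeError(
            f"Test requires {world_size} GPUs, but only "
            f"{torch.cuda.device_count()} are visible"
        )
    dist.init_process_group(backend=backend, rank=rank, world_size=world_size)
```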

#Closes: https://github.com/pytorch/pytorch/issues/47257
ghstack-source-id: 115790475

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D24699122

fbshipit-source-id: 59c78d191881d1e063c43623dcf4d7eb75a2e94e
2020-11-04 17:55:34 -08:00
8a3728c819 Make torch.det() support complex input. (#45980)
Summary:
As per title. A minor fix is required to make it available on the CPU (`fmod` does not support complex).
CUDA support requires [#45898](https://github.com/pytorch/pytorch/pull/45898).
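
A small usage sketch of the new behavior (assuming this change and the linked CUDA PR have landed):

```
import torch

a = torch.randn(3, 3, dtype=torch.complex64)
d = torch.det(a)   # now works for complex inputs
print(d.dtype)     # torch.complex64
```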

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45980

Reviewed By: izdeby

Differential Revision: D24539097

Pulled By: anjali411

fbshipit-source-id: 508830dbfd7794ab73e19320d07c69a051c91819
2020-11-04 17:47:03 -08:00
030caa190f Expand the test of torch.bmm on CUDA (#47124)
Summary:
basically https://github.com/pytorch/pytorch/pull/47070, enabled on all CI with `ci-all`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47124

Reviewed By: ejguan

Differential Revision: D24735130

Pulled By: ngimel

fbshipit-source-id: c2124562a9f9d1caf24686e5d8a1106c79366233
2020-11-04 17:29:34 -08:00
32c76dbecc Split IGamma cuda kernel into it's own file to speed up compilation times. (#47401)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47401

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24740657

Pulled By: gchanan

fbshipit-source-id: 78244dba8624ca7be8761a8f4bf1aa078602e5cc
2020-11-04 17:23:25 -08:00
735f8cc6c2 [DI] Allow explicit taskLauncher for torchscript interpreter (#46865)
Summary:
By default, TorchScript execution is single-threaded and uses the caller's thread pool. For the distributed inference use case, we want to be able to customize where the TorchScript interpreter executes. This diff allows specifying an explicit taskLauncher for the TorchScript interpreter.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46865

Test Plan:
unit test is passed.

fbshipit-source-id: 1d7b003926c0d1f8facc53206efb960cff8897ac

Reviewed By: houseroad

Differential Revision: D24616102

Pulled By: garroud

fbshipit-source-id: 79202b62f92d0b0baf72e4bf7aa3f05e0da91d59
2020-11-04 17:07:55 -08:00
b704cbeffe [FX] Speed up non-parameter tensor lookup (#47325)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47325

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D24715484

Pulled By: jamesr66a

fbshipit-source-id: 983eef6212ae95f5ddd3255adc8a585fb336074c
2020-11-04 16:59:02 -08:00
ff3e1de6d7 Clean up some imports in cuda kernel code. (#47400)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47400

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24740655

Pulled By: gchanan

fbshipit-source-id: b56a602637c375575444c074c4be0a698441a4ab
2020-11-04 16:56:48 -08:00
848901f276 Fix collect_env when pytorch is not installed (#47398)
Summary:
Moved all torch-specific checks under the `if TORCH_AVAILABLE` block.

Embedded the gpu_info dict back into the SystemEnv constructor and deduplicated some code between the HIP and CUDA cases.

Fixes https://github.com/pytorch/pytorch/issues/47397

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47398

Reviewed By: walterddr

Differential Revision: D24740421

Pulled By: malfet

fbshipit-source-id: d0a1fe5b428617cb1a9d027324d24d7371c68d64
2020-11-04 16:54:08 -08:00
da491d7535 Split up BinaryMiscOpKernels.cu because it's slow to compile. (#47362)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47362

Test Plan: Imported from OSS

Reviewed By: bwasti

Differential Revision: D24730228

Pulled By: gchanan

fbshipit-source-id: 17edc203fcc06aa5f64174305b184868c7f3e67b
2020-11-04 15:56:11 -08:00
33cf7fddd2 [NNC] Fix an issue in Cuda fusion with fp16 scalar vars coerced to float (#47229)
Summary:
Fixes an issue where fp16 scalars created by the registerizer could be referenced as floats, causing invalid conversions that would crash during NVRTC compilation. I also noticed that we were inserting patterns like `float(half(float(X)))` and added a pass to collapse those down inside the CudaHalfScalarRewriter.

Fixes https://github.com/pytorch/pytorch/issues/47138

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47229

Reviewed By: agolynski

Differential Revision: D24706475

Pulled By: nickgg

fbshipit-source-id: 9df72bbbf203353009e98b9cce7ab735efff8b21
2020-11-04 15:48:12 -08:00
31c9d2efcd Add tests for DDP control flow models. (#47206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47206

As discussed offline with pritamdamania87, add testing to ensure per-iteration and rank-dependent control flow works as expected in DDP with `find_unused_parameters=True`.
ghstack-source-id: 115854944

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D24659901

fbshipit-source-id: 17fc2b3ebba9cef2dd01d2877bad5702174b9767
2020-11-04 15:40:57 -08:00
2e5bfa9824 Add input argument to autograd.backward() cpp api (#47214)
Summary:
Helps fix https://github.com/pytorch/pytorch/issues/46373 for the cpp api.

Follow up to https://github.com/pytorch/pytorch/pull/46855/ which only changed the api for python only

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47214

Reviewed By: agolynski

Differential Revision: D24716139

Pulled By: soulitzer

fbshipit-source-id: 3e1f35968e8dee132985b883481cfd0d1872ccdd
2020-11-04 14:43:59 -08:00
6f6025183f Skip iomp5 embedding if torch_cpu could not be found (#47390)
Summary:
This would be the case when the package is built for local development rather than for installation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47390

Reviewed By: janeyx99

Differential Revision: D24738416

Pulled By: malfet

fbshipit-source-id: 22bd676bc46e5d50a09539c969ce56d37cfe5952
2020-11-04 14:22:53 -08:00
ae7063788c [Pytorch] Add basic c10::optional tests (#47014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47014

Some tests are better than zero tests.
ghstack-source-id: 115769678

Test Plan: Run new tests, passes

Reviewed By: smessmer

Differential Revision: D24558649

fbshipit-source-id: 50b8872f4f15c9a6e1f39b945124a31b57dd61d9
2020-11-04 14:19:46 -08:00
17be8ae11a [pytorch] Remove c10::nullopt_t::init (#47013)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47013

It was getting used in client code, and it's not part of `std::optional`.
ghstack-source-id: 115769682

Test Plan: Existing tests

Reviewed By: smessmer

Differential Revision: D24547710

fbshipit-source-id: a24e0fd03aba1cd996c85b12bb5dcdb3e7af46b5
2020-11-04 14:14:55 -08:00
7ab843e78b [JIT] add freeze to docs (#47120)
Summary:
`freeze` was temporarily renamed to `_freeze` in a reorg and then removed from the docs [here](https://github.com/pytorch/pytorch/pull/43473). This adds it back to the docs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47120

Reviewed By: suo

Differential Revision: D24650712

Pulled By: eellison

fbshipit-source-id: 399e31586b8093de66937ba1266007ee291f509e
2020-11-04 13:50:36 -08:00
a11bc04997 Expand GRADIENT_IMPLEMENTED_FOR_COMPLEX to allow named tensors (#47289)
Summary:
Complex-valued named tensors do not support backpropagation currently. This is due to `tools/autograd/gen_variable_type.py` not containing `alias` in `GRADIENT_IMPLEMENTED_FOR_COMPLEX`, which is required to construct named tensors.

This fixes https://github.com/pytorch/pytorch/issues/47157. Also removed a duplicate `cholesky` in the list and added a test in `test_autograd.py`.

Apologies, this is a duplicate of https://github.com/pytorch/pytorch/issues/47181 as I accidentally removed my pytorch fork.

cc: zou3519 anjali411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47289

Reviewed By: agolynski

Differential Revision: D24706571

Pulled By: zou3519

fbshipit-source-id: 2cc48ce38eb180183c5b4ce2f8f4eef8bcac0316
2020-11-04 13:30:44 -08:00
5d82311f0d Add vulkan reshape op (hack) (#47252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47252

For now, just use the CPU to reshape.

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D24733906

Pulled By: SS-JIA

fbshipit-source-id: df0e4c0f21379cb2533a1717300b2f7275936e55
2020-11-04 13:14:26 -08:00
6b3802a711 [Gradient Compression] Export sizes, along with length and offset of each variable to GradBucket for PowerSGD (#47203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47203

1. Create a new field in BucketReplica to store sizes info for each variable.
2. Export the sizes list, along with lengths and offsets, to GradBucket.

These fields are needed for PowerSGD.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 115875194

Test Plan: Checked the field values from log.

Reviewed By: rohan-varma

Differential Revision: D24644137

fbshipit-source-id: bcec0daf0d02cbf25389bfd9be90df1e6fd8fc56
2020-11-04 12:34:53 -08:00
2c55426610 Renamed a TensorListMetaData property. Cleaned up a test (#46662)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46662

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24453346

Pulled By: izdeby

fbshipit-source-id: f88ac21708befa2e8f3edeffe5805b69a4634d12
2020-11-04 12:01:28 -08:00
f588ad6a35 [quant][graphmode][fx] Test to make sure dequantize node are placed properly (#47332)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47332

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D24719736

fbshipit-source-id: 51b1f14b479edbc5d7f28d85920faf5fee8dd5ea
2020-11-04 11:13:01 -08:00
bba5a31176 Revert D24481801: Optimize backward for torch.repeat
Test Plan: revert-hammer

Differential Revision:
D24481801 (4e6f2440d8)

Original commit changeset: 95c155e0de83

fbshipit-source-id: 0fb0afde760b0f5e17bd75df950a5d76aee5370b
2020-11-04 10:44:40 -08:00
4189c3ca76 Fix onnx test-reports path in CI (#47315)
Summary:
Currently, no test reports are uploaded to CI because the paths for the `onnx` runs are incorrect. This PR attempts to change that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47315

Reviewed By: malfet

Differential Revision: D24727607

Pulled By: janeyx99

fbshipit-source-id: f6d91698fdb15a39e01ef812032d4cd30621f864
2020-11-04 10:30:52 -08:00
01da0fe5ff Including generator param in randperm documentation (#47231)
Summary:
The `randperm` documentation is outdated and did not include the optional `generator` parameter. This PR adds that, along with the `pin_memory` parameter.

This PR was brought up in [PR 47022](https://github.com/pytorch/pytorch/pull/47022), but is now rebased onto master.

New docs look like:
![image](https://user-images.githubusercontent.com/31798555/97923963-e6084400-1d2c-11eb-9d46-573ba3189ad6.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47231

Reviewed By: mruberry

Differential Revision: D24711960

Pulled By: janeyx99

fbshipit-source-id: 3ff8be62ec33e34ef87d017ea97bb950621a3064
2020-11-04 09:37:41 -08:00
fe17269e75 Revert "Revert D24335982: explicitly error out in comparison ops when the types don't match" (#47288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47288

This reverts commit b3eb0c86cf21d8dad5744a917c70d846a8715e69.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24706531

Pulled By: bdhirsh

fbshipit-source-id: f3bf34ddba7882932155819251b6c7dcb5c6b56c
2020-11-04 09:27:47 -08:00
e4bc785dd5 randperm: add torch check to ensure generator device = tensor device (#47022)
Summary:
**BC-breaking Note:**

This PR disallows passing in a generator of a different device than the tensor being created during `randperm` execution. For example, the following code which used to work no longer works.
```
> torch.randperm(3, device='cuda', generator=torch.Generator(device='cpu'))
tensor([0, 1, 2], device='cuda:0')
```
It now errors:
```
> torch.randperm(3, device='cuda', generator=torch.Generator(device='cpu'))
RuntimeError: Expected a 'cuda:0' generator device but found 'cpu'
```

**PR Summary:**

Fixes https://github.com/pytorch/pytorch/issues/44714

Also added + ran tests to ensure this functionality.

Disclaimer: More work needs to be done with regard to small cuda tensors when a generator is specified; see the issue thread for more details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47022

Reviewed By: samestep

Differential Revision: D24608237

Pulled By: janeyx99

fbshipit-source-id: b83c47219c7816d93f938f7ce86dc8857513961b
2020-11-04 08:29:31 -08:00
07e8f48e6b Removing caffe2 and third_party from our code coverage (#47310)
Summary:
Our tests do not test these folders (as they shouldn't), and their inclusion in codecov obfuscates our coverage metrics.

We ask codecov to ignore these folders when calculating our coverage.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47310

Reviewed By: walterddr

Differential Revision: D24711775

Pulled By: janeyx99

fbshipit-source-id: 6095bb5e8d52202c7930114d2f357163d2271022
2020-11-04 08:18:13 -08:00
f1ac63d324 Implement copysign (#46396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46396

Related #38349

[numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign)
- No in-place function
- No method
- Optional output
- Available: byte, char, bool, int, short, long, float, double, half
- Integral promoted to float
- Not available: float/double complex

`c = np.copysign(a, b)`
|  a |  b |  c | a.grad |
|----|----|----|--------|
| -1 | -1 | -1 |  1 |
| -0 | -1 | -0 |  0 |
|  0 | -1 | -0 |  0 |
|  1 | -1 | -1 | -1 |
| -1 | -0 | -1 |  1 |
| -0 | -0 |  0 |  0 |
|  0 | -0 |  0 |  0 |
|  1 | -0 | -1 | -1 |
| -1 |  0 |  1 | -1 |
| -0 |  0 |  0 |  0 |
|  0 |  0 |  0 |  0 |
|  1 |  0 |  1 |  1 |
| -1 |  1 |  1 | -1 |
| -0 |  1 |  0 |  0 |
|  0 |  1 |  0 |  0 |
|  1 |  1 |  1 |  1 |

This function becomes **non-differentiable** at `a=0` for any `b`. So, in my opinion, we may set the gradient for `a=0` to 0.
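
A small usage sketch matching the table above (a hedged illustration of the post-merge API):

```
import torch

a = torch.tensor([-1.0, -0.0, 0.0, 1.0], requires_grad=True)
b = torch.tensor([-1.0, -1.0, -1.0, -1.0])
c = torch.copysign(a, b)   # tensor([-1., -0., -0., -1.])
c.sum().backward()
print(a.grad)              # per the table: tensor([ 1.,  0.,  0., -1.])
```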

TODO:
- [x] test (cpu/gpu)
- [x] doc
- [x] ~kernel_vec~

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24401366

Pulled By: ejguan

fbshipit-source-id: 3621c5ff74b185376a3705589983bb5197ab896d
2020-11-04 08:08:57 -08:00
996f444c00 [pt][static_runtime] Memory model (#46896)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46896

The idea of the memory model is quite similar to that of BlackBoxPredictor; however, it's more complicated in PyTorch due to 1) tensor views that share storage (with storage refcount bumps) but have different TensorImpls, 2) tensors sharing the same TensorImpl and the same storage, but with no refcount bump of the StorageImpl, 3) data types such as TensorList and Tuple that contain Tensors, and 4) the need to support a mix of out and non-out variants while we move the aten ops to out variants.

As a result, I have to make the following adjustments:
1) remove tensors in output Tuples from internal blob list;
2) for memory allocation/deallocation, get candidate Tensors from the outputs of ops with out variant, extract StorageImpls from the Tensors, dedup, and remove output tensor StorageImpls, and get the final list of blobs for memory planning;
3) during the clean_up_memory pass, clean up memory held by the StorageImpls as well as Tensors/Lists/Tuples in IValues that don't participate in memory planning to reduce overall memory usage

Risk:
The PyTorch team is planning to deprecate the current resize_output api, which we do rely on. This is a pretty big risk.

https://www.internalfb.com/intern/diffusion/FBS/browsefile/master/fbcode/caffe2/aten/src/ATen/native/Resize.cpp?commit=6457b329847607553d34e788a3a7092f41f38895&lines=9-23

Test Plan:
```
buck test //caffe2/test:static_runtime
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
buck test //caffe2/caffe2/fb/predictor:pytorch_predictor_test
```
Benchmarks:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 13 \
buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
--scripted_model=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge/traced_precomputation.pt \
--pt_inputs=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge/container_precomputation_bs1.pt \
--iters=1000 --warmup_iters=10000 --num_threads=1 --pt_enable_static_runtime=true \
--pt_cleanup_activations=true --pt_enable_out_variant=false
```

|pt_cleanup_activations	|pt_enable_out_variant	|old ms/iter	|new ms/iter	|
|---	|---	|---	|---	|
|0	|0	|0.31873	|0.30228	|
|0	|1	|0.30018	|0.29184	|
|1	|0	|0.35246	|0.31895	|
|1	|1	|0.35742	|0.30417	|

Reviewed By: bwasti, raziel

Differential Revision: D24471854

fbshipit-source-id: 4ac37dca7d2a0c362120a7f02fd3995460c9a55c
2020-11-03 23:47:59 -08:00
5c4bd9a38f Move python-independent c10d implementations to torch/lib (#47309)
Summary:
* This is a pre-step to build c10d into libtorch
* Includes a minor cleanup in c10d/CMakeLists.txt

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47309

Reviewed By: wanchaol

Differential Revision: D24711768

Pulled By: gmagogsfm

fbshipit-source-id: 6f9e0a6a73c30f5ac7dafde9082efcc4b725dde1
2020-11-03 23:39:54 -08:00
0ec717c830 Support int32 indices and offsets in nn.EmbeddingBag (#46758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46758

In general, it's helpful to support int32 indices and offsets, especially when such tensors are large and need to be transferred to accelerator backends. Since it may not be very useful to support the combination of int32 indices and int64 offsets, here we enforce that these two must have the same type.
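
A short usage sketch of what this enables (illustrative; both tensors must share the same integer dtype):

```
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=4, mode="sum")
indices = torch.tensor([1, 2, 4, 5, 4, 3], dtype=torch.int32)
offsets = torch.tensor([0, 3], dtype=torch.int32)  # bags: [1, 2, 4] and [5, 4, 3]
out = bag(indices, offsets)                        # shape (2, 4)
```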

Test Plan: unit tests

Reviewed By: ngimel

Differential Revision: D24470808

fbshipit-source-id: 94b8a1d0b7fc9fe3d128247aa042c04d7c227f0b
2020-11-03 23:33:50 -08:00
a2f9c7d4e3 Expose SparseLengthsSum8BitRowwiseSparse to C10 (#47306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47306

Expose SparseLengthsSum8BitRowwiseSparse to PyTorch, since pt's 8bit embedding doesn't support pruning yet.

It's a temporary solution to unblock the QRT test, and it is not optimal for performance.

Test Plan: ci

Reviewed By: ashishenoyp

Differential Revision: D24709524

fbshipit-source-id: 725dfc9d803e4a555dd71fa5ab75dc175e671563
2020-11-03 22:51:12 -08:00
0cba3e3704 [quant][graphmode][fx] Add support for qat convbn{relu}1d (#47248)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47248

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24696524

fbshipit-source-id: 684db12be201307acbdc89a44192cf2270491dba
2020-11-03 22:43:33 -08:00
3a0024574d Do not delete rpath from torch.dylib on Darwin (#47337)
Summary:
Fixes CI regressions introduced by https://github.com/pytorch/pytorch/issues/47262

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47337

Reviewed By: ngimel

Differential Revision: D24721954

Pulled By: malfet

fbshipit-source-id: 395b037b29c0fc3b62ca50bba9be940ad72e0c5b
2020-11-03 22:36:35 -08:00
53a5f08e0c [quant][eagermode] Avoid inserting fakequant for sigmoid/hardsigmoid/tanh in eval mode (#47297)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47297

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D24708270

fbshipit-source-id: a19b6dbe07d5c80f3cc78a987742d345d86e1cd1
2020-11-03 21:33:35 -08:00
c6fe65bf90 [quant][graphmode][fx][fix] Fix error that DefaultQuantizer is not inserted after a module configured with None qconfig (#47316)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47316

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D24713727

fbshipit-source-id: e604ef2274ff4bb4e8b6ebbb6ba681018e9ae248
2020-11-03 20:08:41 -08:00
dec1c36487 Create prototype for AST rewriter (#47216)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47216

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D24687539

Pulled By: ansley

fbshipit-source-id: 421108d066ff93ee18f4312ee67c287ca1cef881
2020-11-03 19:21:58 -08:00
f91fcefc81 [Gradient Compression] Surface C++ comm hooks to Python API as built-in comm hooks (#47270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47270

This is almost the same as #46959, except that in caffe2/torch/nn/parallel/distributed.py, BuiltinCommHookType should be imported conditionally, only when dist.is_available(). Otherwise, this Python enum type defined in caffe2/torch/csrc/distributed/c10d/init.cpp cannot be imported. See https://github.com/pytorch/pytorch/issues/47153

I tried to follow another enum type, ReduceOp, defined in the same file, but that did not work, because the C++ enum class is defined in the torch/lib/c10d library, while BuiltinCommHookType is defined in the torch/csrc/distributed library. These two libraries are compiled in two different ways.

To avoid adding typing to the distributed package, which could be a project of its own, I simply removed the arg type annotation for BuiltinCommHookType in this file.

To review the diff on top of #46959, compare V1 vs Latest:
https://www.internalfb.com/diff/D24700959?src_version_fbid=270445741055617

Main Changes in V1 (#46959):
1. Implemented the Pybind part.
2. In the reducer, once the builtin_comm_hook_type is set,  a c++ comm hook instance will be created in Reducer::autograd_hook.
3. Added unit tests for the builit-in comm hooks.

Original PR issue: C++ DDP Communication Hook https://github.com/pytorch/pytorch/issues/46348
ghstack-source-id: 115783237

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl

//arvr/projects/eye_tracking/Masquerade:python_test

USE_DISTRIBUTED=0 USE_GLOO=0 BUILD_TEST=0 USE_CUDA=1 USE_MKLDNN=0 DEBUG=0 python setup.py install

Reviewed By: mrshenli

Differential Revision: D24700959

fbshipit-source-id: 69f303a48ae275aa856e6e9b50e12ad8602e1c7a
2020-11-03 18:33:50 -08:00
2652f2e334 Optimize arguments checks (#46661)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46661

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D24453342

Pulled By: izdeby

fbshipit-source-id: 26866fdbc9dc2b5410b3b728b175a171cc6a4521
2020-11-03 17:43:10 -08:00
2caa3bd453 Inlining all non-output buffers, including intermediate buffers. (#47258)
Summary:
This diff enables inlining for all non-output buffers, including the intermediate buffers that are created as part of an op. However, the buffers that correspond to reductions will not be inlined.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47258

Reviewed By: anjali411

Differential Revision: D24707015

Pulled By: navahgar

fbshipit-source-id: ad8b03e38497600cd69980424db6d586bf93db74
2020-11-03 17:00:32 -08:00
464c569dbf [vulkan] Add mean.dim op for vulkan (#47312)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47312

Test Plan:
```
cd ~/pytorch
BUILD_CUSTOM_PROTOBUF=OFF \
  BUILD_TEST=ON \
  USE_EIGEN_FOR_BLAS=OFF \
  USE_FBGEMM=OFF \
  USE_MKLDNN=OFF \
  USE_NNPACK=OFF \
  USE_NUMPY=OFF \
  USE_OBSERVERS=OFF \
  USE_PYTORCH_QNNPACK=OFF \
  USE_QNNPACK=OFF \
  USE_VULKAN=ON \
  USE_VULKAN_API=ON \
  USE_VULKAN_SHADERC_RUNTIME=ON \
  USE_VULKAN_WRAPPER=OFF \
  MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python3 setup.py develop --cmake && ./build/bin/vulkan_api_test
```

Reviewed By: IvanKobzarev

Differential Revision: D24713617

Pulled By: SS-JIA

fbshipit-source-id: 20c0f411fb390ad2114c7deff27cc6fc77448089
2020-11-03 16:45:21 -08:00
9b168a1fed [TensorExpr] Pick meaningful names for functions in TE codegen. (#47255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47255

As a result of this change, the generated CUDA code for the following fusion group:
```
graph(%0 : Float(32, 32, 1, 1, strides=[32, 1, 1, 1], requires_grad=0, device=cuda:0),
      %1 : Float(32, 32, strides=[32, 1], requires_grad=0, device=cuda:0),
      %2 : Float(32, 32, 1, strides=[32, 1, 1], requires_grad=0, device=cuda:0)):
  %3 : int = prim::Constant[value=1]()
  %v1.1 : Float(32, 32, 32, strides=[1024, 32, 1], requires_grad=0, device=cuda:0) = aten::add(%1, %2, %3) # test/test_tensorexpr.py:155:0
  %5 : int = prim::Constant[value=1]()
  %6 : Float(32, 32, 32, 32, strides=[32768, 1024, 32, 1], requires_grad=0, device=cuda:0) = aten::add(%v1.1, %0, %5) # test/test_tensorexpr.py:156:0
  return (%6)
```

Would look like the following:
```
extern "C" __global__
void fused_add_add(float* t0, float* t1, float* t2, float* aten_add) {
{
  float v = __ldg(t1 + 32 * (((512 * blockIdx.x + threadIdx.x) / 32) % 32) + (512 * blockIdx.x + threadIdx.x) % 32);
  float v_1 = __ldg(t2 + ((512 * blockIdx.x + threadIdx.x) / 32) % 32 + 32 * (((512 * blockIdx.x + threadIdx.x) / 1024) % 32));
  float v_2 = __ldg(t0 + ((512 * blockIdx.x + threadIdx.x) / 1024) % 32 + 32 * ((512 * blockIdx.x + threadIdx.x) / 32768));
  aten_add[((((512 * blockIdx.x + threadIdx.x) / 32768) * 32768 + 32 * (((512 * blockIdx.x + threadIdx.x) / 32) % 32)) + 1024 * (((512 * blockIdx.x + threadIdx.x) / 1024) % 32)) + (512 * blockIdx.x + threadIdx.x) % 32] = (v + v_1) + v_2;
}
}
```

Previously we generated:
```
extern "C" __global__
void func(float* t0, float* t1, float* t2, float* aten_add) {
{
  float v = __ldg(t1 + 32 * (((512 * blockIdx.x + threadIdx.x) / 32) % 32) + (512 * blockIdx.x + threadIdx.x) % 32);
  float v_1 = __ldg(t2 + ((512 * blockIdx.x + threadIdx.x) / 32) % 32 + 32 * (((512 * blockIdx.x + threadIdx.x) / 1024) % 32));
  float v_2 = __ldg(t0 + ((512 * blockIdx.x + threadIdx.x) / 1024) % 32 + 32 * ((512 * blockIdx.x + threadIdx.x) / 32768));
  aten_add[((((512 * blockIdx.x + threadIdx.x) / 32768) * 32768 + 32 * (((512 * blockIdx.x + threadIdx.x) / 32) % 32)) + 1024 * (((512 * blockIdx.x + threadIdx.x) / 1024) % 32)) + (512 * blockIdx.x + threadIdx.x) % 32] = (v + v_1) + v_2;
}
}
```

Differential Revision: D24698273

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: 6da95c6ac3d5155ebfaaab4f84f55a24deb6d10d
2020-11-03 16:41:22 -08:00
a65e757057 [TensorExpr] CudaCodegen: restart counter for function names unique ID inside each codegen instantiation. (#47254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47254

CUDA codegen used a static global counter for picking names for
functions, but the functions only need to be unique in the scope of the
given codegen. This PR fixes that.

Differential Revision: D24698271

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: 516c0087b86b35bbb6ea7c71bb0ed9c3daaca2b8
2020-11-03 16:41:20 -08:00
3161fe6d5a [JIT] SubgraphUtils: add a function for generating a string name for a given graph. (#47253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47253

The function simply goes over all aten nodes in the graph and
concatenates their names, truncating the final name to a given length.

Differential Revision: D24698272

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: d6e50194ca5faf0cb61f25af83247b5e40f202e4
2020-11-03 16:36:41 -08:00
7a0f0d24d0 Codegen - error when an argument that looks like an out argument isn't a kwarg (fix #43273) (#47284)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47284

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D24706763

Pulled By: bdhirsh

fbshipit-source-id: 60fbe81a0dff7e07aa8c169235d15b84151d3ed7
2020-11-03 16:30:01 -08:00
a8ef4d3f0b Provide 'out' parameter for 'tensordot' (#47278)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42102

Added an optional `out` parameter to the tensordot operation to allow writing the result into a preallocated buffer.
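
A brief usage sketch of the new `out` argument (illustrative):

```
import torch

a = torch.randn(3, 4, 5)
b = torch.randn(4, 5, 6)
out = torch.empty(3, 6)
torch.tensordot(a, b, dims=([1, 2], [0, 1]), out=out)  # result written into `out`
```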

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47278

Test Plan: pytest test/test_torch.py -k tensordot -v

Reviewed By: agolynski

Differential Revision: D24706258

Pulled By: H-Huang

fbshipit-source-id: eb4bcd114795f67de3a670291034107d2826ea69
2020-11-03 15:56:00 -08:00
31ebac3eb7 [quant] Quantized flip dispatch (#46235)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46235

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24689161

Pulled By: z-a-f

fbshipit-source-id: 6833c2639b29ea5f6c81c880b8928c5a1951c7b8
2020-11-03 15:36:22 -08:00
f41f3e3cd1 Implement bicubic grid sampler (#44780)
Summary:
Fix https://github.com/pytorch/pytorch/issues/44601

I added the bicubic grid sampler on both the CPU and CUDA sides, but not yet for AVX2.

There is a [colab notebook](https://colab.research.google.com/drive/1mIh6TLLj5WWM_NcmKDRvY5Gltbb781oU?usp=sharing) showing some test results. The notebook uses bilinear mode for testing, since I could only use the distributed version of pytorch in it. You can download it and change `mode_torch` to `bicubic` to reproduce the results.

There is some duplicate code for getting and setting values, since the helper functions used in bilinear first clip coordinates beyond the boundary and then get or set the value. In bicubic, however, more points need to be considered. I can refactor that part after making sure the overall calculation is correct.
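
A short usage sketch of the new mode (an identity sampling grid, so the output should closely match the input):

```
import torch
import torch.nn.functional as F

inp = torch.randn(1, 3, 8, 8)
theta = torch.eye(2, 3).unsqueeze(0)  # identity affine transform
grid = F.affine_grid(theta, inp.shape, align_corners=False)
out = F.grid_sample(inp, grid, mode="bicubic", align_corners=False)
```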

Thanks

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44780

Reviewed By: mrshenli

Differential Revision: D24681114

Pulled By: mruberry

fbshipit-source-id: d39c8715e2093a5a5906cb0ef040d62bde578567
2020-11-03 15:34:59 -08:00
63978556fd [numpy] torch.a{cosh, sinh} : promote integer inputs to float (#47152)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47152

Reviewed By: mrshenli

Differential Revision: D24681083

Pulled By: mruberry

fbshipit-source-id: 246e2272536cf912a2575bfaaa831c3eceec034c
2020-11-03 15:26:13 -08:00
2b5433dee6 [Pytorch][Annotation] Update inlined callstack with module instance info (#46729)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46729

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D24493220

Pulled By: cccclai

fbshipit-source-id: f37834157e6f69bbe87f73a7d3d38a94ece6017d
2020-11-03 15:19:02 -08:00
f730f2597e [NNC] Implement Cond in LLVM codegen (#47256)
Summary:
Generate LLVM IR for statements such as
```
if (...) {
   ....
} else {
   ....
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47256

Test Plan: added unit tests to test_llvm.cpp

Reviewed By: nickgg

Differential Revision: D24699080

Pulled By: cheng-chang

fbshipit-source-id: 83b0cebcd242828263eb6052483f0924b5f091ce
2020-11-03 14:46:30 -08:00
8b13ab9370 Event Logging for NCCL Async Error Handling Process Crash (#47244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47244

This is an event-logging based update that should allow us to collect high-quality data about how many times the NCCL Async Error Handling mechanism is triggered. This logs an event called `ProcessGroupNCCL.WorkNCCL.handleNCCLGuard`, which is recorded as an entry in the `scuba_caffe2_pytorch_usage_stats` Scuba table. This Scuba entry will also contain metadata like workflow status, entitlement, hostnames, and workflow names, which will give us insight into what workloads/domains and machines are benefiting from async error handling. It also contains the Flow Run ID, which can be used as a join key with the `fblearner_workflow_run_status` scuba table for additional information like final error message, etc. We can easily quantify the number of times the async handling code was triggered by querying the `scuba_caffe2_pytorch_usage_stats` table.

As a demonstration, I ran the following workflow with this diff patched: f229675892
Since the workflow above causes a desync, the `handleNCCLGuard` event is logged in scuba soon. See here for the filtered table: https://www.fburl.com/scuba/scuba_caffe2_pytorch_usage_stats/tmp1uvio

As you can see, there are 4 entries. The workflow above uses 3 GPUs, 2 of which run into the desync scenario and are crashed using async error handling. We make this fail twice before succeeding the 3rd time, hence 4 entries.
ghstack-source-id: 115708632

Test Plan: Did a quick demo as described above. Scuba entries with the logs can be found here: https://www.fburl.com/scuba/scuba_caffe2_pytorch_usage_stats/tmp1uvio

Reviewed By: jiayisuse

Differential Revision: D24688739

fbshipit-source-id: 7532dfeebc53e291fbe10d28a6e50df6324455b1
2020-11-03 13:42:42 -08:00
ca61b061f3 Update minimum supported Python version to 3.6.2 (#47314)
Summary:
As typing.NoReturn is used in the codebase

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47314

Reviewed By: seemethere

Differential Revision: D24712847

Pulled By: malfet

fbshipit-source-id: f0692d408316d630bc11f1ee881b695437fb47d4
2020-11-03 13:32:07 -08:00
ea93bdc212 Add comment explaining purpose of the accumulate_grad argument (#47266)
Summary:
Addressing a comment from a PR that has already been merged https://github.com/pytorch/pytorch/issues/46855

https://github.com/pytorch/pytorch/pull/46855#discussion_r515161953

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47266

Reviewed By: agolynski

Differential Revision: D24709017

Pulled By: soulitzer

fbshipit-source-id: 3c104c2fef90ffd75951ecef4ae9e938d4b12d8c
2020-11-03 13:18:23 -08:00
dc0d68a1ee [JIT] Print out interface mismatch for prim::ModuleDictIndex (#47300)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47300

**Summary**
This commit augments the module interface subtyping check that is done
before the emission of the `prim::ModuleDictIndex` operator so that the
error message that is printed if the subtyping check fails provides more
information on which methods do not match.

**Test Plan**
Existing unit tests for `prim::ModuleDictIndex`. Compilation of `ModWithWrongAnnotation` now produces this error:
```
Attribute module is not of annotated type __torch__.jit.test_module_containers.ModuleInterface: Method on class '__torch__.jit.test_module_containers.DoesNotImplementInterface' (1) is not compatible with interface '__torch__.jit.test_module_containers.ModuleInterface' (2)
  (1) forward(__torch__.jit.test_module_containers.DoesNotImplementInterface self, Tensor inp) -> ((Tensor, Tensor))
  (2) forward(InterfaceType<ModuleInterface> self, Any inp) -> (Any)
:
```

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D24709538

Pulled By: SplitInfinity

fbshipit-source-id: 6b6cb75e4b2b12b08576a5530b4b90cbcad9b6e5
2020-11-03 13:07:21 -08:00
14194e4f23 Embed libiomp5.dylib into wheel package (#47262)
Summary:
The libiomp runtime is the only external dependency the OS X package has if compiled with MKL.
Copy it to the stage directory from one of the available rpaths,
and remove all absolute rpaths, since the project should have none.

Fixes https://github.com/pytorch/pytorch/issues/38607

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47262

Reviewed By: walterddr

Differential Revision: D24705094

Pulled By: malfet

fbshipit-source-id: 9f588a3ec3c6c836c8986d858fb53df815a506c8
2020-11-03 13:00:30 -08:00
c424d9389e [numpy] torch.a{cos, tan} : promote integer inputs to float (#47005)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47005

Reviewed By: mrshenli

Differential Revision: D24681097

Pulled By: mruberry

fbshipit-source-id: 2f29655a5f3871ee96c2bfd35c93f4d721730e37
2020-11-03 13:00:24 -08:00
0d00724e36 [numpy] torch.{a}tanh : promote integer inputs to float (#47064)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47064

Reviewed By: mrshenli

Differential Revision: D24681107

Pulled By: mruberry

fbshipit-source-id: 1818206c854dbce7074363bf6f1949daa7bf6052
2020-11-03 12:56:58 -08:00
c68c3d0a02 [fix] nn.Embedding.from_pretrained : honour padding_idx argument (#47184)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46585 (first snippet)

Now the behaviour of `padding_idx` agrees with the documentation.
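
A brief usage sketch (illustrative):

```
import torch
import torch.nn as nn

weights = torch.tensor([[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]])
emb = nn.Embedding.from_pretrained(weights, padding_idx=0)
print(emb.padding_idx)  # 0 -- gradients for the padding row are never updated
```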

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47184

Reviewed By: mruberry

Differential Revision: D24682567

Pulled By: albanD

fbshipit-source-id: 864bd34eb9099d367a3fcbb8f4f4ba2e2b270724
2020-11-03 12:47:19 -08:00
f276ab55cd Added Kronecker product of tensors (torch.kron) (#45358)
Summary:
This PR adds a function for calculating the Kronecker product of tensors.
The implementation is based on `at::tensordot` with permutations and reshape.
Tests pass.
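
A small worked example (illustrative):

```
import torch

a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[0, 1], [1, 0]])
print(torch.kron(a, b))
# tensor([[0, 1, 0, 2],
#         [1, 0, 2, 0],
#         [0, 3, 0, 4],
#         [3, 0, 4, 0]])
```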

TODO:

- [x] Add more test cases
- [x] Write documentation
- [x] Add entry `common_methods_invokations.py`

Ref. https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45358

Reviewed By: mrshenli

Differential Revision: D24680755

Pulled By: mruberry

fbshipit-source-id: b1f8694589349986c3abfda3dc1971584932b3fa
2020-11-03 12:41:41 -08:00
32b66b0851 reorganize sparse_nn_partition (#47283)
Summary:
This PR moves combine_partitions_based_on_size and find_partition_to_combine_based_on_size into sparse_nn_partition, since they are both only used by sparse_nn_partition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47283

Reviewed By: gcatron

Differential Revision: D24707864

Pulled By: scottxu0730

fbshipit-source-id: 183fe945e477e16301d7f489103287eb9d8a30af
2020-11-03 12:36:36 -08:00
774b638eb6 Change largeCUDATensorTest to largeTensorTest+onlyCUDA; add a buffer to large cuda tensor test (#45332)
Summary:
Effectively, `largeCUDATensorTest` = `largeTensorTest` + `onlyCUDA`.

There was a problem where a user got an OOM for a `largeCUDATensorTest('16GB')` on a 16GB V100. This decorator was checking the total memory of a GPU device; however, in most cases we can't allocate all of the memory that a GPU has. So it is beneficial to have a buffer on the `largeTensorTest` check for CUDA; I added a 10% buffer.

Definition of `largeTensorTest`

d22dd80128/torch/testing/_internal/common_device_type.py (L560-L578)

`_has_sufficient_memory`

d22dd80128/torch/testing/_internal/common_device_type.py (L535-L557)

`largeCUDATensorTest`

d22dd80128/torch/testing/_internal/common_device_type.py (L526-L532)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45332

Reviewed By: ngimel

Differential Revision: D24698690

Pulled By: mruberry

fbshipit-source-id: a77544478e45ce271f6639ea04e87700574ae307
2020-11-03 11:43:49 -08:00
4e6f2440d8 Optimize backward for torch.repeat (#46726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46726

Fixes #43192

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24481801

Pulled By: ejguan

fbshipit-source-id: 95c155e0de83b71f173c9135732ea84ba6399d69
2020-11-03 11:16:55 -08:00
9c3a75527b Update doc to reflect current behavior (#46937)
Summary:
This behavior was changed as a side effect of https://github.com/pytorch/pytorch/pull/41984.
Update the doc to reflect the actual behavior of the function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46937

Reviewed By: mruberry

Differential Revision: D24682750

Pulled By: albanD

fbshipit-source-id: 89b94b61f54dbcfc6a6988d7e7d361bd24ee4964
2020-11-03 11:02:19 -08:00
782f92b569 fix windows CI passed incorrectly (#47105)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47103 and fixes https://github.com/pytorch/pytorch/issues/45864
1. Make the shell script fail safe.
2. Disable the failing test on Windows.
3. Verified via [link](https://app.circleci.com/pipelines/github/pytorch/pytorch/233616/workflows/e33286c1-f5e2-4cf2-82ca-ef4f54dfa495/jobs/8608415/tests).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47105

Reviewed By: samestep

Differential Revision: D24648414

Pulled By: walterddr

fbshipit-source-id: 1977007c2c7e8043efc590eb7261956a44e8f9ab
2020-11-03 10:29:26 -08:00
8c865493c6 Automated submodule update: FBGEMM (#47263)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 8eb6dcb23e

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47263

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: malfet

Differential Revision: D24701113

fbshipit-source-id: 92ab4ae93c4d0753ee3d6590e5616fc8cd6082a0
2020-11-03 10:24:43 -08:00
9e58c85d08 [ROCm] remove use of HIP_PLATFORM (#47241)
Summary:
Fixes deprecated use of the HIP_PLATFORM env var. This env var no longer needs to be set explicitly; instead, HIP_PLATFORM is automatically detected by hipcc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47241

Reviewed By: mruberry

Differential Revision: D24699982

Pulled By: ngimel

fbshipit-source-id: 9cd2f32e7c0c8d662832b0cbbc2988835a45961a
2020-11-03 09:54:44 -08:00
579cfc6641 Moving test order to rebalance test1 and test2 times (#47290)
Summary:
The ASAN time split between test1 and test2 is very unbalanced right now; this moves some heftier tests (test_nn and test_quantization) into shard 2.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47290

Reviewed By: malfet

Differential Revision: D24706877

Pulled By: janeyx99

fbshipit-source-id: 35069d1e425857f85775f9be76501d6a158e0376
2020-11-03 09:39:29 -08:00
5c8896f8ad Delete CUDA build rules from MacOS build (#47277)
Summary:
Also remove the MAX_JOBS constraint, since the OOM warning was about nvcc rather than clang.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47277

Reviewed By: walterddr

Differential Revision: D24705180

Pulled By: malfet

fbshipit-source-id: 25fd0161de3f7e14a2a4db86cbea8357cdc69e06
2020-11-03 09:01:12 -08:00
c05ee86edd Fix return-type-is-always-copy warning (#47279)
Summary:
`std::vector<bool>` cannot return values by reference, since its elements are stored as packed bits.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47279

Reviewed By: glaringlee

Differential Revision: D24705188

Pulled By: malfet

fbshipit-source-id: 96e71cc4b9881f92af3b4a508d397deab6d68174
2020-11-03 08:53:24 -08:00
a341a4329a Format error message for unmatched signature between _out and base functions (#47087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47087

Fixes #33547

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D24633077

Pulled By: ejguan

fbshipit-source-id: d1baca84cb3bc415cced9b696103f17131e1e4c7
2020-11-03 07:36:37 -08:00
73e121de1c [GPU] Enable optimize_for_metal in fbcode (#47102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47102

Since the current mobile end-to-end workflow involves using `optimize_for_mobile` in Python, the goal here is to be able to use `optimize_for_mobile(m, backend="metal")` in fbcode.
ghstack-source-id: 115749752

Test Plan:
1. Be able to export models for metal (see the next diff)
2. Make sure the change won't break the OSS workflow
3. Make sure the change won't break on the mobile bulild.

Reviewed By: xcheng16

Differential Revision: D24644422

fbshipit-source-id: bd77e22f0799533a96d048207932055fd051a67e
2020-11-03 00:58:55 -08:00
ad3a3bd0d6 [GPU] Add an attribute to the torchscript model exported by metal (#47174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47174

As title
ghstack-source-id: 115747991

Test Plan: Sandcastle

Reviewed By: kimishpatel

Differential Revision: D24616430

fbshipit-source-id: 2ccd264688471788f0dfea8bdc234fa69d39817f
2020-11-03 00:54:19 -08:00
0ead9d545a [quant][graphmode][fx] Add test for non quantized embedding and embeddingbag (#47092)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47092

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D24637423

fbshipit-source-id: baaa431931242072edd9519a3393efba7469da6f
2020-11-02 23:56:43 -08:00
4df7eefa06 [TensorExpr] Support LLVM versions 8 through 12 (#47033)
Summary:
Adjust llvm_{codegen, jit}.cpp to support LLVM versions 8 through 12.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47033

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.LLVM*

Reviewed By: bertmaher

Differential Revision: D24689903

Pulled By: asuhan

fbshipit-source-id: 2654bb7eb2ab6a95a5527c079b07ed8552c51bde
2020-11-02 22:32:11 -08:00
ac8a8185eb expose Timer docs to PyTorch website. (#46880)
Summary:
CC: gchanan jspisak seemethere

I previewed the docs and they look reasonable. Let me know if I missed anything.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46880

Reviewed By: seemethere, izdeby

Differential Revision: D24551503

Pulled By: robieta

fbshipit-source-id: 627f73d3dd4d8f089777bca8653702735632b9fc
2020-11-02 21:59:29 -08:00
09a52676ad Add NestedTensor specific dispatch key to PyTorch (#44668)
Summary:
This adds a dedicated dispatch key for the [nestedtensor project](https://github.com/pytorch/nestedtensor).

- [ ] Since this isn't a device or a backend, does this need further updates in other places other than DispatchKey.h?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44668

Reviewed By: zhangguanheng66, ailzhang

Differential Revision: D23998801

Pulled By: cpuhrsch

fbshipit-source-id: 133b5a9a04c4f61c27c0728832da09e4b38a5939
2020-11-02 21:35:54 -08:00
1fe273d798 add node by node cost function (#47009)
Summary:
This PR adds a node-by-node cost function. Given a partition of nodes, the get_latency_of_one_partition function finds the critical path in the partition and returns its latency. A unit test is also provided: a graph module is partitioned into two partitions, and the latency of each partition is checked.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47009

Reviewed By: gcatron

Differential Revision: D24692542

Pulled By: scottxu0730

fbshipit-source-id: 64c20954d842507be0d1afa2516d88f705e11224
2020-11-02 21:15:43 -08:00
084b71125f Fix bug in toComplexWithDefault (#43841)
Summary:
I don't think this method is used anywhere, so I don't know how to test it. But the diff should justify itself.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43841

Reviewed By: mruberry

Differential Revision: D24696505

Pulled By: anjali411

fbshipit-source-id: f2a249ae2e078b16fa11941a048b7d093e60241b
2020-11-02 21:07:08 -08:00
b1b77148ac Back out "[Gradient Compression] Surface C++ comm hooks to Python API as built-in comm hooks" (#47234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47234

Revert the diff because of https://github.com/pytorch/pytorch/issues/47153

Original PR issue: C++ DDP Communication Hook https://github.com/pytorch/pytorch/issues/46348
ghstack-source-id: 115720415

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D24691866

fbshipit-source-id: 58fe0c45943a2ae2a09fe5d5eac4a4d947586539
2020-11-02 20:51:18 -08:00
2cff3bba58 [vulkan_api][ops] Mm, Pool, Upsample (#47063)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47063

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D24624610

Pulled By: IvanKobzarev

fbshipit-source-id: b6cf555506ea0e2426fa77c53b9d25ffb95d5bbc
2020-11-02 19:02:30 -08:00
b0e954fff5 quantize_tensor_per_channel ARM implementation (#46018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46018

Currently on mobile devices quantize_tensor has a vectorized implementation
using ARM intrinsics; however quantize_tensor_per_channel does not.

Test Plan:
Build for ARM Neon
```
BUILD_MOBILE_BENCHMARK=1 BUILD_MOBILE_TEST=1 ANDROID_DEBUG_SYMBOLS=1 BUILD_PYTORCH_MOBILE=1 ANDROID_ABI="armeabi-v7a with NEON" ./scripts/build_android.sh  -DANDROID_CCACHE=$(which ccache) -DBUILD_BINARY=ON
```
Build for ARM64
```
BUILD_MOBILE_BENCHMARK=1 BUILD_MOBILE_TEST=1 ANDROID_DEBUG_SYMBOLS=1 BUILD_PYTORCH_MOBILE=1 ANDROID_ABI=arm64-v8a ./scripts/build_android.sh  -DANDROID_CCACHE=$(which ccache) -DBUILD_BINARY=ON
```
Then run the benchmark binary over adb shell. Note that the Android CPU is not frequency-locked by default, which can lead to noisy benchmark results; this can be changed by running the following for every CPU.
```
adb shell "echo userspace > /sys/devices/system/cpu/${cpu}/cpufreq/scaling_governor"
adb shell "echo '2000000' > /sys/devices/system/cpu/${cpu}/cpufreq/scaling_setspeed"
adb push build_android/bin/quantize_per_channel /data/local/tmp/
adb shell "/data/local/tmp/quantize_per_channel"
```

Resulting benchmarks are located [here](https://gist.github.com/AJLiu/d1711bb6a5e93b3338eca2c14c8aec9f)
Google spreadsheet comparing results [here](https://docs.google.com/spreadsheets/d/1Ky-rEu2CqOqex2a84b67hB1VLAlfEDgAN2ZXe8IlGF8/edit?usp=sharing)

Reviewed By: kimishpatel

Differential Revision: D24286528

fbshipit-source-id: 5481dcbbff8345a2c0d6cc9b7d7f8075fbff03b3
2020-11-02 18:31:19 -08:00
ecfa7a27b8 [jit] fix traced training attribute (#47211)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47211

The attribute is getting shadowed by the default one set on all modules,
and the __setattr__ on the TracedModule object prevents setting it correctly.

    import torch

    inp = torch.zeros(1, 3, 224, 224)
    model = torch.hub.load('pytorch/vision:v0.6.0', 'mobilenet_v2', pretrained=True)
    model.eval()
    print(model.training)
    with torch.no_grad():
        traced = torch.jit.trace(model, inp)
    print(traced.training)
    traced.eval()
    print(traced.training)
    traced.training = False
    print(traced.training)
    torch.jit.freeze(traced)

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D24686690

Pulled By: zdevito

fbshipit-source-id: 9c1678dc68e9bf83176e9f5a20fa8f6bff5d69a0
2020-11-02 17:28:49 -08:00
27f4a78bb8 Add benchmark for per channel tensor quantization (#46017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46017

Currently on mobile, only per-tensor quantization is optimized using ARM intrinsics. This benchmark is
added to help gauge the performance improvement on mobile after performing the same optimizations for per-channel quantization.

Test Plan:
Build for ARM Neon
```
BUILD_MOBILE_BENCHMARK=1 BUILD_MOBILE_TEST=1 ANDROID_DEBUG_SYMBOLS=1 BUILD_PYTORCH_MOBILE=1 ANDROID_ABI="armeabi-v7a with NEON" ./scripts/build_android.sh  -DANDROID_CCACHE=$(which ccache) -DBUILD_BINARY=ON
```
Build for ARM64
```
BUILD_MOBILE_BENCHMARK=1 BUILD_MOBILE_TEST=1 ANDROID_DEBUG_SYMBOLS=1 BUILD_PYTORCH_MOBILE=1 ANDROID_ABI=arm64-v8a ./scripts/build_android.sh  -DANDROID_CCACHE=$(which ccache) -DBUILD_BINARY=ON
```
Then run the benchmark binary over adb shell. Note that the Android CPU is not frequency-locked by default, which can lead to noisy benchmark results; this can be changed by running the following for every CPU.
```
adb shell "echo userspace > /sys/devices/system/cpu/${cpu}/cpufreq/scaling_governor"
adb shell "echo '2000000' > /sys/devices/system/cpu/${cpu}/cpufreq/scaling_setspeed"
adb push build_android/bin/quantize_per_channel /data/local/tmp/
adb shell "/data/local/tmp/quantize_per_channel"
```

Reviewed By: kimishpatel

Differential Revision: D24286488

fbshipit-source-id: 1e7942f0bb3d9d1fe172409d522be9f351a485bd
2020-11-02 17:11:16 -08:00
82b74bd929 For torch::jit::module's attr method to moble::module (#47059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47059

This diff adds an attr getter to mobile::module, similar to the TorchScript module's at https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/api/object.h#L75-L83.

Test Plan: LiteInterpreterTest::CheckAttrAccess

Reviewed By: xta0

Differential Revision: D24604950

fbshipit-source-id: cfac187f47f5115807dc119fe6c203f60dbd5dff
2020-11-02 16:38:12 -08:00
b6685d3863 [PT] optional -> c10::optional (#47144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47144

Change `optional` to `c10::optional` to avoid conflicting with `std::optional` in files that happen to include both.

Test Plan: contbuild

Reviewed By: yinghai

Differential Revision: D24662515

fbshipit-source-id: 1e72fbc791d585e797a7239305ab5e3f82ddfec9
2020-11-02 16:33:36 -08:00
be2e3dd2a1 [quant][graphmode][fx][fix] Linear work with float_qparam_dynamic_qconfig (#47068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47068

Filter the dtype config before performing the quantization in linear

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D24627907

fbshipit-source-id: 162fa47b3fcf6648049f8bc0438e41ee97ac19e9
2020-11-02 16:28:33 -08:00
cedeee2cd4 Add scalar.conj() and update backward formulas for add and sub (#46596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46596

1. Added `conj` method for scalar similar to numpy.
2. Updates backward formulas for add and sub to work correctly for R -> C cases and for the case when alpha is complex.
3. Enabled complex backward for nonzero (no formula update needed).

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D24529227

Pulled By: anjali411

fbshipit-source-id: da871309a6decf5a4ab5c561d5ab35fc66b5273d
2020-11-02 16:17:00 -08:00
86151da19e Port CPU Trace from TH to ATen (#47126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47126

Context
-------
This PR is a rebase of shihongzhi's https://github.com/pytorch/pytorch/pull/35360.
I forgot to merge it back when it was submitted so I rebased it and ran new benchmarks on it.

Benchmarks
----------

TL;DR: The op has more overhead than the TH version but for larger shapes the overhead disappears.

```
import torch

shapes = [
    [1, 1],
    [100, 100],
    [1000, 1000],
    [10000, 10000],
    [100000, 100000],
]

for shape in shapes:
    x = torch.ones(shape)
    %timeit x.trace()

Before:
1.83 µs ± 42.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.98 µs ± 48.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
3.19 µs ± 10.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
85.2 µs ± 700 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
1.23 ms ± 4.34 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

After:
2.16 µs ± 325 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
2.08 µs ± 275 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
4.45 µs ± 19.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
81.8 µs ± 766 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
1.27 ms ± 6.75 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

Future work
-----------
Things that can be done after this PR:
- add complex tensor support
- Fix the type promotion discrepancy between CPU and CUDA

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D24683259

Pulled By: zou3519

fbshipit-source-id: f92b566ad0d58b72663ab64899d209c96edb78eb
2020-11-02 16:03:22 -08:00
8054ae3e77 Add test for trace (#47125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47125

We didn't actually have any tests for torch.trace. The tests expose a
discrepancy between the behavior of torch.trace on CPU and CUDA that
I'll file an issue for.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24683260

Pulled By: zou3519

fbshipit-source-id: 71dd3af62bc98c6b9b0ba2bf2923cb6d44daa640
2020-11-02 16:00:33 -08:00
f58842c214 Enable inlining into reductions (#47020)
Summary:
This diff enables inlining producers into reductions. It also guards against inlining reductions themselves.

Prior to this diff, if there was a reduction in the loopnest, no inlining was happening. After this change, we will inline all non-output buffers that do not correspond to a reduction.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47020

Reviewed By: albanD

Differential Revision: D24644346

Pulled By: navahgar

fbshipit-source-id: ad234a6877b65be2457b734cbb7f3a1800baa6a5
2020-11-02 15:33:38 -08:00
b5a1be02a0 Add RAII DetectAnomalyGuard (#47164)
Summary:
This is a followup to the C++ anomaly detection mode, implementing the guard.
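
For reference, a minimal sketch of the Python-side analogue that the C++ guard mirrors (hedged; not code from this PR):

```python
import torch

# In Python, anomaly detection is scoped with a context manager; the new
# C++ RAII guard provides the same scoping on the C++ side.
with torch.autograd.detect_anomaly():
    x = torch.randn(2, requires_grad=True)
    (x * x).sum().backward()
```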

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47164

Reviewed By: mruberry

Differential Revision: D24682574

Pulled By: albanD

fbshipit-source-id: b2224a56bf6eca0b90b8e10ec049cbcd5af9d108
2020-11-02 15:07:59 -08:00
ebf36ad3da Remove travis-python references as well as some unnecessary dependencies (#47209)
Summary:
This PR attempts to remove unneeded installations of `pip` among other packages in `install_base.sh` since these very same packages are already installed elsewhere (for example in `install_conda.sh`).

In the process, I found some old `TRAVIS_PYTHON_VERSION` references that are no longer needed, so I removed all references that need `install_travis_python.sh`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47209

Reviewed By: mruberry

Differential Revision: D24690079

Pulled By: janeyx99

fbshipit-source-id: f8fef4cda9832c868595d4745d811fc7d42df34d
2020-11-02 15:01:05 -08:00
42b6f96764 Make "Run flake8" step always succeed again (#47236)
Summary:
In https://github.com/pytorch/pytorch/issues/46990 I asked whether the "Run flake8" step was supposed to always succeed (so that the "Add annotations" step would be sure to run). The reviewers and I weren't sure of the answer to that question, so we merged it anyway, but that turned out to be wrong: https://github.com/pytorch/pytorch/runs/1327599980 So this PR fixes that issue introduced by https://github.com/pytorch/pytorch/issues/46990.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47236

Reviewed By: janeyx99

Differential Revision: D24692359

Pulled By: samestep

fbshipit-source-id: c12382de6945245d6251ce792896e5e688f480af
2020-11-02 14:53:38 -08:00
f5073b0c5a Add inputs argument to autograd.backward() (#46855)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46373

As noted in https://github.com/pytorch/pytorch/issues/46373, there needs to be a flag passed into the engine that indicates whether it was executed through the backward api or grad api. Tentatively named the flag `accumulate_grad` since functionally, backward api accumulates grad into .grad while grad api captures the grad and returns it.

Moving changes not necessary to the python api (cpp, torchscript) to a new PR.
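
For illustration, a minimal sketch of the new `inputs` argument (tensor names are illustrative):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = torch.randn(3, requires_grad=True)
loss = (x * y).sum()

# Accumulate only into x.grad, even though y also requires grad.
torch.autograd.backward(loss, inputs=[x])
print(x.grad)  # populated
print(y.grad)  # None
```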

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46855

Reviewed By: ngimel

Differential Revision: D24649054

Pulled By: soulitzer

fbshipit-source-id: 6925d5a67d583eeb781fc7cfaec807c410e1fc65
2020-11-02 14:32:38 -08:00
18470f68bc Fix max_pool1d on discontiguous tensor (#47065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47065

Fixes https://github.com/pytorch/pytorch/issues/47054
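
A minimal repro sketch of the class of inputs affected (hedged; the shapes are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 4, 10)[:, :, ::2]  # step slicing makes x non-contiguous
out = F.max_pool1d(x, kernel_size=2)
# After the fix this matches pooling over a contiguous copy.
assert torch.equal(out, F.max_pool1d(x.contiguous(), kernel_size=2))
```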

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24633342

Pulled By: heitorschueroff

fbshipit-source-id: b318f3a4fe68e538c71b147a82b62367f23146fa
2020-11-02 14:21:31 -08:00
b3eb0c86cf Revert D24335982: explicitly error out in comparison ops when the types don't match
Test Plan: revert-hammer

Differential Revision:
D24335982 (60fea510a1)

Original commit changeset: 3dfb02bcb403

fbshipit-source-id: 00072f1b00e228bbbe295053091cf4a7a46f4668
2020-11-02 14:08:01 -08:00
7f125bca1c [Metal] Add pin_memory check in empty_strided (#47228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47228

Add a check that `pin_memory`, if specified, is set to `False`
ghstack-source-id: 115715087

Test Plan:
- CircleCI
- Sandcastle

Reviewed By: IvanKobzarev

Differential Revision: D24690472

fbshipit-source-id: c65fc494fcd7b0b409a80c86e108a029ca7fd71e
2020-11-02 14:00:12 -08:00
e03820651a Make conversions explicit (#46835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46835

We make explicit a couple of previously implicit down/narrowing conversions. This fixes a couple of compiler warnings.

Test Plan: Standard pre-commit test rig.

Reviewed By: ngimel

Differential Revision: D24481427

fbshipit-source-id: 8c9b0215a662ccdef8e2ba3df5f78ef110071f7b
2020-11-02 13:54:00 -08:00
22b3d414de Enhance the torch.pow testcase for the complex scalar base (#47101)
Summary:
Related https://github.com/pytorch/pytorch/issues/45259

This PR is to address the https://github.com/pytorch/pytorch/pull/45259#discussion_r514390664

- leverage the `make_tensor` function to generate a random tensor as the exponent, preventing an all-zero integer exponent.
- add some special cases for zero exponents and the `1 + 0j` base (see the sketch below).
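
A hedged sketch of the special cases named above (assumed semantics; the values are illustrative):

```python
import torch

# Zero exponents: any base (including a complex one) to the 0th power is 1.
print(torch.pow(2 - 3j, torch.zeros(3)))            # [1+0j, 1+0j, 1+0j]

# The 1 + 0j base stays 1 + 0j for any exponent.
print(torch.pow(1 + 0j, torch.tensor([0.5, 2.0])))  # [1+0j, 1+0j]
```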

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47101

Reviewed By: mruberry

Differential Revision: D24682430

Pulled By: zou3519

fbshipit-source-id: f559dc0ba08f37ae070036fb25a52ede17a24149
2020-11-02 13:13:15 -08:00
9b52654620 annotate a few torch.nn.modules.* modules (#45772)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45771

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45772

Reviewed By: mruberry

Differential Revision: D24682013

Pulled By: albanD

fbshipit-source-id: e32bc4fe9c586c079f7070924a874c70f3d127fa
2020-11-02 13:04:59 -08:00
7178790381 Add vulkan clamp op (#47196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47196

Added vulkan clamp ops

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D24688470

Pulled By: SS-JIA

fbshipit-source-id: b74d6718811972904816441e93515a982a518fd9
2020-11-02 12:48:04 -08:00
96b23f7db1 add sandcastle device type test base discovery (#47119)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47119

Test Plan:
tests
1. test_cuda still works:
`buck test --no-cache -c test.external_runner=tpx mode/dev-nosan //caffe2/test:cuda -- --use-remote-execution --force-tpx`
2. test_torch is blocked on D24623962
`buck test --no-cache -c test.external_runner=tpx mode/dev-nosan //caffe2/test:torch -- --use-remote-execution --force-tpx`

Reviewed By: mruberry

Differential Revision: D24649868

fbshipit-source-id: 97cb41996ea0c37a66a4bf2154e254d2d2912a17
2020-11-02 12:22:30 -08:00
70d58031d7 [c10] make intrusive_ptr available as a pybind holder type (#44492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44492

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D23632278

Pulled By: wanchaol

fbshipit-source-id: b9796e15074d68a347de443983abf7f052a3cdfe
2020-11-02 12:11:45 -08:00
6852cbb952 [RFC] Better error message in case operator could not be run (#46885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46885

There seem to be a lot of situations where users are running into problems with missing operators (these users are mostly within FB for now since the mobile focus is currently on internal use cases). To avoid wasting their time, we would very much like to point them in a more actionable direction. This diff attempts to do just that.

Please find additional context at: https://fb.workplace.com/groups/894363187646754/permalink/1081939438889127/

Previous message:

```
Could not run 'aten::less_equal.Scalar' with arguments from the 'CPU' backend.
```

New message:

```
Could not run 'aten::less_equal.Scalar' with arguments from the 'CPU' backend.
This could be because the operator doesn't exist for this backend, or was
omitted during the selective/custom build process (if using custom build).
If you are a Facebook employee using PyTorch on mobile, please visit
https://fburl.com/ptmfixes for possible resolutions.
```
ghstack-source-id: 115691682

Test Plan: Sandcastle

Reviewed By: ezyang

Differential Revision: D24552243

fbshipit-source-id: fb78b1ab2c1fa0e1faf5537cbf0575256391f081
2020-11-02 11:55:34 -08:00
c5ae875179 Add bfloat support for torch.randn and torch.norm (#47143)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47143

Reviewed By: pbelevich

Differential Revision: D24664407

Pulled By: malfet

fbshipit-source-id: c63ff1cbb812751aba4c56e64e6ee1008cfc2d7f
2020-11-02 11:49:21 -08:00
60fea510a1 explicitly error out in comparison ops when the types don't match (#46399)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46399

Explicitly error out in comparison/logical ops when the dtypes of the various input/output tensors don't match. See [this comment](https://github.com/pytorch/pytorch/pull/46399#discussion_r505686406) for more details.

fixes #42660

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24335982

Pulled By: bdhirsh

fbshipit-source-id: 3dfb02bcb403dda5bcbf5ed3eae543354ad698b2
2020-11-02 11:42:32 -08:00
6e22b6008d [MLF] Allow for computing prune quantile thresholds on absolute value of indicators in distributed-inference-compatible embedding LUT pruning (#46789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46789

1. Now `SelfBinningHistogram` can calculate the binning histogram using the absolute values from the given an array of values.
2. Update the invocation of `SelfBinningHistogram` in `post_training_prune`.

Test Plan:
1. [buck test caffe2/caffe2/python/operator_test:self_binning_histogram_test](https://www.internalfb.com/intern/testinfra/testconsole/testrun/6473924488326108/)
2. [buck test dper3/dper3_backend/delivery/tests:post_training_prune_test](https://www.internalfb.com/intern/testinfra/testconsole/testrun/2251799854023163/)

Reviewed By: hwangjeff

Differential Revision: D24494097

fbshipit-source-id: 95e47137b25746e686ef9baa9409560af5d58fc1
2020-11-02 11:31:31 -08:00
6906701bde [ROCm] enable stream priorities (#47136)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47136

Reviewed By: mruberry

Differential Revision: D24672457

Pulled By: ngimel

fbshipit-source-id: 54f60c32df87cbd40fccd7fb1ecf0437905f01a3
2020-11-02 11:25:44 -08:00
c2e123331a Check CUDA kernel launches (fbcode/caffe2/aten/src/ATen/native/cuda/) (#47207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47207

Add a safety check TORCH_CUDA_KERNEL_LAUNCH_CHECK() after each kernel launch. This diff only changes the files inside the directory fbcode/caffe2/aten/src/ATen/native/cuda/. Will create similar diffs per directory.

Test Plan:
Run test with:
```
buck build //caffe2/aten:ATen-cu
```

Results:
```
[fjponce@30644.od ~/fbcode (c32bce1c)]$ buck build //caffe2/aten:ATen-cu
Building: finished in 0.8 sec (100%) 1/1 jobs, 0 updated
  Total time: 1.0 sec
More details at https://www.internalfb.com/intern/buck/build/c8d463e5-2d8b-4566-97f0-2d355eda8f2d
[fjponce@30644.od ~/fbcode (b78b1f2d)]$
```

The files no longer appear in the list when executing the Python script
https://www.internalfb.com/intern/paste/P147803236/

Reviewed By: r-barnes

Differential Revision: D24685062

fbshipit-source-id: 6ef7989d28b6629752d98dc36dd4a92c2507204c
2020-11-02 11:09:36 -08:00
0d6bf8864b add rocm 3.9 to nightly builds (#47121)
Summary:
Corresponding pytorch builder repo update: https://github.com/pytorch/builder/pull/561.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47121

Reviewed By: samestep

Differential Revision: D24660850

Pulled By: walterddr

fbshipit-source-id: 68b22e0a2d341396eb1cdcfaa0a413ce7ad93ca3
2020-11-02 10:18:45 -08:00
da26858c9c Add complex backward support for torch.exp (#47194)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47194

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D24683201

Pulled By: anjali411

fbshipit-source-id: c447dec51cbfe7c09d6943fbaafa94f48130d582
2020-11-02 09:39:44 -08:00
c10aa44e33 Back out "Providing more information while crashing process in async error handling" (#47185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47185

Original commit changeset: 02d48f13352a

Test Plan: CI

Reviewed By: mruberry

Differential Revision: D24682055

fbshipit-source-id: 060efa29eb2f322971848ead447021f6972cb3f3
2020-11-02 08:34:30 -08:00
85e5b76f17 Automated submodule update: FBGEMM (#47190)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 5b7566f412

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47190

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D24682770

fbshipit-source-id: da11d6039c3e158444253d3c6237e3ee71d5afb5
2020-11-02 07:51:50 -08:00
1cc1da5411 LayerNormInt8QuantizeFakeNNPI fix to match ICEREF. (#47140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47140

LayerNorm + Int8Quantize fix to match ICEREF.

(Note: this ignores all push blocking failures!)

Test Plan:
buck test --debug //caffe2/caffe2/contrib/fakelowp/test:test_layernorm_nnpi_fp16nnpi -- test_fused_ln_quantize --print-passing-details

https://internalfb.com/intern/testinfra/testrun/7881299371969005

Reviewed By: hyuen

Differential Revision: D24659904

fbshipit-source-id: 026d1a1f69a68eca662a39752af5ab0756bace2d
2020-11-01 14:31:38 -08:00
19ede75eb9 [JIT] Enable ModuleDict non-literal indexing (#45716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45716

**Summary**
This commit enables indexing into `ModuleDict` using a non-literal
index if the `ModuleDict` is annotated with `Dict[str, X]`, where `X` is
a module interface type. These annotations must be expressed using a
class attribute named `__annotations__`, which is a `Dict[str, Type]`
where the keys are the names of module attributes and the values are
their types.

The approach taken by this commit is that these annotations are stored
as "hints" along with the corresponding module attributes in the
`ConcreteSubmoduleTypeBuilder` instance for each module (which might be
a `ModuleDict`). These hints are passed into the `ModuleValue` that is
created for desugaring operations on submodules so that indexing into a
`ModuleDict` can be emitted as a getitem op into a dict emitted into the
graph that represents the `ModuleDict`.
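
A hedged sketch of the usage this enables, based on the description above (the interface and module names here are illustrative, not from this commit):

```python
import torch
from typing import Dict

@torch.jit.interface
class OpInterface(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pass

class Double(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2

class Halve(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x / 2

class Container(torch.nn.Module):
    # The hint: every value in `ops` implements OpInterface.
    __annotations__ = {"ops": Dict[str, OpInterface]}

    def __init__(self):
        super().__init__()
        self.ops = torch.nn.ModuleDict({"double": Double(), "halve": Halve()})

    def forward(self, key: str, x: torch.Tensor) -> torch.Tensor:
        return self.ops[key](x)  # non-literal index, enabled by the annotation

scripted = torch.jit.script(Container())
print(scripted("double", torch.ones(2)))
```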

**Test Plan**
This commit adds unit tests to `TestModuleContainers` to test this
feature (`test_typed_module_dict`).

Differential Revision: D24070606

Test Plan: Imported from OSS

Reviewed By: ansley

Pulled By: SplitInfinity

fbshipit-source-id: 6019a7242d53d68fbfc1aa5a49df6cfc0507b992
2020-10-31 21:36:23 -07:00
317b78d56e Revert D24665950: Create prototype for AST rewriter
Test Plan: revert-hammer

Differential Revision:
D24665950 (54feb00bbd)

Original commit changeset: b72110436126

fbshipit-source-id: 961412df006acd33c91a745c809832d5c6494c76
2020-10-31 18:07:10 -07:00
54feb00bbd Create prototype for AST rewriter (#46410)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46410

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D24665950

Pulled By: ansley

fbshipit-source-id: b72110436126a24ddc294b8ee7b3f691281c1f1b
2020-10-31 10:51:17 -07:00
ee0033af9b [Gradient Compression] Surface C++ comm hooks to Python API as built-in comm hooks (#46959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46959

1. Implemented the Pybind part.
2. In the reducer, once the builtin_comm_hook_type is set,  a c++ comm hook instance will be created in Reducer::autograd_hook.
3. Added unit tests for the builit-in comm hooks.

Original PR issue: C++ DDP Communication Hook https://github.com/pytorch/pytorch/issues/46348
ghstack-source-id: 115629230

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl

Reviewed By: pritamdamania87

Differential Revision: D24471910

fbshipit-source-id: f96b752298549ea2067e2568189f1b394abcd99a
2020-10-30 23:19:42 -07:00
e3f912e8b7 Revert D24655999: [fbcode] Make model reader utilities.
Test Plan: revert-hammer

Differential Revision:
D24655999 (7f056e99dd)

Original commit changeset: 5095ca158d89

fbshipit-source-id: c43f672def7331667421e01b90f979940366e3c9
2020-10-30 21:00:42 -07:00
7f056e99dd [fbcode] Make model reader utilities.
Summary:
For some of the end-to-end flow projects, we need the ability to read module information during model validation or model publishing.
This diff creates model_reader.py with utilities for reading model content, including the following functionality:
1. read the model bytecode version;
2. check if a model is lite PyTorch script module;
3. check if a model is PyTorch script module.

Test Plan:
```
[xcheng16@devvm1099]/data/users/xcheng16/fbsource/fbcode% buck test pytorch_mobile/utils/tests:mobile_model_reader_tests
Processing filesystem changes: finished in 1.5 sec
Parsing buck files: finished in 1.6 sec
Building: finished in 4.9 sec (100%) 9249/43504 jobs, 2 updated
  Total time: 6.5 sec
More details at https://www.internalfb.com/intern/buck/build/6d0e2c23-d86d-4248-811f-31cb1aa7eab3
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 2ffccd62-ece5-44b5-8350-3a292243fad9
Trace available for this run at /tmp/tpx-20201030-122220.664763/trace.log
Started reporting to test run: https://our.intern.facebook.com/intern/testinfra/testrun/3940649711969390
    ✓ ListingSuccess: pytorch_mobile/utils/tests:mobile_model_reader_tests - main (10.234)
    ✓ Pass: pytorch_mobile/utils/tests:mobile_model_reader_tests - test_is_pytorch_lite_module (pytorch_mobile.utils.tests.test_model_reader.TestModelLoader) (7.039)
    ✓ Pass: pytorch_mobile/utils/tests:mobile_model_reader_tests - test_is_pytorch_script_module (pytorch_mobile.utils.tests.test_model_reader.TestModelLoader) (7.205)
    ✓ Pass: pytorch_mobile/utils/tests:mobile_model_reader_tests - test_read_module_bytecode_version (pytorch_mobile.utils.tests.test_model_reader.TestModelLoader) (7.223)
Summary
  Pass: 3
  ListingSuccess: 1
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/3940649711969390
```

Reviewed By: husthyc

Differential Revision: D24655999

fbshipit-source-id: 5095ca158d89231fb17285d445548f91ddb89bab
2020-10-30 19:04:14 -07:00
1aa57bb761 Moving coverage, xunit, pytest installation to Docker (#47082)
Summary:
Fixes a TODO. This PR moves `pip_install unittest-xml-reporting coverage pytest` to the base Ubuntu docker instead of running that installation during every test.

This should save some time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47082

Reviewed By: samestep

Differential Revision: D24634393

Pulled By: janeyx99

fbshipit-source-id: 3b980890409eafef9b006b9e03ad7f3e9017529e
2020-10-30 18:34:44 -07:00
cb4b6336ba [FX] Fix handling of attributes (#47030)
Summary:
Probably works :)

Fixes https://github.com/pytorch/pytorch/issues/46872

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47030

Reviewed By: ngimel

Differential Revision: D24652600

Pulled By: Chillee

fbshipit-source-id: 3fe7099ad02d1b5c23a7335b855d36d373603d18
2020-10-30 17:08:58 -07:00
7eb427e931 Providing more information while crashing process in async error handling (#46274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46274

We crash the process in NCCL Async Error Handling if the collective
has been running for greater than some set timeout. This PR logs more
information about the rank and duration the collective ran before throwing an exception.
ghstack-source-id: 115614622

Test Plan:
Run desync tests and flow. Here are the Flow runs showing the right messages: f225031389
f225032004

Reviewed By: jiayisuse

Differential Revision: D24200144

fbshipit-source-id: 02d48f13352aed40a4476768c123d5cebbedc8e0
2020-10-30 16:22:51 -07:00
d1d6dc2e3c Add more specific error message (#46905)
Summary:
While using `torch.utils.data.TensorDataset`, if sizes of tensors mismatch, there's now a proper error message.
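
For illustration, a minimal sketch of the failure mode that now gets the clearer message:

```python
import torch
from torch.utils.data import TensorDataset

# The first dimensions disagree (4 vs. 3), which now raises a descriptive
# size-mismatch error.
TensorDataset(torch.zeros(4, 2), torch.zeros(3))
```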

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46905

Reviewed By: ngimel

Differential Revision: D24565712

Pulled By: mrshenli

fbshipit-source-id: 98cdf189591c2a7a1b693627cc8464e8f553d9ee
2020-10-30 16:03:44 -07:00
a81572cdc5 Add anomaly mode for C++ (#46981)
Summary:
This adds anomaly mode for C++.

The backtrace isn't perfect yet, but it's a start.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46981

Reviewed By: IvanKobzarev

Differential Revision: D24631957

Pulled By: albanD

fbshipit-source-id: 4b91e205e7e51f4cf0fbc651da5013a00a3b2497
2020-10-30 15:18:07 -07:00
c86af4aa55 Disable NEON acceleration on older compilers (#47099)
Summary:
An optimized build compiled with gcc-7.5.0 generates numerically incorrect code

Works around https://github.com/pytorch/pytorch/issues/47098

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47099

Reviewed By: walterddr

Differential Revision: D24642272

Pulled By: malfet

fbshipit-source-id: 2cfb43e950c0d1c92cfcee13749f1ad13248c39b
2020-10-30 13:33:42 -07:00
085193c291 [quant][graphmode][fx][fusion] Add test for fuse_fx (#47085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47085

Both in train and eval mode

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24632457

fbshipit-source-id: 486aee4e073fb87e9da46a344e8dc77e848a60cf
2020-10-30 12:25:54 -07:00
1dd220bd84 Add C++ coverage for Ubuntu cpu tests (#46656)
Summary:
In order to enable C++ code coverage for tests, we need to build pytorch with the correct coverage flags. This PR should introduce a build that allows coverage tests to stem from a specific coverage build.

This PR does the following:
1. Adds a new build to `*-coverage_*` build with the correct `--coverage` flag for C++ coverage
2. Calls `lcov` at the end of testing to capture C++ coverage results
3. Pushes C++ results along with Python results
4. Shards the coverage test to not take ~4hrs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46656

Reviewed By: walterddr

Differential Revision: D24636213

Pulled By: janeyx99

fbshipit-source-id: 362a1a2a20c069ba0a7931669194dac53ac81133
2020-10-30 11:11:14 -07:00
edac4060d7 Fix mul cuda for bool (#47031)
Summary:
Also, add tests for tensor by scalar multiplication / division

Fixes https://github.com/pytorch/pytorch/issues/47007

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47031

Reviewed By: walterddr

Differential Revision: D24608874

Pulled By: malfet

fbshipit-source-id: 4e15179904814d6e67228276d3d11ff1b5d15d0d
2020-10-30 10:38:32 -07:00
69fe10c127 use bitfield to shrink TensorImpl (#45263)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45263

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23900587

Pulled By: bhosmer

fbshipit-source-id: 9214b887fde010bd7c8be848ee7846329c35752f
2020-10-30 10:18:44 -07:00
99fed7bd87 faster TensorOptions merging (#45046)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45046

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23806787

Pulled By: bhosmer

fbshipit-source-id: 3c8304f9a4503658081f8805ec06da78a467e125
2020-10-30 10:18:40 -07:00
c7fc8cab3b track Half/ComplexHalf default dtype (#45043)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45043

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23806097

Pulled By: bhosmer

fbshipit-source-id: 1c816b09c1e6b3c7ba85ed43d8e6c2518a768da4
2020-10-30 10:18:38 -07:00
f05b66b70d pass TypeMeta by value (#45026)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45026

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23802943

Pulled By: bhosmer

fbshipit-source-id: 81b06ef00bf8eb4375c0e0ff2032e03bd1d1188a
2020-10-30 10:14:17 -07:00
2643800881 Fix max_pool2d with ceil_mode bug (#46558)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46558

This PR fixes a bug with how pooling output shape was computed.

## BC Breaking Notes
Previously, a bug in the pooling code allowed a sliding window to be entirely off bounds. Now, sliding windows must start inside the input or left padding (not right padding, see https://github.com/pytorch/pytorch/issues/46929) and may only go off-bounds if ceil_mode=True.
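
A sketch of the intended semantics (hedged; the shapes are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)
# With ceil_mode=True the final window may run off the right/bottom edge,
# but every window must still *start* inside the input or left padding.
out = F.max_pool2d(x, kernel_size=3, stride=2, ceil_mode=True)
print(out.shape)  # torch.Size([1, 1, 2, 2])
```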

fixes #45357

TODO

- [x] Ensure existing tests are checking for the correct output size

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24633372

Pulled By: heitorschueroff

fbshipit-source-id: 55925243a53df5d6131a1983076f11cab7516d6b
2020-10-30 09:36:04 -07:00
7df0224cba Automated submodule update: FBGEMM (#47071)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 39d5addbff

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47071

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: albanD

Differential Revision: D24628407

fbshipit-source-id: 9b636f66b92e5853cafd521e704996ebc2faa954
2020-10-30 08:15:44 -07:00
67b7e751e6 add warning if DataLoader is going to create excessive number of thread (#46867)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46867

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24545540

Pulled By: glaringlee

fbshipit-source-id: a3bef0d417e535b8ec0bb33f39cfa2308aadfff0
2020-10-30 07:54:23 -07:00
eec201c138 Add last_n_window_collector
Summary:
Add `last_n_window_collector`, which C2 supports but PyTorch currently does not: https://www.internalfb.com/intern/diffusion/FBS/browsefile/master/fbcode/caffe2/caffe2/operators/last_n_window_collector.cc?lines=139

## Problem that we are solving

This operator works on multiple pieces of data and collects the last `n` elements that have been seen.

If you have the following pieces of data that have been passed around:

```
  [1, 2, 3, 4]
  [5, 6, 7]
  [8, 9, 10, 11]
```

3 times, and the collector size is given to be 6, the expected result is:

```
  [6, 7, 8, 9, 10, 11]
```
What this means is that we essentially need a FIFO (First In, First Out) mechanism: as we pass data through the collector, new data is pushed in at the end and the oldest entries fall out.

In this particular example, in the first pass (the data is `[1, 2, 3, 4]`), we hold `[1, 2, 3, 4]` in the queue, as our queue size is 6.

In the second pass (the data is `[5, 6, 7]`), we hold `[2, 3, 4, 5, 6, 7]` in the queue; since 1 was inserted earliest, it is dropped due to the size limit of the queue.

In the third pass (the data is `[8, 9, 10, 11]`), we hold `[6, 7, 8, 9, 10, 11]` in the queue, and `2, 3, 4, 5` are dropped due to the size of the queue.

For the multi-dimensional case, when we have the following data:

```
  [[1, 2], [2, 3], [3, 4], [4, 5]]
  [[5, 6], [6, 7], [7, 8]]
  [[8, 9], [9, 10], [10, 11], [11, 12]]
```

and our queue size is 6.

In the first pass, we will have `[[1, 2], [2, 3], [3, 4], [4, 5]]`.
In the second pass, we will have `[[2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8]]`.
In the third pass, we will have `[[6, 7], [7, 8], [8, 9], [9, 10], [10, 11], [11, 12]]`.

### The implementation

I am using the FIFO queue (`deque`) from Python's `collections` library, which accepts a `maxlen` argument that sets the size of the queue.

I take the last n indices of the tensor through list indexing, and the operator does not copy the data.

In the test plan, I cover both single-dimensional and multi-dimensional tensors.
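
A minimal sketch of the collector semantics described above, using the same `collections` FIFO (illustrative only; this is not the operator's actual API):

```python
from collections import deque

queue = deque(maxlen=6)  # drops the oldest entries once full
for chunk in ([1, 2, 3, 4], [5, 6, 7], [8, 9, 10, 11]):
    queue.extend(chunk)
    print(list(queue))
# Final state: [6, 7, 8, 9, 10, 11]
```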

### Benchmark
I used various configurations and added a benchmark test. The PyTorch implementation is much faster than the Caffe2 implementation:

#### CPU Benchmark
```
torch_response.median
0.00019254473969340324

caffe_response.median
0.00030233583599794657
```

#### GPU Benchmark

```
torch_response.mean
0.000081007429903838786

caffe_response.mean
0.00010279081099724863
```

Test Plan:
### For CPU:
```
buck test //caffe2/torch/fb/sparsenn:test
```

### For GPU:
- Used an on-demand machine and did the following commands:
```
jf get D24435544
buck test mode/opt  //caffe2/torch/fb/sparsenn:test
```
https://www.internalfb.com/intern/testinfra/testconsole/testrun/4222124688138052/

Reviewed By: dzhulgakov, radkris-git

Differential Revision: D24435544

fbshipit-source-id: 8193b4746b20f2a4920fd4d41271341045cdcee1
2020-10-30 02:35:54 -07:00
6c34aa720c add add_node function for partition to fix partition mem size calculation (#47083)
Summary:
Placeholders and constants in the partition are counted twice when combining two partitions. This PR fixes it by adding an add_node function to the Partition class. A unit test is also updated to check that the partition size is correct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47083

Reviewed By: gcatron

Differential Revision: D24634368

Pulled By: scottxu0730

fbshipit-source-id: ab408f29da4fbf729fd9741dcb3bdb3076dc30c4
2020-10-30 01:59:42 -07:00
f9d32c4fa8 [JIT] Add selective backend lowering API (#43613)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43613

**Summary**
This commit adds a helper/utility to facilitate the selective lowering of
specific submodules within a module hierarchy to a JIT backend. The reason
that this is needed is that lowering a submodule of a scripted
module to a backend after the module has been scripted requires
adjusting its JIT type.

**Test Plan**
This commit refactors `NestedModuleTest` in `jit/test_backends.py` to
use this new selective lowering API.

**Fixes**
This commit fixes #41432.

Test Plan: Imported from OSS

Reviewed By: mortzur

Differential Revision: D23339855

Pulled By: SplitInfinity

fbshipit-source-id: d9e69aa502febbe04fd41558c70d219729252be9
2020-10-30 00:37:33 -07:00
0dbd72935a Split comm hooks into python-dependent hooks and others (#47019)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **https://github.com/pytorch/pytorch/issues/47019 Split comm hooks into python-dependent hooks and others**

This is needed because we plan to move most of c10d C++ implementation into `libtorch_*.so`, which can not have Python dependencies.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47019

Reviewed By: albanD

Differential Revision: D24614129

Pulled By: gmagogsfm

fbshipit-source-id: 3f32586b932a2fe6a7b01a3800f000e66e9786bb
2020-10-30 00:30:45 -07:00
d95e1afad3 [pytorch] add script to run all codegen (#46243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46243

Add util script to test whether any codegen output changes.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24388873

Pulled By: ljk53

fbshipit-source-id: ef9ef7fe6067df1e0c53aba725fc13b0dfd7f4c2
2020-10-29 22:55:12 -07:00
707d271493 Fix links in tools/build_variables.bzl (#47066)
Summary:
I know the `aten/src/ATen/core/CMakeLists.txt` link is now correct because that file was deleted in 061ed739c17028fe907737b52f50a495ab5b4617:
```
$ git log --full-history -- aten/src/ATen/core/CMakeLists.txt | head -n 1
commit 061ed739c17028fe907737b52f50a495ab5b4617
$ git show 061ed739c17028fe907737b52f50a495ab5b4617^ | head -n 1
commit f99a693cd9ff7a9b5fdc71357dac66b8192786d3
```
But I can't tell what the `tools/cpp_build/torch/CMakeLists.txt` link is supposed to be, because that file (indeed, its entire parent directory) doesn't seem to have ever existed:
```
$ git log --full-history -- tools/cpp_build/torch
```
(The output of the above command is empty.) So I saw that the grandparent directory was deleted in 130881f0e37cdedc0e90f6c9ed84957aee6c80ef:
```
$ git log --full-history -- tools/cpp_build | head -n 1
commit 130881f0e37cdedc0e90f6c9ed84957aee6c80ef
$ git show 130881f0e37cdedc0e90f6c9ed84957aee6c80ef^ | head -n 1
commit c6facc2aaa5a568756e971a9d2b7f2af282dff39
```
And looking at [the history of that directory](c6facc2aaa/tools/cpp_build), I see that some of the last commits touch `torch/CMakeLists.txt`, so I'm just using that here and hoping it's correct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47066

Reviewed By: albanD, seemethere

Differential Revision: D24628980

Pulled By: samestep

fbshipit-source-id: 0fe8887e323593ef1676c34d4b920aeeaebd8550
2020-10-29 18:47:46 -07:00
366888a5e2 [quant][graphmode][fx] Remove logging for standalone module api calls (#47032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47032

These are not top-level APIs and are not supposed to be called directly by users.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24610602

fbshipit-source-id: c5510f06b05499387d70f23508470b676aea582c
2020-10-29 18:39:43 -07:00
e3b55a8a65 [pytorch/ops] Concat fast path w/ zero tensor (#46805)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46805

The current implementation takes the slow path if there is an empty tensor in the list, which is inefficient. Use the fast path for torch.cat even if there are empty tensors. This wastes one thread block per empty tensor, but is still much better than the slow path.

Test Plan: CI + sandcastle

Reviewed By: ngimel

Differential Revision: D24524441

fbshipit-source-id: 529c8af51ecf8374621deee3a9d16cacbd214741
2020-10-29 18:14:40 -07:00
2e2dc5874b Fix lint (#47095)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47095

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D24639056

Pulled By: jamesr66a

fbshipit-source-id: e4f7842eb0438675723d1cac78e20d13b96e802c
2020-10-29 18:09:23 -07:00
f5477e3703 Enable python code coverage on windows (#44548)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43897

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44548

Reviewed By: walterddr

Differential Revision: D24582777

Pulled By: malfet

fbshipit-source-id: 9b68b72ba356fef61461fc2446c73360f67ce0b4
2020-10-29 17:30:53 -07:00
ddeacf1565 Fix median bug on discontiguous tensors (#46917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46917

fixes https://github.com/pytorch/pytorch/issues/46814

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24633412

Pulled By: heitorschueroff

fbshipit-source-id: 54732671b298bdc2b04b13ab3a373892ee0933c3
2020-10-29 17:12:22 -07:00
9bc8f071a3 [WIP] Move torch.fx into its own target (#46658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46658

ghstack-source-id: 115213192

Test Plan: waitforsadcastle

Reviewed By: zdevito, vkuzo

Differential Revision: D24374723

fbshipit-source-id: 2b5708001f5df2ffb21ea5e586e26030653ccdcf
2020-10-29 17:03:08 -07:00
7190155408 [Transposed Conv]add ConvTranspose3d with FBGEMM as backend (#46608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46608

introduce frontend API for quantized transposed convolution with only FBGEMM as backend.
ghstack-source-id: 115289210

Test Plan: https://www.internalfb.com/intern/testinfra/testconsole/testrun/7599824394184104/

Reviewed By: z-a-f

Differential Revision: D24369831

fbshipit-source-id: b8babd3ddbe0df8f4c8bc652bb745f85e0813797
2020-10-29 16:18:43 -07:00
3c643d112e Pin destination memory for cuda_tensor.to("cpu", non_blocking=True) (#46878)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39694.

[`torch.cuda._sleep(int(100 * get_cycles_per_ms()))`](https://github.com/pytorch/pytorch/pull/46878/files#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R511-R513) in the test helps avoid flakiness noted by ngimel (https://github.com/pytorch/pytorch/pull/35144#issuecomment-602103631).
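
A short sketch of the pattern this change targets (assuming a CUDA device is available):

```python
import torch

x = torch.randn(4, device="cuda")
# The CPU destination is now allocated in pinned memory, so the copy can be
# truly asynchronous with respect to the host.
y = x.to("cpu", non_blocking=True)
torch.cuda.synchronize()  # ensure the copy has finished before reading y
print(y)
```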

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46878

Reviewed By: izdeby

Differential Revision: D24550403

Pulled By: xw285cornell

fbshipit-source-id: 1ecc35ef75f9a38ab332aacdf4835955105edafc
2020-10-29 15:42:55 -07:00
e17b8dea1d fix calculation of number of elements to not overflow (#46997)
Summary:
Possibly fixes https://github.com/pytorch/pytorch/issues/46764.
In many places, computing the number of tensor elements is written as
```
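// Note (added for clarity, not part of the diff): std::accumulate deduces its
// accumulator type from the init argument, so the int literal 1 makes the
// running product an int, which overflows past INT_MAX; the fix is to pass an
// int64_t init, e.g. static_cast<int64_t>(1).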
int64_t numel = std::accumulate(oldshape.begin(), oldshape.end(), 1,
                                  std::multiplies<int64_t>());
```
This computes the product with the type of the `1` literal, which is `int`. When there are more than INT_MAX elements, the result overflows. In https://github.com/pytorch/pytorch/issues/46746, the tensor that was sent to reshape had 256^4 elements, which was computed as `0`, so the reshape was not done correctly.
I've audited the usages of std::accumulate and changed them to use int64_t as the `init` type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46997

Reviewed By: albanD

Differential Revision: D24624654

Pulled By: ngimel

fbshipit-source-id: 3d9c5e6355531a9df6b10500eec140e020aac77e
2020-10-29 15:37:16 -07:00
78de12f588 Replace -f with -x for pytest tests. (#46967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46967

Tests under `tests/distributed/_pipeline/sync` use pytest and
specifying the `-f` option for such tests as follows: `python test/run_test.py
-i distributed/_pipeline/sync/skip/test_api -- -f` doesn't work.

The equivalent option for pytest is `-x`. To resolve this issue, I've updated
`run_test.py` to replace `-f` with `-x` for pytest tests.

More details in https://github.com/pytorch/pytorch/issues/46782

#Closes: https://github.com/pytorch/pytorch/issues/46782
ghstack-source-id: 115440558

Test Plan:
1) waitforbuildbot
2) `python test/run_test.py -i distributed/_pipeline/sync/skip/test_api -- -f`

Reviewed By: malfet

Differential Revision: D24584556

fbshipit-source-id: bd87f5b4953504e5659fe72fc8615e126e5490ff
2020-10-29 15:28:06 -07:00
a4caa3f596 [ONNX] bump CI ort to 1.5.2 rel for stability (#46595)
Summary:
Recently the ort-nightly has become unstable and causing issues with CI tests. Switching to release package for now for stability, until the situation is improved.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46595

Reviewed By: houseroad

Differential Revision: D24566175

Pulled By: bzinodev

fbshipit-source-id: dcf36e976daeeb17465df88f28bc9673eebbb7b7
2020-10-29 14:51:38 -07:00
843cab3f2e Delete TypeDefault.h and TypeDerived.h codegen entirely. (#47002)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47002

There was no good reason for TypeDerived.h (CPUType.h) codegen
to exist after static dispatch was deleted, and now that we
have Math alias key TypeDefault.h header is not needed either.
Sorry to anyone who was using these out of tree.

I didn't entirely delete TypeDefault.h as it has a use in
a file that I can't conveniently compile test locally.  Will
kill it entirely in a follow up.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24596583

Pulled By: ezyang

fbshipit-source-id: b5095d3509098ff74f836c5d0c272db0b2d226aa
2020-10-29 14:43:53 -07:00
c689b4d491 Delete TypeDefault call code generation logic in VariableType (#47000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47000

There is a new invariant that emit_body is only ever called when
strategy is 'use_derived', which means we can delete a bunch of code.
This removes the last use of TypeXXX.h headers.

Note that this change makes sense, as the TypeDefault entries are
registered as Math entries, which means they automatically populate
Autograd (and we no longer have to register them ourselves).  Ailing
did all the hard work, this is just the payoff.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24596584

Pulled By: ezyang

fbshipit-source-id: 6fa754b5f16e75cf2dcbf437887c0fdfda5e44b1
2020-10-29 14:43:50 -07:00
41f8641f1e Delete SchemaRegister.cpp, make flag operate on TypeDefault.cpp (#46991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46991

This change is motivated by a problem bdhirsh observed which is
that in internal builds that include both SchemaRegister.cpp
and TypeDefault.cpp, some operators have their schemas defined
multiple times.  Instead of dumping schema registrations in
multiple files, it seems better to just toggle how many schemas
we write into TypeDefault.cpp.

ljk53 observes that technically SchemaRegister.cpp is only needed by
full-JIT frontend, and not by light interpreter (to resolve schema
lookups).  However, in practice, the registration file seems to be
unconditionally loaded.  This change will make it harder to do the
optimization where we drop schemas in the light interpreter, but you
probably want to architect this differently (similar to per-op
registrations, DON'T do any registrations in ATen, and then write out
the schema registrations in a separate library.)

I took this opportunity to also simplify the TypeDefault generation
logic by reworking things so that we only ever call with None argument
when registering.  Soon, we should be able to just split these
files up entirely.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D24593704

Pulled By: ezyang

fbshipit-source-id: f01ea22a3999493da77b6e254d188da0ce9adf2f
2020-10-29 14:43:47 -07:00
54d83296a9 Desugar missing dispatch field into singleton Math entry (#46970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46970

Now that catchall declarations are reinterpreted as registrations to
dispatch key Math, we can now simplify code generation logic by directly
generating to Math, and bypassing the logic for catchall.  This also helps
avoid bugs where we incorrectly classify some kernels as Math and others
as not, even though they get registered in the same way.

Bill of changes:
- Give Math its own unique TORCH_LIBRARY_IMPL
- Make it so NativeFunction.dispatch is always non-None.  Simplify
  downstream conditionals accordingly
- When parsing NativeFunction, fill in missing dispatch with a
  singleton Math entry (pointing to the cpp.name!)

One thing that is a little big about this change is that a lot of kernels
which previously didn't report as "math" now report as math.  I picked
a setting for these booleans that made sense to me, but I'm not sure
if e.g. XLA will handle it 100% correctly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24592391

Pulled By: ezyang

fbshipit-source-id: 2e3355f19f9525698864312418df08411f30a85d
2020-10-29 14:43:44 -07:00
87e86fa84c Some miscellaneous cleanup in codegen (#46940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46940

- Remove inaccurate generated comments
- Delete some dead code
- Delete some unused headers
- Delete unnecessary SparseTypeDerived.cpp template

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24573971

Pulled By: ezyang

fbshipit-source-id: 3de05d9cd9bada4c73f01d6cfaf51f16ada66013
2020-10-29 14:43:41 -07:00
dc6f723cb4 Delete Vulkan from code generator. (#46938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46938

It turns out that after https://github.com/pytorch/pytorch/pull/42194
landed we no longer actually generate any registrations into this
file.  That means it's completely unnecessary.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D24573518

Pulled By: ezyang

fbshipit-source-id: b41ada9e394b780f037f5977596a36b896b5648c
2020-10-29 14:40:54 -07:00
156c08b0d9 view_as_real doesn't work for all backends since it relies on strides. (#47018)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47018

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D24607340

Pulled By: ailzhang

fbshipit-source-id: c7fd85cd636ae9aebb22321f8f1a255af81a473f
2020-10-29 14:33:19 -07:00
71c0133e23 enable PE everywhere but mobile (#47001)
Summary:
enable PE everywhere but mobile

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47001

Reviewed By: eellison

Differential Revision: D24596252

Pulled By: Krovatkin

fbshipit-source-id: 3e3093a43287e1ff838cb03ec0e53c11c82c8dd2
2020-10-29 14:22:56 -07:00
377a09c8e8 reland fast TypeMeta/ScalarType conversion (#45544)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45544

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24006482

Pulled By: bhosmer

fbshipit-source-id: 5da2401ab40bbf58da27a5d969e00bcee7562ed6
2020-10-29 14:07:39 -07:00
1ea14e30f5 [ONNX] Enable NoneType inputs to export API (#45792)
Summary:
Enables the use of NoneType arguments to inputs tuple in the export API

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45792

Reviewed By: heitorschueroff

Differential Revision: D24312784

Pulled By: bzinodev

fbshipit-source-id: 1717e856b56062add371af7dc09cdd9c7b5646da
2020-10-29 13:56:52 -07:00
c556d4550c fix_combine_two_partition_size (#47053)
Summary:
Fix combine_two_partitions in Partitioner.py so that it recalculates the combined partition's used memory size after merging two partitions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47053

Reviewed By: gcatron

Differential Revision: D24624270

Pulled By: scottxu0730

fbshipit-source-id: a0e2a8486e012d02ea797d6ba36ab304d27cc93f
2020-10-29 13:40:44 -07:00
129b41226e [ONNX] Support nd mask index in opset >= 11 (#45252)
Summary:
Fixes below pattern for opset >= 11

`return tensor[tensor > 0]`

where rank of `tensor` > 1.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45252

Reviewed By: VitalyFedyunin

Differential Revision: D24116945

Pulled By: bzinodev

fbshipit-source-id: 384026cded1eb831bb5469e31ece4fcfb6ae8f2a
2020-10-29 13:32:59 -07:00
1d233d7d1f [fix] torch.nn.functional.embedding -> padding_idx behavior (#46714)
Summary:
Reference https://github.com/pytorch/pytorch/issues/46585

Fix for second snippet in the mentioned issue.
```python
import torch

predefined_weights = torch.rand(10, 3)
result = torch.nn.functional.embedding(torch.LongTensor([1,2,0]), predefined_weights, padding_idx=0)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46714

Reviewed By: VitalyFedyunin

Differential Revision: D24593352

Pulled By: albanD

fbshipit-source-id: 655b69d9ec57891871e26feeda2aa0dcff73beba
2020-10-29 13:29:00 -07:00
3e499e490a Bump up NCCL to v2.8 (#46742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46742

Use NCCL v2.8

Test Plan: waitforsandcastle

Reviewed By: mrshenli

Differential Revision: D24488800

fbshipit-source-id: d39897da1499e63ca783a81aec1ce707606423a3
2020-10-29 13:17:58 -07:00
d850b5c98c Fix DDP issue where parameters share same grad_accumulator (#46755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46755

As reported in https://github.com/pytorch/pytorch/issues/41324, there is a bug in DDP when `find_unused_parameters=True` and 2 or more parameters share the same gradient accumulator.

In the reducer, we currently keep a mapping of grad accumulator to index and populate it with map[accumulator] = index, but this overwrites indices when the accumulator is the same. To fix this, switch the mapping values to a vector of indices to hold all such indices that share the same accumulator.
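
An illustrative Python sketch of the mapping change (the reducer itself is C++; the names here are hypothetical):

```python
from collections import defaultdict

# Params 0 and 1 share one grad accumulator, e.g. tied weights.
accumulator_of = {0: "acc_a", 1: "acc_a", 2: "acc_b"}

index_map = defaultdict(list)       # before the fix: map[acc] = index (overwrites)
for index, acc in accumulator_of.items():
    index_map[acc].append(index)    # after: keep every index per accumulator

print(dict(index_map))  # {'acc_a': [0, 1], 'acc_b': [2]}
```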
ghstack-source-id: 115453567

Test Plan: Added UT

Reviewed By: pritamdamania87

Differential Revision: D24497388

fbshipit-source-id: d32dfa9c5cd0b7a8df13c7873d5d28917b766640
2020-10-29 12:23:06 -07:00
680571533b [RFC] Decouple fast pass functions (#46469)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46469

There are some "fast_pass" function calls where the symbols in `ATen/native` are directly referenced from outside of native at the linking stage. This PR decouples one of these fast passes from native while keeping the same functionality. `scalar_to_tensor` is included through `ATen/ATen.h`, which could be referenced by any cpp file including this header.

ghstack-source-id: 114485740

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D24361863

fbshipit-source-id: 28d658688687b6cde286a6e6933ab33a4b3cf9ec
2020-10-29 12:18:50 -07:00
74d730c0b5 implement NumPy-like functionality column_stack, row_stack (#46313)
Summary:
Related https://github.com/pytorch/pytorch/issues/38349

This PR implements `column_stack` as a composite of `torch.reshape` and `torch.hstack`, and makes `row_stack` an alias of `torch.vstack`.
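
For reference, a quick sketch of the NumPy-matching behavior:

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

print(torch.column_stack((a, b)))  # 1-D inputs become columns:
# tensor([[1, 4],
#         [2, 5],
#         [3, 6]])

print(torch.row_stack((a, b)))     # alias of torch.vstack
# tensor([[1, 2, 3],
#         [4, 5, 6]])
```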

Todo

- [x] docs
- [x] alias pattern for `row_stack`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46313

Reviewed By: ngimel

Differential Revision: D24585471

Pulled By: mruberry

fbshipit-source-id: 62fc0ffd43d051dc3ecf386a3e9c0b89086c1d1c
2020-10-29 12:14:39 -07:00
fee585b5a3 Correctly mark unannotated NamedTuple field to be inferred TensorType (#46969)
Summary:
If there is no annotation given, we want to show users that the type is inferred

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46969

Test Plan:
Added a new test case that throws an error with the expected error message

Fixes https://github.com/pytorch/pytorch/issues/46326

Reviewed By: ZolotukhinM

Differential Revision: D24614450

Pulled By: gmagogsfm

fbshipit-source-id: dec555a53bfaa9cdefd3b21b5142f5e522847504
2020-10-29 12:07:40 -07:00
1e275bc1a6 Show Flake8 errors in GitHub CI again (#46990)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46985.

Can someone comment on whether the "Run flake8" step should fail if `flake8` produces errors? This PR makes sure the errors are still shown, but [the job linked from the issue](https://github.com/pytorch/pytorch/runs/1320258832) also shows that the failure of that step seems to have caused the "Add annotations" step not to run.

Is this what we want, or should I instead revert back to the `--exit-zero` behavior (in this case by just removing the `-o pipefail` from this PR) that we had before https://github.com/pytorch/pytorch/issues/46740? And if the latter, then (how) should I modify this `flake8-py3` job to make sure it fails when `flake8` fails (assuming it didn't already do that?)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46990

Reviewed By: VitalyFedyunin

Differential Revision: D24593573

Pulled By: samestep

fbshipit-source-id: 361392846de9fadda1c87d2046cf8d26861524ca
2020-10-29 11:59:30 -07:00
6eaa324c9f Implement torch.igamma (#46183)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41637
This is the regularized lower incomplete gamma function, equivalent to SciPy's `gammainc` and TensorFlow's `igamma`.
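
A small sketch of the correspondence (assuming SciPy is available for comparison):

```python
import torch
from scipy import special

a = torch.tensor([1.0, 2.0, 4.0])
x = torch.tensor([0.5, 1.5, 3.0])

print(torch.igamma(a, x))                      # regularized lower incomplete gamma
print(special.gammainc(a.numpy(), x.numpy()))  # matching values from SciPy
```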

cc fritzo mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46183

Reviewed By: gchanan

Differential Revision: D24479126

Pulled By: mruberry

fbshipit-source-id: fdf8ea289fe4ca1b408810732192411e948fcdfe
2020-10-29 11:40:18 -07:00
dd95bf65b6 [caffe2/FC DNNLOWP] Shrink Y_int32_ vector capacity when appropriate
Summary:
The FullyConnectedDNNLowPOp::Y_int32_ vectors consume between 1GB and 2GB on one of FB's larger applications. By adding tracing I noticed that the number of elements in each instance oscillates wildly over time. As the buffer backing a vector can only be extended in a resize operation, this means there is wasted memory space. So as a simple optimization, I added code to right-size the buffer backing the vector when the number of elements is less than half the vector capacity at that point; this doesn't affect the existing elements.

There is of course a memory/cpu tradeoff here - with the change we are doing more mallocs and frees. I added tracing to measure how many times we grow or shrink per second: it's about 100 per second on average, which is not a great deal.

Test Plan:
Memory growth impact: over 24 hours and after the startup period, the memory consumed by this code grows from 0.85GB to 1.20GB vs 0.95GB to 1.75GB in the baseline. [ source: https://fburl.com/scuba/heap_profiles/wm47kpfe ]
https://pxl.cl/1pHlJ

Reviewed By: jspark1105

Differential Revision: D24592098

fbshipit-source-id: 7892b35f24e42403653a74a1a9d06cbc7ee866b9
2020-10-29 11:19:45 -07:00
38265acfbe Add Mul op for Vulkan (#47021)
Summary:
Updates mul_scalar shader to support the new Vulkan API, and adds a new op for it using the new API.

Also adds an in-place version for the op.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47021

Test Plan:
Unit test included. To build & run:
```
BUILD_CUSTOM_PROTOBUF=OFF \
  BUILD_TEST=ON \
  USE_EIGEN_FOR_BLAS=OFF \
  USE_FBGEMM=OFF \
  USE_MKLDNN=OFF \
  USE_NNPACK=OFF \
  USE_NUMPY=OFF \
  USE_OBSERVERS=OFF \
  USE_PYTORCH_QNNPACK=OFF \
  USE_QNNPACK=OFF \
  USE_VULKAN=ON \
  USE_VULKAN_API=ON \
  USE_VULKAN_SHADERC_RUNTIME=ON \
  USE_VULKAN_WRAPPER=OFF \
  MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python3 setup.py develop --cmake && ./build/bin/vulkan_api_test
```

Reviewed By: AshkanAliabadi

Differential Revision: D24624729

Pulled By: SS-JIA

fbshipit-source-id: 97e76e4060307a9a24311ac51dca8812e4471249
2020-10-29 11:14:25 -07:00
2b6a720eb1 Update pybind to 2.6.0 (#46415)
Summary:
Preserve PYBIND11 configuration options in `torch._C._PYBIND11_COMPILER_TYPE` and use them when building extensions

Also, use f-strings in `torch.utils.cpp_extension`

"Fixes" https://github.com/pytorch/pytorch/issues/46367

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46415

Reviewed By: VitalyFedyunin

Differential Revision: D24605949

Pulled By: malfet

fbshipit-source-id: 87340f2ed5308266a46ef8f0317316227dab9d4d
2020-10-29 10:53:47 -07:00
2249a293b7 Fix segfault with torch.orgqr. (#46700)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41768

The fault was that a NULL `tau` would get passed to LAPACK function. This PR fixes that by checking whether the `tau` contains 0 elements at the beginning of the function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46700

Reviewed By: albanD

Differential Revision: D24616427

Pulled By: mruberry

fbshipit-source-id: 92e8f1489b113c0ceeca6e54dea8b810a51a63c3
2020-10-29 10:34:39 -07:00
f629fbe235 Added torch.linalg.tensorsolve (#46142)
Summary:
This PR adds `torch.linalg.tensorsolve` function that matches `numpy.linalg.tensorsolve`.

Ref https://github.com/pytorch/pytorch/issues/42666.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46142

Reviewed By: izdeby

Differential Revision: D24539400

Pulled By: mruberry

fbshipit-source-id: 6e38364fe0bc511e739036deb274d9307df119b2
2020-10-29 10:29:28 -07:00
13b4127c95 Fix implicit conversion (#46833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46833

Implicit integer conversions are causing compiler warnings. Since in this case the logs make it pretty clear that the `unsigned` types won't overflow despite 64-bit inputs, we fix the issue by making the downconversion explicit.

Test Plan: Standard test rig.

Reviewed By: malfet

Differential Revision: D24481377

fbshipit-source-id: 4422538286d8ed2beb65065544016fd430394ff8
2020-10-29 10:22:37 -07:00
ecdbea77bc Fix DDP documentation (#46861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46861

Noticed that in the DDP documentation:
https://pytorch.org/docs/master/generated/torch.nn.parallel.DistributedDataParallel.html?highlight=distributeddataparallel
there were some examples with `torch.nn.DistributedDataParallel`, fix this to
read `torch.nn.parallel.DistributedDataParallel`.
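
For clarity, a minimal single-process sketch using the corrected path (gloo backend; the env values are illustrative):
```
import os
import torch.distributed as dist
import torch.nn as nn

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# torch.nn.parallel.DistributedDataParallel, not torch.nn.DistributedDataParallel
model = nn.parallel.DistributedDataParallel(nn.Linear(10, 10))
```
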
ghstack-source-id: 115453703

Test Plan: ci

Reviewed By: pritamdamania87, SciPioneer

Differential Revision: D24534486

fbshipit-source-id: 64b92dc8a55136c23313f7926251fe825a2cb7d5
2020-10-29 09:13:47 -07:00
262bd6437a Show old kernel location when there are mismatches (#46850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46850

So far, in the error messages when kernel signatures mismatched, we showed the location where the second kernel came from,
but we didn't show the location of the first kernel. This PR now shows the location of both.
ghstack-source-id: 115468616

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D24540368

fbshipit-source-id: 3b4474062879d17f9bb7870ad3814343edc1b755
2020-10-29 08:30:49 -07:00
dfdc1dbee4 Disable softmax tests on ROCm (#46793)
Summary:
This PR disables test_softmax and test_softmax_results in test_nn.py, which were enabled in https://github.com/pytorch/pytorch/issues/46363. The softmax tests are causing failures on gfx906 machines. Disabling them until we root-cause and fix them on gfx906.

cc: jeffdaily ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46793

Reviewed By: izdeby

Differential Revision: D24539211

Pulled By: ezyang

fbshipit-source-id: 633cb9dc497ad6359af85b85a711c4549d772b2a
2020-10-29 08:05:36 -07:00
4a581ba6c2 Implement LengthsToOffsets operator in Caffe2 (#46590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46590

This operator is very similar to LengthsToRanges but doesn't pack the offsets next to the original lengths.

Reviewed By: yf225

Differential Revision: D24419746

fbshipit-source-id: aa8b014588bb22eced324853c545f8684086c4e4
2020-10-29 07:03:34 -07:00
18d273dc0e [RFC][LocalSession] Fix workspace type
Summary: I was reading/looking into how LocalSession works and realized that the workspace type being passed around was the bound function on TaskGroup instead of the actual type. This meant that all workspaces for LocalSession would always be global, because they'd never match the private workspace type.

Test Plan: <not sure, could use some suggestions>

Reviewed By: cryptopic

Differential Revision: D24458428

fbshipit-source-id: 0f87874babe9c1ddff25b5363b443f9ca37e03c1
2020-10-29 04:12:17 -07:00
d0df29ac22 [FX] Put inf and nan in globals instead of with an import string (#47035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47035

Chillee thought the `from math import inf, nan` string at the top of `.code` was annoying, so here's an alternative: put those values in `globals` before we `exec`.
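
A minimal sketch of the general technique (illustrative code, not FX's actual codegen):
```
import math

code = "y = x * inf"  # generated source that references `inf` with no import

g = {"inf": math.inf, "nan": math.nan, "x": 2.0}  # seed globals before exec
exec(code, g)
assert g["y"] == math.inf
```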

Test Plan: Imported from OSS

Reviewed By: dzhulgakov

Differential Revision: D24611278

Pulled By: jamesr66a

fbshipit-source-id: c25ef89e649bdd3e79fe91aea945a30fa7106961
2020-10-29 00:35:41 -07:00
cab32d9cdf [RPC Framework] Support remote device format "<workername>/<device>" (#46773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46773

Changed the constructor of RemoteModule to accept a `remote_device` arg in the following format:
"<workername>/<device>" (e.g., "trainer0/cpu", "ps0/cuda:0")

This arg merges the original `on` and `device` args.
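
A hedged usage sketch; the import path and worker name are assumptions, and RPC must already be initialized:
```
import torch.nn as nn
from torch.distributed.nn import RemoteModule  # import path assumed

# Requires an initialized RPC framework with a worker named "trainer0".
# The single remote_device string replaces the former `on` and `device` args.
remote_linear = RemoteModule(
    remote_device="trainer0/cuda:0",  # "<workername>/<device>"
    module_cls=nn.Linear,
    args=(20, 30),
)
```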

Original PR issue: RemoteDevice Format #46554
ghstack-source-id: 115448051

Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- RemoteModule

Reviewed By: pritamdamania87

Differential Revision: D24482562

fbshipit-source-id: 5acfc73772576a4b674df27625bf560b8f8e67c1
2020-10-29 00:14:56 -07:00
b553c06abb Throw an exception in the constructor of torchscript serialization to avoid double-exception (#44266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44266

If PyTorchStreamWriter is writing to a file in a non-existent path, it throws an exception. During unwinding, the destructor calls writeEndOfFile() and throws again. To avoid this double exception, a check-and-throw is added in the constructor; in that case the destructor will not be called and the exception can propagate through the unwinding.

Test Plan: python test/test_jit.py TestSaveLoad.test_save_nonexit_file

Reviewed By: dreiss

Differential Revision: D23560770

Pulled By: iseeyuan

fbshipit-source-id: 51b24403500bdab3578c7fd5e017780467a5d06a
2020-10-28 22:41:19 -07:00
9c1a41b724 [RFC] Add OperatorHandle overload to the RecordFunction::before() method (#46401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46401

Broader context about selective/custom build available at https://fb.quip.com/2oEzAR5MKqbD and https://fb.workplace.com/groups/pytorch.mobile.team/permalink/735794523641956/

Basically, we want to be able to trace full operator names (with overload name). The current observer infra picks up the operator name from the schema, which doesn't seem to include the overload name. To ensure consistency with the existing uses and to accommodate the new use-case, this diff adds a new overload to accept an `OperatorHandle` object, and the code in `before()` eagerly resolves it to an `OperatorName` object (which can be cached in a member variable) as well as a string (view) operator-name which has the same semantics as before.

Why do we pass in an `OperatorHandle` but then resolve it to an `OperatorName`? This might come across as a strange design choice (and it is), but it is grounded in practicality.

It is not reasonable to cache an `OperatorHandle` object but caching an `OperatorName` object is reasonable since it holds all the data itself.

An initial version of this change was trying to test this change in the `xplat` repo, which didn't work. Thanks to ilia-cher for pointing out that the dispatcher observing mechanism is disabled under a compile time flag (macro) for xplat.
ghstack-source-id: 114360747

Test Plan:
`buck test fbcode/caffe2/fb/test:record_function_test` succeeds. Also replicated this test in OSS in the file `test_misc.cpp` where the rest of the `RecordFunction` subsystem is being tested.

Ran the benchmark as requested by ilia-cher

{P146511280}

Reviewed By: ilia-cher

Differential Revision: D24315241

fbshipit-source-id: 239f3081e6aa2e26c3021a7dd61f328b723b03d9
2020-10-28 22:38:26 -07:00
604e1b301a Fix negative column numbers for the torch.eye (#46841)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46757

Error out on negative column numbers and add corresponding tests in `test/test_tensor_creation_ops.py`.
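
A quick illustration of the new behavior (error type assumed to be RuntimeError):
```
import torch

torch.eye(3, 4)       # ok: a 3 x 4 matrix with ones on the diagonal
try:
    torch.eye(3, -1)  # a negative column count is now rejected up front
except RuntimeError as e:
    print("rejected:", e)
```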

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46841

Reviewed By: VitalyFedyunin

Differential Revision: D24593839

Pulled By: ngimel

fbshipit-source-id: b8988207911453de7811cf3ceb43747192cd689d
2020-10-28 22:29:25 -07:00
5c8aad1141 [numpy] torch.cos, torch.tan : promote integer inputs to float (#46706)
Summary:
References https://github.com/pytorch/pytorch/issues/42515
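
With this change, integer inputs now promote instead of erroring, matching NumPy. A quick illustration (assuming the default float dtype is float32):
```
import torch

t = torch.arange(3)  # int64 input
assert torch.cos(t).dtype == torch.float32
assert torch.tan(t).dtype == torch.float32
```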

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46706

Reviewed By: izdeby

Differential Revision: D24537262

Pulled By: mruberry

fbshipit-source-id: e57377a625814a3f34a765ce6bfd63a33c02a5d9
2020-10-28 22:02:52 -07:00
42a51148c1 Use f-strings in torch.utils.cpp_extension (#47025)
Summary:
Plus two minor fixes to `torch/csrc/Module.cpp`:
 - Use iterator of type `Py_ssize_t` for array indexing in `THPModule_initNames`
 - Fix clang-tidy warning of unneeded defaultGenerator copy by capturing it as `const auto&`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47025

Reviewed By: samestep

Differential Revision: D24605907

Pulled By: malfet

fbshipit-source-id: c276567d320758fa8b6f4bd64ff46d2ea5d40eff
2020-10-28 21:32:33 -07:00
9d23fd5c00 [pytorch] get rid of cpp_type_str from pybind codegen (#46977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46977

Clean up a few TODOs in the new python binding codegen.
Get rid of the _simple_type() hack and the uses of cpp_type_str.
Now python argument type strings and PythonArgParser unpacking methods
are directly generated from the original Type model.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24589209

Pulled By: ljk53

fbshipit-source-id: b2a6c3911d58eae49c031d319c8ea6f804e2cfde
2020-10-28 21:25:55 -07:00
79474a1928 [pytorch] simplify tensor options logic in pybinding codegen (#46976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46976

Technically, it's not semantics-preserving, e.g. emission of
'requires_grad' is no longer gated by 'has_tensor_return' - there is no
guarantee that is_like_or_new_function always has a tensor return.
But the output is identical, so there might be some invariant - we could
also add an assertion to fail loudly when it's broken.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24589211

Pulled By: ljk53

fbshipit-source-id: 47c7e43b080e4e67a526fde1a8a53aae99df4432
2020-10-28 21:22:59 -07:00
a86b3438eb add support for different memory sizes on size_based_partition (#46919)
Summary:
WIP: add support for different memory sizes in size_based_partition, so that it can support different logical devices with different memory sizes. Compared to the original size_based_partition, the new one also supports partition-to-logical-device mapping. Multiple partitions can be mapped onto one device if its memory size allows. A unit test, test_different_size_partition, is also added.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46919

Reviewed By: gcatron, VitalyFedyunin

Differential Revision: D24603511

Pulled By: scottxu0730

fbshipit-source-id: 1ba37338ae054ad846b425fbb7e631d3b6c500b6
2020-10-28 21:11:41 -07:00
c2a3951352 [quant][graphmode][fx] Remove inplace option for convert_fx (#46955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46955

Initially we were thinking of adding an `invalidate_quantized_float_parameters` option to free the memory
of the quantized modules' floating-point parameters, but it turns out we will do module swap just like in
eager mode for the modules that are quantized, so the old floating-point module will not be referenced after
quantization. Therefore this feature is only needed for functionals, and since most people use quantization
with modules, we may not need it.

We'll revisit if we find there is a need for this.

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D24579400

fbshipit-source-id: fbb0e567405dc0604a2089fc001573affdade986
2020-10-28 21:07:19 -07:00
ad260ae7fd Disable test_joing_running_workers for TSAN. (#46966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46966

These tests had false positives in TSAN for modifying thread local
variables:

```
WARNING: ThreadSanitizer: data race (pid=5364)
  Write of size 8 at 0x7b2c0004ff70 by thread T2:
    #0 free <null> (libtools_build_sanitizers_tsan-py.so+0xde6ad)
    #1 __GI__dl_deallocate_tls

  Previous write of size 1 at 0x7b2c0004ff71 by thread T3:
    #0 at::GradMode::set_enabled(bool) caffe2/aten/src/ATen/core/grad_mode.cpp:20 (libcaffe2_ATen-core.so+0x40e013)
    #1 torch::autograd::set_grad_enabled(_object*, _object*) caffe2/torch/csrc/autograd/init.cpp:143 (libcaffe2__C_impl_cuda.so+0x115ef0e)
    #2 _PyMethodDef_RawFastCallKeywords

  Thread T3 (tid=5385, finished) created by main thread at:
    #0 pthread_create <null> (libtools_build_sanitizers_tsan-py.so+0xc5a86)
    #1 PyThread_start_new_thread
```
ghstack-source-id: 115330433

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D24584411

fbshipit-source-id: e35f704dfcb7b161a13a4902beaf8b1e41ccd596
2020-10-28 19:28:04 -07:00
9fefb40628 Fix signed-to-unsigned conversion warning (#46834)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46834

`next` is set to `-1`, presumably to avoid an "undefined variable" warning. However, setting `next=-1` gives a signed-to-unsigned warning; in practice, the `-1` wraps around to `size_t::max`. So we set it to `size_t::max` from the get-go to avoid all warnings.

Test Plan: Standard pre-commit test rig.

Reviewed By: xw285cornell

Differential Revision: D24481068

fbshipit-source-id: 58b8a1b027a129fc4994c8593838a82b3991be22
2020-10-28 18:17:23 -07:00
c7183c9878 Fix object-based collectives API to use torch.cuda.current_device instead of (#46897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46897

These APIs implicitly assumed that gpu for rank == rank index, but
that is not necessarily true. For example, the first GPU could be used for a
different purpose and rank 0 could use GPU 1, rank 1 uses GPU 2, etc. Thus, we
mandate that the user specify the device to use via `torch.cuda.set_device()`
before making calls to this API. This expectation should be okay since we
clearly document it, and we expect the user to set this for
DistributedDataParallel as well.
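
A hedged per-rank sketch of the expected call pattern; assumes the process group is already initialized, and uses `all_gather_object` as a stand-in for the object-based collectives:
```
import torch
import torch.distributed as dist

# Each rank pins its own device explicitly; it need not equal the rank index.
my_device_index = 1  # illustrative: rank 0 might deliberately use GPU 1
torch.cuda.set_device(my_device_index)  # must be set before the call below

outputs = [None] * dist.get_world_size()
dist.all_gather_object(outputs, {"rank": dist.get_rank()})
```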

Also adds/tidies up some documentation.
ghstack-source-id: 115359633

Test Plan: Modified unittests

Reviewed By: divchenko

Differential Revision: D24556177

fbshipit-source-id: 7e826007241eba0fde3019180066ed56faf3c0ca
2020-10-28 18:12:50 -07:00
dc8176356e Various cleanups to ir_emitter and friends (#46686)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46686

I was trying to page this code back in after a while and some things
stuck out as unnecessarily confusing.

1. Improve documentation of closures and fork stuff to be more accurate
to how we use them today.
2. Change `prim::LocalVariableScope` to `prim::ListComprehension`. It is
only ever used for a list comprehensions, and in general the nodes
emitted by `ir_emitter` should correspond to concrete operations or
language features rather than semantic constraints.
3. Change the somewhat mysterious "inputs" and "attributes" argument
names throughout the codebase to be the more obvious "args" and "kwargs"
that they generally represent (I think "inputs" and "attributes" come
from the AST naming).

Test Plan: Imported from OSS

Reviewed By: navahgar, jamesr66a

Differential Revision: D24464197

Pulled By: suo

fbshipit-source-id: 1f4b1475b58b5690a0b204e705caceff969533b4
2020-10-28 16:28:05 -07:00
fc2bd991cc [quant] Fix flaky test test_histogram_observer_against_reference (#46957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46957

Possibly due to the use of a large tensor in hypothesis. Reducing the size to see if it helps.

Test Plan:
python test/test_quantization.py TestRecordHistogramObserver.test_histogram_observer_against_reference

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D24580137

fbshipit-source-id: f44ab059796fba97cccb12353c13803bf49214a1
2020-10-28 15:48:49 -07:00
cd26d027b3 [doc] Fix info on the shape of pivots in torch.lu + more info on what and how they encode permutations. (#46844)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46844

Reviewed By: VitalyFedyunin

Differential Revision: D24595538

Pulled By: ezyang

fbshipit-source-id: 1bb9c0310170124c3b6e33bd26ce38c22b36e926
2020-10-28 14:56:31 -07:00
058f43fc51 Fix torch.version.debug generation (#47006)
Summary:
argparse's `type=bool` returns True for any non-empty string passed as input.

Use `distutils.util.strtobool`, which returns 0 for input values like "0", "no", "n", "f", "false" and 1 for "1", "yes", "y", "t", "true".
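
For illustration, the difference on string inputs:
```
from distutils.util import strtobool

assert bool("0") is True   # any non-empty string is truthy
assert strtobool("0") == 0
assert strtobool("false") == 0
assert strtobool("y") == 1
```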

Fixes https://github.com/pytorch/pytorch/issues/46973 and https://github.com/pytorch/pytorch/issues/47003

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47006

Reviewed By: samestep

Differential Revision: D24598193

Pulled By: malfet

fbshipit-source-id: e8f6688d6883011f301b49a0f03c452c611f7001
2020-10-28 12:48:30 -07:00
14d87ec5a3 Add Vulkan op Add. (#44017)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44017

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23820826

Pulled By: AshkanAliabadi

fbshipit-source-id: 47db435894696f4eb4277370d4d317d2df9e3b98
2020-10-28 12:12:56 -07:00
ec600bc391 Add Vulkan tensor copy. (#46481)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46481

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D24379143

Pulled By: AshkanAliabadi

fbshipit-source-id: cf492c5c632bf193c8aff0169d17bbf962e019e1
2020-10-28 12:09:53 -07:00
bf08814b73 [FX] Kill functional transforms name (#47004)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47004

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D24597581

Pulled By: jamesr66a

fbshipit-source-id: 9213d58f4a53ea55e97e6ca0572fdcf5e271bdc3
2020-10-28 11:59:28 -07:00
23bce17baa Add inputsSize to Python IR, like outputsSize (#46779)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46779

Test Plan: Used it in some notebooks.

Reviewed By: suo

Differential Revision: D24574005

Pulled By: dreiss

fbshipit-source-id: 78ba7a2bdb859fef5633212b73c7a3eb2cfbc380
2020-10-28 11:35:39 -07:00
179d2b288c Fix interval midpoint calculation in vulkan (#46839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46839

Interval midpoint calculations can overflow (integers); the classic example is `(lo + hi) / 2`, whose overflow-safe form is `lo + (hi - lo) / 2`. This fixes such an instance.

Test Plan: Standard test rig.

Reviewed By: drdarshan

Differential Revision: D24392545

fbshipit-source-id: 84c81802165bb8084e2d54c9f3755f39143a5b00
2020-10-28 11:22:42 -07:00
98b3da8b13 Revert D24452660: [pytorch][PR] Add CUDA 11.1 CI
Test Plan: revert-hammer

Differential Revision:
D24452660 (1479ed91be)

Original commit changeset: 3480a2533214

fbshipit-source-id: 1e720c5d6fe1a377f6decd3ecc4f412c53fb293c
2020-10-28 10:53:53 -07:00
61ee0242c0 Fix backcompat in master following revert (#46984)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46984

Reviewed By: mrshenli

Differential Revision: D24592404

Pulled By: albanD

fbshipit-source-id: d317d934b650f1ac0f91e51ef5cbc14e886aa3fe
2020-10-28 10:32:14 -07:00
069232a574 [FX] Fix corner case in name sanitization (#46958)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46958

Test Plan: Imported from OSS

Reviewed By: dzhulgakov

Differential Revision: D24580474

Pulled By: jamesr66a

fbshipit-source-id: 2f8d252998c72e1e79d6a5f7766c2d51a271cc83
2020-10-28 10:22:33 -07:00
cbf90dafe1 Fix CPUCaching allocator guard bug (#46922)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46922

An earlier bug caused the wrong previous value to be captured and saved.

Test Plan: cpu_caching_allocator_test

Reviewed By: dreiss

Differential Revision: D24566514

fbshipit-source-id: 734a4c1f810bbec16fe007f31fffa360898955ac
2020-10-28 10:06:22 -07:00
c3fc17b48e Fix bit math (#46837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46837

Formerly, `static_cast<StreamId>(bits)` and `static_cast<DeviceIndex>(bits)` were and-ed against `ull` values, resulting in an integer promotion that later raised a warning when downcasting to `Stream` and `Device`.

Moving the `&` operation inside the cast results in two `uint64_t` values being operated on and then cast to the correct type, eliminating the warning.

Test Plan: Standard pre-commit test rig.

Reviewed By: malfet

Differential Revision: D24481292

fbshipit-source-id: a8bcbde631054c26ca8c98fbed275254dd359dd0
2020-10-28 09:55:52 -07:00
c9222b7471 Implement clip_ranges operator for PyTorch
Test Plan:
unit test for correctness
```
buck test caffe2/torch/fb/sparsenn:test -- test_clip_ranges
Parsing buck files: finished in 1.6 sec
Creating action graph: finished in 18.9 sec
Building: finished in 15.0 sec (100%) 9442/9442 jobs, 1 updated
  Total time: 35.6 sec
More details at https://www.internalfb.com/intern/buck/build/66fb17de-859e-4d01-89bf-5c5de2950693
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 80f5e0c2-7db2-48a4-b148-25dd34651682
Trace available for this run at /tmp/tpx-20201026-123217.050766/trace.log
Started reporting to test run: https://our.intern.facebook.com/intern/testinfra/testrun/4503599665041422
    ✓ ListingSuccess: caffe2/torch/fb/sparsenn:test - main (14.912)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (14.098)
Summary
  Pass: 1
  ListingSuccess: 1
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4503599665041422
```

new  benchmark perf test
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH6_M1_N2_MAX_LENGTH1_dtypetorch.int32_cpu
# Input: LENGTH: 6, M: 1, N: 2, MAX_LENGTH: 1, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 155.765

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH7_M1_N2_MAX_LENGTH2_dtypetorch.int32_cpu
# Input: LENGTH: 7, M: 1, N: 2, MAX_LENGTH: 2, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 156.248

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH8_M1_N2_MAX_LENGTH3_dtypetorch.int32_cpu
# Input: LENGTH: 8, M: 1, N: 2, MAX_LENGTH: 3, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 156.634

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH9_M1_N2_MAX_LENGTH4_dtypetorch.int32_cpu
# Input: LENGTH: 9, M: 1, N: 2, MAX_LENGTH: 4, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 155.408

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH10_M1_N2_MAX_LENGTH5_dtypetorch.int32_cpu
# Input: LENGTH: 10, M: 1, N: 2, MAX_LENGTH: 5, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 165.168
```

Compared with the old implementation, there is a gain of **around 300us**
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH6_M1_N2_MAX_LENGTH1_dtypetorch.int32_cpu
# Input: LENGTH: 6, M: 1, N: 2, MAX_LENGTH: 1, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 443.012

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH7_M1_N2_MAX_LENGTH2_dtypetorch.int32_cpu
# Input: LENGTH: 7, M: 1, N: 2, MAX_LENGTH: 2, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 446.480

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH8_M1_N2_MAX_LENGTH3_dtypetorch.int32_cpu
# Input: LENGTH: 8, M: 1, N: 2, MAX_LENGTH: 3, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 444.064

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH9_M1_N2_MAX_LENGTH4_dtypetorch.int32_cpu
# Input: LENGTH: 9, M: 1, N: 2, MAX_LENGTH: 4, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 445.511

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH10_M1_N2_MAX_LENGTH5_dtypetorch.int32_cpu
# Input: LENGTH: 10, M: 1, N: 2, MAX_LENGTH: 5, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 450.468
```

Reviewed By: MarcioPorto

Differential Revision: D24546110

fbshipit-source-id: e6c9b38e911f177f97961ede5bf375107f240363
2020-10-28 09:46:37 -07:00
c6858fd71a Set up benchmarks for ClipRanges operator for Caffe2 and PyTorch
Summary: As title, adding the benchmark tests for ClipRanges operators.

Test Plan:
benchmark test for Caffe2
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking Caffe2: clip_ranges
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1026 12:30:33.938997 2658759 init.h:137] Caffe2 GlobalInit should be run before any other API calls.
# Name: clip_ranges_LENGTH6_M1_N2_MAX_LENGTH1_dtypeint32
# Input: LENGTH: 6, M: 1, N: 2, MAX_LENGTH: 1, dtype: int32
Forward Execution Time (us) : 5.805

# Benchmarking Caffe2: clip_ranges
# Name: clip_ranges_LENGTH7_M1_N2_MAX_LENGTH2_dtypeint32
# Input: LENGTH: 7, M: 1, N: 2, MAX_LENGTH: 2, dtype: int32
Forward Execution Time (us) : 5.913

# Benchmarking Caffe2: clip_ranges
# Name: clip_ranges_LENGTH8_M1_N2_MAX_LENGTH3_dtypeint32
# Input: LENGTH: 8, M: 1, N: 2, MAX_LENGTH: 3, dtype: int32
Forward Execution Time (us) : 5.941

# Benchmarking Caffe2: clip_ranges
# Name: clip_ranges_LENGTH9_M1_N2_MAX_LENGTH4_dtypeint32
# Input: LENGTH: 9, M: 1, N: 2, MAX_LENGTH: 4, dtype: int32
Forward Execution Time (us) : 5.868

# Benchmarking Caffe2: clip_ranges
# Name: clip_ranges_LENGTH10_M1_N2_MAX_LENGTH5_dtypeint32
# Input: LENGTH: 10, M: 1, N: 2, MAX_LENGTH: 5, dtype: int32
Forward Execution Time (us) : 6.408
```

benchmark test for PyTorch
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH6_M1_N2_MAX_LENGTH1_dtypetorch.int32_cpu
# Input: LENGTH: 6, M: 1, N: 2, MAX_LENGTH: 1, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 443.012

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH7_M1_N2_MAX_LENGTH2_dtypetorch.int32_cpu
# Input: LENGTH: 7, M: 1, N: 2, MAX_LENGTH: 2, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 446.480

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH8_M1_N2_MAX_LENGTH3_dtypetorch.int32_cpu
# Input: LENGTH: 8, M: 1, N: 2, MAX_LENGTH: 3, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 444.064

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH9_M1_N2_MAX_LENGTH4_dtypetorch.int32_cpu
# Input: LENGTH: 9, M: 1, N: 2, MAX_LENGTH: 4, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 445.511

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH10_M1_N2_MAX_LENGTH5_dtypetorch.int32_cpu
# Input: LENGTH: 10, M: 1, N: 2, MAX_LENGTH: 5, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 450.468
```

Reviewed By: MarcioPorto

Differential Revision: D24500468

fbshipit-source-id: a582090a3982005af272cb10cdd257b2b2e787c4
2020-10-28 09:42:10 -07:00
b75b961934 Fix requires_grad arg for new_full, new_empty, new_zeros (#46486)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/36455

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46486

Reviewed By: gchanan

Differential Revision: D24497034

Pulled By: ezyang

fbshipit-source-id: 769a7f00f9a8f7cb77273a1193173a837ae7e32f
2020-10-28 09:34:53 -07:00
353e7f940f Ensure kernel launches are checked (#46474)
Summary:
Caffe2 and Torch currently do not have a consistent mechanism for determining whether a kernel has launched successfully. The result is difficult-to-detect or silent errors. This diff provides functionality to fix that. Subsequent diffs on the stack fix the identified issues.

Kernel launch errors may arise if launch parameters (number of blocks, number of threads, shared memory, or stream id) are invalid for the hardware, or for other reasons. Interestingly, unless these launch errors are specifically checked for, CUDA will silently fail and return garbage answers, which can affect downstream computation. Therefore, catching launch errors is important.

Launches are currently checked by placing
```
AT_CUDA_CHECK(cudaGetLastError());
```
somewhere below the kernel launch. This is bad for two reasons.
1. The check may be performed at a site distant to the kernel launch, making debugging difficult.
2. The separation of the launch from the check means that it is difficult for humans and static analyzers to determine whether the check has taken place.

This diff defines a macro:
```
#define TORCH_CUDA_KERNEL_LAUNCH_CHECK() AT_CUDA_CHECK(cudaGetLastError())
```
which clearly indicates the check.

This diff also introduces a new test which analyzes code to identify kernel launches and determines whether the line immediately following the launch contains `TORCH_CUDA_KERNEL_LAUNCH_CHECK();`.

A search of the Caffe2 codebase identifies 104 instances of `AT_CUDA_CHECK(cudaGetLastError());` while the foregoing test identifies 1,467 launches which are not paired with a check. Visual inspection indicates that few of these are false positives, highlighting the need for some sort of static analysis system.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46474

Test Plan:
The new test is run with:
```
buck test //caffe2/test:kernel_launch_checks -- --print-passing-details
```
And should be launched automatically with the other land tests. (TODO: Is it?)

The test is currently set up only to provide warnings but can later be adjusted to require checks.

Otherwise, I rely on the existing test frameworks to ensure that changes resulting from reorganizing existing launch checks don't cause regressions.

Reviewed By: ngimel

Differential Revision: D24309971

Pulled By: r-barnes

fbshipit-source-id: 0dc97984a408138ad06ff2bca86ad17ef2fdf0b6
2020-10-28 09:27:48 -07:00
50c9581de1 AT_ERROR if mmap allocation has failed (#46934)
Summary:
All other system-call failures in the `THMapAllocator` constructor are treated as errors, but this one, for some reason, was not.

Fixes https://github.com/pytorch/pytorch/issues/46651

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46934

Reviewed By: walterddr, seemethere

Differential Revision: D24572657

Pulled By: malfet

fbshipit-source-id: 0a2b6ce78d5484190536bc4949fc4697d6387ab8
2020-10-28 09:06:48 -07:00
c886c7f6dd fix: Fixed typing of bool in _ConvNd (#46828)
Summary:
Hello there 👋

I do believe there is a typo in the typing of the `bool` argument of the `_ConvNd` constructor.
The annotation on the attribute is correct, but the constructor argument, while annotated the same way, is not the value that will be assigned to `self.bias`.

This PR simply corrects that.

Any feedback is welcome!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46828

Reviewed By: izdeby

Differential Revision: D24550435

Pulled By: ezyang

fbshipit-source-id: ab10f1a5b29a912cb23fc321a51e78b04a8391e3
2020-10-28 08:08:53 -07:00
cd8ed93287 [quant][graphmode][fx][api] Remove inplace option from prepare_fx (#46954)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46954

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D24579401

fbshipit-source-id: adce623ce819fa220f7bb08d1ff3beaa69850621
2020-10-28 08:00:12 -07:00
46b252b83a Revert D24262885: [pytorch][PR] Added foreach_zero_ API
Test Plan: revert-hammer

Differential Revision:
D24262885 (8e37dcb1f3)

Original commit changeset: 144c283dd009

fbshipit-source-id: 451b202e23bc1fcb11b20d26c11d9a1329789d22
2020-10-28 06:48:59 -07:00
ddbdbce623 [jit] Prevent caching of graph attribute. (#46960)
Summary:
`graph` is automatically cached even when the underlying graph changes -- this PR hardcodes a fix to that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46960

Reviewed By: mrshenli

Differential Revision: D24582185

Pulled By: bwasti

fbshipit-source-id: 16aeeba251830886c92751dd5c9bda8699d62803
2020-10-27 23:56:52 -07:00
d92bf921db [quant][graphmode][fx] Remove inplace option for fuse_fx (#46953)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46953

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D24579402

fbshipit-source-id: 5e0b8abf682287ab3c7dd54c2fc2cf309295e147
2020-10-27 22:34:11 -07:00
e299393fd5 [Gradient Compression] Provide 2 default C++ comm hooks (#46701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46701

Provide 2 built-in implementations of C++ comm hook.

Original PR issue: C++ DDP Communication Hook https://github.com/pytorch/pytorch/issues/46348
ghstack-source-id: 115319061

Test Plan: waitforbuildbot

Reviewed By: pritamdamania87

Differential Revision: D24382504

fbshipit-source-id: 1c1ef56620f91ab37a1707c5589f1d0eb4455bb3
2020-10-27 21:43:15 -07:00
e077a2a238 [Gradient Compression] Add CppCommHook subclass for supporting the C++ API of communication hook. (#46566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46566

Only provides an interface. Some built-in implementations will be provided in a follow-up commit.

Original PR issue: C++ DDP Communication Hook https://github.com/pytorch/pytorch/issues/46348
ghstack-source-id: 115319038

Test Plan: waitforbuildbot

Reviewed By: pritamdamania87

Differential Revision: D24379460

fbshipit-source-id: 8382dc4185c7c01d0ac5b3498e1bead785bccec5
2020-10-27 21:43:12 -07:00
998b9b9e68 [quant][graphmode][fx] custom_module support static/dynamic/weight_only quant (#46786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46786

Previously we only supported static quant; this PR adds support for the other types of quantization.

Note that QAT is actually orthogonal to these quant types; this refers to the convert step where we
convert the observed module to a quantized module.

For QAT, the user will provide a CustomModule -> FakeQuantizedCustomModule mapping in prepare_custom_config_dict
and a FakeQuantizedCustomModule -> static/dynamic/weight_only quantized CustomModule mapping in convert_custom_config_dict.

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D24514701

fbshipit-source-id: 2918be422dd76093d67a6df560aaaf949b7f338c
2020-10-27 21:41:33 -07:00
5a8198eb3c [quant][graphmode][fx][fix] scalar as first input for add/mul (#46751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46751

Currently we assume the first input to add/mul is a Node (Tensor), but that might not be the case.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_quantized_add
python test/test_quantization.py TestQuantizeFxOps.test_quantized_mul
python test/test_quantization.py TestQuantizeFxOps.test_quantized_add_relu
python test/test_quantization.py TestQuantizeFxOps.test_quantized_mul_relu

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D24494456

fbshipit-source-id: ef5e23ba60eb22a57771791f4934306b25c27c01
2020-10-27 19:59:28 -07:00
810c68fb1d [OpBench] fix jit tracing with quantized op/tensor by enabling _compare_tensors_internal to compare quantized tensors (#46772)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46772

When running `buck run caffe2/benchmarks/operator_benchmark/pt:qactivation_test -- --use_jit`, I encountered the following error P146518683. The error was traced down to the fact that `torch.allclose` does not work with quantized tensors (the error was triggered by this particular multiplication https://fburl.com/diffusion/8vw647o6, since native mul cannot work with a float scalar and a quantized tensor).

Minimal example to reproduce:
```
(Pdb) input = torch.ones(5)
(Pdb) aa = torch.quantize_per_tensor(input, scale=1.0, zero_point=0, dtype=torch.quint8)
(Pdb) bb = torch.quantize_per_tensor(input, scale=1.0, zero_point=0, dtype=torch.quint8)
(Pdb) torch.allclose(aa, bb)
Comparison exception: 	promoteTypes with quantized numbers is not handled yet; figure out what the correct rules should be, offending types: QUInt8 Float
```

Here the proposed fix is to compare quantized tensors strictly within `_compare_tensors_internal`.

The other two possible fixes are:
1. convert quantized tensors to float tensors first before sending them to `torch.allclose` (sketched after this list)
2. change `torch.allclose` to handle quantized tensor.
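
For completeness, a minimal sketch of alternative fix 1 (dequantize, then compare on the float path):
```
import torch

x = torch.ones(5)
aa = torch.quantize_per_tensor(x, scale=1.0, zero_point=0, dtype=torch.quint8)
bb = torch.quantize_per_tensor(x, scale=1.0, zero_point=0, dtype=torch.quint8)
# Dequantize first, then compare along the regular float path.
assert torch.allclose(aa.dequantize(), bb.dequantize())
```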

Test Plan: buck run caffe2/benchmarks/operator_benchmark/pt:qactivation_test -- --use_jit

Reviewed By: kimishpatel

Differential Revision: D24506723

fbshipit-source-id: 6426ea2a88854b4fb89abef0edd2b49921283796
2020-10-27 18:53:13 -07:00
8e37dcb1f3 Added foreach_zero_ API (#46215)
Summary:
Added a foreach_zero_(TensorList) API.

Tested via unit tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46215

Reviewed By: zhangguanheng66

Differential Revision: D24262885

Pulled By: izdeby

fbshipit-source-id: 144c283dd00924083096d6d92eb9085cbd6097d3
2020-10-27 18:03:34 -07:00
67c1dc65a3 [FX] Fix handling of inf and nan literals (#46894)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46894

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D24555136

Pulled By: jamesr66a

fbshipit-source-id: 22765a4d9d373711e9e6d7b1d3898080ecbcf2f5
2020-10-27 17:55:35 -07:00
53839ac9d7 Fix internal assert for torch.heaviside with cuda tensor and cpu scalar tensor (#46831)
Summary:
Fixed https://github.com/pytorch/pytorch/issues/46681

```
>>> x = torch.randn(10, device='cuda')
>>> y = torch.tensor(1.)
>>> torch.heaviside(x, y)
tensor([0., 1., 0., 1., 1., 0., 1., 1., 1., 0.], device='cuda:0')
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46831

Reviewed By: navahgar

Differential Revision: D24567953

Pulled By: izdeby

fbshipit-source-id: e5fcf4355b27ce0bdf434963d01863d3b24d0bea
2020-10-27 16:47:33 -07:00
115bbf9945 [caffe2] Disable running full grad check in tests by default
Summary:
We've been seeing a lot of Hypothesis timeouts and from profiling a few of the failing tests one of the contributing factors is really slow grad checker. In short, it launches the whole op for each of the input elements so the overall complexity is O(numel^2) at least.

This applies a very unscientific hack to just run grad check on the first and last few elements. It's not ideal, but it's better than flaky tests. One can still explicitly opt in with the env var.

Reviewed By: malfet

Differential Revision: D23336220

fbshipit-source-id: f04d8d43c6aa1590c2f3e72fc7ccc6aa674e49d2
2020-10-27 16:10:03 -07:00
8066e89f64 quant: fix bug with copy.deepcopy of FX prepared quantization models (#46895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46895

Bug: models after the FX graph mode quant prepare step lost information,
such as the extra attributes defined in `Quantizer.save_state`,
if the user performed `copy.deepcopy` on them.  The information was lost
because `GraphModule` does not copy attributes which are not present on
`nn.Module` by default.

Fix: define a custom `__deepcopy__` method on observed models and
whitelist the attributes we care about.

This is needed because users sometimes run `copy.deepcopy` on their
models during non-quantization related preparations, and we should make
sure that quantization related state survives these calls.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_deepcopy
python test/test_quantization.py TestQuantizeFx.test_standalone_module
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D24556035

fbshipit-source-id: f7a6b28b6d2225fa6189016f967f175f6733b124
2020-10-27 16:05:35 -07:00
1479ed91be Add CUDA 11.1 CI (#46616)
Summary:
libtorch XImportant now runs on CUDA 11.1.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46616

Reviewed By: gchanan

Differential Revision: D24452660

Pulled By: malfet

fbshipit-source-id: 3480a2533214f2d986444ff912f619503a75940d
2020-10-27 15:58:13 -07:00
c20c840c1b Install sccache from source (#46672)
Summary:
Build `sccache` from https://github.com/pytorch/sccache

Also, update sccache wrappers not to call sccache from sccache

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46672

Reviewed By: janeyx99

Differential Revision: D24455767

Pulled By: malfet

fbshipit-source-id: b475a65e6ad03b9a192ab29a6d9a14280cd76a92
2020-10-27 15:23:23 -07:00
64d4b24a12 Adding link to gcov depending on GCC_VERSION (#46928)
Summary:
We already link g++ and gcc to the correct version, but we do not do that for gcov, which is needed for coverage.

This PR adds a link for gcov as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46928

Reviewed By: malfet

Differential Revision: D24569240

Pulled By: janeyx99

fbshipit-source-id: 4be012bff21ddae0c81339665b58324777b9304f
2020-10-27 15:09:35 -07:00
dc53eefd25 Conditional requirement for py3.6 only (#46932)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46930

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46932

Reviewed By: mrshenli

Differential Revision: D24574196

Pulled By: seemethere

fbshipit-source-id: 11daf8abe226670277f1b5682fd9890d23576271
2020-10-27 14:59:55 -07:00
79a1d2bd78 [iOS] Bump up the cocoapods version (#46935)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46935

Bump up the cocoapods version
ghstack-source-id: 115283786

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: xta0

Differential Revision: D24572715

fbshipit-source-id: 41ffcd43512dc7d4e94af887fb5dfeab703d7602
2020-10-27 14:51:33 -07:00
717e6d8081 add type annotations to comm.py (#46736)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46735

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46736

Reviewed By: albanD

Differential Revision: D24565554

Pulled By: mrshenli

fbshipit-source-id: 4e40e4232ebf256af228f9c742ea4d28c626c616
2020-10-27 14:27:06 -07:00
151f31ba27 remove event not ready assertion from TestCuda.test_copy_non_blocking (#46857)
Summary:
It is incorrect to assume that a newly recorded event will immediately query as False.
This test is flaky on ROCm due to this incorrect assumption.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46857

Reviewed By: albanD

Differential Revision: D24565581

Pulled By: mrshenli

fbshipit-source-id: 0e9ba02cf52554957b29dbeaa5093696dc914b67
2020-10-27 14:21:40 -07:00
8c39f198b4 Fix typo in setup.py (#46921)
Summary:
Also, be a bit more future-proof in the supported-versions list.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46921

Reviewed By: seemethere

Differential Revision: D24568733

Pulled By: malfet

fbshipit-source-id: ae34f8da1ed39b80dc34db0b06e4ef142104a3ff
2020-10-27 13:14:41 -07:00
21e60643c0 [numpy] torch.log{2,10} : promote integer inputs to float (#46810)
Summary:
References https://github.com/pytorch/pytorch/issues/42515

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46810

Reviewed By: izdeby

Differential Revision: D24536187

Pulled By: mruberry

fbshipit-source-id: b7dd7678d4e996f3dea0245c65055654e02be459
2020-10-27 13:07:44 -07:00
bbe5bfaa4f Add GradMode::enabled check to max_pool1d (#46767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46767

## Benchmark
```
------------------------------------------------------------------------------------------ benchmark: 2 tests ------------------------------------------------------------------------------------------
Name (time in us)             Min                   Max                  Mean              StdDev                Median                IQR            Outliers         OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_grad_disabled       390.0155 (1.0)        533.4131 (1.0)        392.6161 (1.0)        8.5603 (1.0)        390.7457 (1.0)       0.3912 (1.0)        98;319  2,547.0171 (1.0)        2416           1
test_grad_enabled      3,116.7269 (7.99)     4,073.2883 (7.64)     3,178.0827 (8.09)     122.7487 (14.34)    3,142.2675 (8.04)     33.0228 (84.42)       10;22    314.6551 (0.12)        225           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
```

*snippet (using pytest benchmark module)*
```
import torch

torch.set_num_threads(1)
x = torch.randn(1000, 10, 36, requires_grad=True)

def test_grad_enabled(benchmark):
    benchmark(torch.max_pool1d, x, 2)

def test_grad_disabled(benchmark):
    torch.set_grad_enabled(False)
    benchmark(torch.max_pool1d, x, 2)
```

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D24565126

Pulled By: heitorschueroff

fbshipit-source-id: 91a93be9921f597db21e9dc277f6e36eae85b37a
2020-10-27 10:23:42 -07:00
daf2a6a29d Increase no-output-timeout for OSX builds (#46891)
Summary:
Because conda-build native library relocation scripts can take a while.
See https://app.circleci.com/pipelines/github/pytorch/pytorch/227245/workflows/e287613d-5e48-4bca-b3d8-b75df2be9f65/jobs/8235584 :
```
Oct 15 08:53:31
Oct 15 09:49:27    INFO:
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46891

Reviewed By: seemethere

Differential Revision: D24553645

Pulled By: malfet

fbshipit-source-id: 62b2251f174aec7ff573a8c4f8cb7a920fa3eaca
2020-10-27 08:04:36 -07:00
d5cd781cd3 Update dper3 to use torch.nan_to_num and nan_to_num_ (#46873)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46873

OSS:
Add op benchmark for torch.nan_to_num and torch.nan_to_num_
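
For reference, what the benchmarked op does (default replacement values shown in the comments):
```
import torch

x = torch.tensor([float("nan"), float("inf"), -float("inf"), 1.0])
y = torch.nan_to_num(x)  # nan -> 0.0, +inf/-inf -> dtype max/min by default
x.nan_to_num_()          # in-place variant
assert torch.equal(x, y)
```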

Test Plan:
OSS:
`buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:nan_to_num_test`

Reviewed By: qizzzh, houseroad

Differential Revision: D24521835

fbshipit-source-id: 1fd50a99e5329ffec2d470525ce6976d39424958
2020-10-27 06:41:48 -07:00
8640905088 add sparse_nn_partition (#46390)
Summary:
WIP: This PR adds sparse_nn_partition to the Partitioner class. It includes logical device assignment for all DAG nodes. The basic idea is to do size_based_partition separately for embedding nodes and non-embedding nodes. A unit test, test_different_size_partition, is also added in test_fx_experimental.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46390

Reviewed By: gcatron

Differential Revision: D24555415

Pulled By: scottxu0730

fbshipit-source-id: 8772af946d5226883759a02a1c827cfdfce66097
2020-10-27 00:11:58 -07:00
4b6e307191 Replace flatten tensors with flatten loops. (#46737)
Summary:
This is the second attempt at replacing flatten tensors with flatten loops in `TensorExprKernel::generateStmt`. The first attempt (https://github.com/pytorch/pytorch/pull/46539) resulted in a build failure due to an exception that gets thrown during inline.

The reason for the build failure was that there was an inline step which was supposed to happen on the unflattened tensors. This was necessary earlier because for every flattened tensor there was an unflattened tensor that had to be inlined. That is no longer necessary since we no longer have two tensors (flattened and unflattened). Removed this inline step.

Checked python and cpp tests on CPU as well as CUDA.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46737

Reviewed By: anjali411, izdeby

Differential Revision: D24534529

Pulled By: navahgar

fbshipit-source-id: 8b131a6be076fe94ed369550d9f54d3879fdfefd
2020-10-27 00:01:20 -07:00
6b50ccc41c [quant][graphmode][fx] Support sigmoid/hardsigmoid/tanh in qat (#46738) (#46871)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46871

Test Plan:
Imported from OSS

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24547180

fbshipit-source-id: d2eb9aa74c6e5436204376b1a2ebcc6188d3562f
2020-10-26 23:52:07 -07:00
60eded6c0f Add single element tuple output from to_backend/to_glow (#5029)
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/5029

Support single element tuples in to_backend

Test Plan: new unit test for to_glow

Reviewed By: andrewmillspaugh

Differential Revision: D24539869

fbshipit-source-id: fb385a7448167b2b948e70f6af081bcf78f338dc
2020-10-26 22:29:04 -07:00
bcbb6baccf Add a warning message that torch.sign would not support complex numbers (#43280)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43280

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24538769

Pulled By: anjali411

fbshipit-source-id: ab2d5283501e4c1d7d401d508e32f685add7ebb1
2020-10-26 21:13:12 -07:00
37da6d26ff add fburl link to error message (#46795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46795

Add an fburl link to the error message for missing ops so users can debug it themselves.

Test Plan: fburl.com/missing_ops

Reviewed By: iseeyuan

Differential Revision: D24519992

fbshipit-source-id: d2d16db7e9d9c84ce2c4600532eb253c30b31971
2020-10-26 21:05:49 -07:00
9858b012ec Fix TripletMarginWithDistanceLoss example code (#46853)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45210

Removes `requires_grad=True` from all the `randint`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46853

Reviewed By: izdeby

Differential Revision: D24549483

Pulled By: soulitzer

fbshipit-source-id: c03576571ed0b2dbb281870f29a28eb6f6209c65
2020-10-26 21:02:54 -07:00
4a35280ec2 [c10] fix weak_intrusive_ptr lock() (#46007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46007

When the owner has released the object, target becomes null and it is illegal to
access refcount_ again. This PR fixes this and returns null in that case.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24374846

Pulled By: wanchaol

fbshipit-source-id: 741074f59c0904a4d60b7bde956cad2d0925be4e
2020-10-26 20:54:12 -07:00
b3e64c86e0 Remove loop_test mode (#46618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46618

Using D19631971 (b4b1b100bd) and https://github.com/pytorch/pytorch/pull/32935/files as a reference.

Test Plan:
```
$ buck run glow/fb/test:sparsenn_test -- --gtest_filter='SparseNNTest.vanillaC2' --onnxifi_debug_mode --nocaffe2_predictor_use_memonger
```

Generated dot file https://www.internalfb.com/intern/graphviz/?paste=P146216905

Reviewed By: yinghai

Differential Revision: D24427800

fbshipit-source-id: 7d1d8768352a52af104e0a75ce982c1eb861aa73
2020-10-26 20:38:41 -07:00
af27da93de Add Vulkan Tensor factory. (#44016)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44016

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23820823

Pulled By: AshkanAliabadi

fbshipit-source-id: d007650f255fd79b4a2f4bba0bf8ea00f9a2e6cf
2020-10-26 18:38:13 -07:00
c9bf03a6c4 Add Vulkan Tensor. (#44015)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44015

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23820827

Pulled By: AshkanAliabadi

fbshipit-source-id: 7691da56a5d0073d078b901d8951437757ab1085
2020-10-26 18:35:16 -07:00
2397c8d1f7 [pytorch] Improve/fix heuristics for using mkldnn vs native conv (#46675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46675

We've found a few heuristics for using/not using mkldnn that seem to generally
improve performance on 2d and 3d conv.

- 1x1 convolutions are basically batch matmuls, and mkldnn's implementation
  appears to usually be slower than using the native conv (which lowers to
  aten::mm, which in turn calls mkl gemm); see the sketch after this list.

- 3d conv was often not using mkldnn even when it's beneficial, because the
  heuristic was checking the kernel depth rather than height/width.  mkldnn
  seems to be faster for (1, 7, 7) and (3, 7, 7) kernel sizes, which are
  allowed by the new heuristic.
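
A quick numerical check of the first bullet (a 1x1 conv is a batched matmul), using one of the shapes from the table below:
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 64, 56, 56)
w = torch.randn(256, 64, 1, 1)

out_conv = F.conv2d(x, w)
# Same computation as a matmul over flattened spatial positions.
out_mm = (w.view(256, 64) @ x.view(1, 64, -1)).view(1, 256, 56, 56)
assert torch.allclose(out_conv, out_mm, atol=1e-4)
```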

Test Plan:
Bento notebooks showing before/after:
before: https://www.internalfb.com/intern/anp/view/?id=38089
after: https://www.internalfb.com/intern/anp/view/?id=380893

Also, I've run a conv fuzzer, and it generally supports these heuristics.  I'm
not sure how to best share the data since there's a lot of it (I tried about
50k parameter combinations).

For the 1x1 case, about 70% were faster with "native".  I played with
constructing a decision tree (using scikit-learn) and found that switching back
to MKL for batch size > 16 might be slightly better still, but I'm not sure
it's worth complicating the heuristic.

Results for some popular shapes in tabular format:
```
[------------------------- conv2d_1x1 ------------------------]
                                           |   base   |   diff
1 threads: ----------------------------------------------------
      [1, 128, 56, 56] [256, 128, 1, 1]    |  3665.3  |  2838.4
      [1, 512, 14, 14] [1024, 512, 1, 1]   |  3174.7  |  3164.0
      [1, 64, 56, 56] [256, 64, 1, 1]      |  2249.1  |  1468.8
      [1, 1024, 14, 14] [512, 1024, 1, 1]  |  3158.2  |  3147.7
      [1, 1024, 7, 7] [2048, 1024, 1, 1]   |  8191.8  |  3973.9
      [1, 2048, 7, 7] [1024, 2048, 1, 1]   |  7901.2  |  3861.6
      [1, 256, 28, 28] [512, 256, 1, 1]    |  3103.9  |  2775.9
2 threads: ----------------------------------------------------
      [1, 128, 56, 56] [256, 128, 1, 1]    |  1973.7  |  1475.8
      [1, 512, 14, 14] [1024, 512, 1, 1]   |  2265.0  |  1603.0
      [1, 64, 56, 56] [256, 64, 1, 1]      |  1445.4  |   789.8
      [1, 1024, 14, 14] [512, 1024, 1, 1]  |  2298.8  |  1620.0
      [1, 1024, 7, 7] [2048, 1024, 1, 1]   |  6350.7  |  1995.0
      [1, 2048, 7, 7] [1024, 2048, 1, 1]   |  6471.2  |  1903.7
      [1, 256, 28, 28] [512, 256, 1, 1]    |  1932.3  |  1524.2
4 threads: ----------------------------------------------------
      [1, 128, 56, 56] [256, 128, 1, 1]    |  1198.8  |   785.6
      [1, 512, 14, 14] [1024, 512, 1, 1]   |  1305.0  |   901.6
      [1, 64, 56, 56] [256, 64, 1, 1]      |   791.0  |   472.9
      [1, 1024, 14, 14] [512, 1024, 1, 1]  |  1311.2  |   908.5
      [1, 1024, 7, 7] [2048, 1024, 1, 1]   |  3958.6  |   997.7
      [1, 2048, 7, 7] [1024, 2048, 1, 1]   |  4099.6  |  1023.1
      [1, 256, 28, 28] [512, 256, 1, 1]    |  1120.3  |   740.8

Times are in microseconds (us).

[--------------------- conv2d_7x7 ---------------------]
                                      |   base  |   diff
1 threads: ---------------------------------------------
      [25, 3, 48, 320] [64, 3, 7, 7]  |  209.3  |  229.3
      [1, 3, 384, 288] [64, 3, 7, 7]  |   68.9  |   72.3
2 threads: ---------------------------------------------
      [25, 3, 48, 320] [64, 3, 7, 7]  |  116.0  |  117.6
      [1, 3, 384, 288] [64, 3, 7, 7]  |   40.4  |   38.7
4 threads: ---------------------------------------------
      [25, 3, 48, 320] [64, 3, 7, 7]  |   64.2  |   66.5
      [1, 3, 384, 288] [64, 3, 7, 7]  |   21.4  |   21.9

Times are in milliseconds (ms).

[---------------------------- conv3d ---------------------------]
                                               |   base  |   diff
1 threads: ------------------------------------------------------
      [1, 3, 16, 224, 224] [32, 3, 1, 7, 7]    |  602.8  |  296.2
      [1, 3, 4, 112, 112] [64, 3, 3, 7, 7]     |   52.5  |   26.5
      [1, 256, 8, 14, 14] [256, 256, 3, 3, 3]  |   50.0  |   50.3
2 threads: ------------------------------------------------------
      [1, 3, 16, 224, 224] [32, 3, 1, 7, 7]    |  351.0  |  168.1
      [1, 3, 4, 112, 112] [64, 3, 3, 7, 7]     |   38.5  |   14.9
      [1, 256, 8, 14, 14] [256, 256, 3, 3, 3]  |   24.8  |   26.2
4 threads: ------------------------------------------------------
      [1, 3, 16, 224, 224] [32, 3, 1, 7, 7]    |  212.6  |   96.0
      [1, 3, 4, 112, 112] [64, 3, 3, 7, 7]     |   21.5  |    7.6
      [1, 256, 8, 14, 14] [256, 256, 3, 3, 3]  |   12.7  |   13.3

Times are in milliseconds (ms).
```

Reviewed By: jansel

Differential Revision: D24452071

fbshipit-source-id: 12687971be531831530dc29bf2fc079a917d0c8d
2020-10-26 18:27:12 -07:00
a602811da7 [ROCm] fix bug in miopen findAlgorithm. (#46852)
Summary:
findAlgorithm should return if and only if a suitable algorithm is found.
The default algorithm is not guaranteed to have been cached.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46852

Reviewed By: izdeby

Differential Revision: D24546748

Pulled By: bhosmer

fbshipit-source-id: 171137b377193e0825769b61d42a05016f02c34c
2020-10-26 18:20:04 -07:00
a4adc1b6d7 Fix unused variable warning (#46838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46838

`SCALAR_TYPE` may be unused in some contexts where the macro is used. We use the standard `(void)var` trick to suppress the compiler warning in these instances.

Test Plan: Standard pre-commit tests.

Reviewed By: jerryzh168

Differential Revision: D24481142

fbshipit-source-id: 4fcde669cc279b8863443d49c51edaee69f4d7bd
2020-10-26 18:14:27 -07:00
a6cd294c9b [Gradient Compression] Refactor CommHookInterface and PythonCommHook. (#46512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46512

1. Merge 1-line PythonCommHook constructor into the header for simplicity.
2. Move the implementation of PythonCommHook destructor from the header file to cpp file.
3. Rename processFuture method as parseHookResult for readability.
4. Simplify some comments.

Original PR issue: C++ DDP Communication Hook https://github.com/pytorch/pytorch/issues/46348
ghstack-source-id: 115161086

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_hook_nccl

buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_sparse_gradients

buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_with_then_hook_nccl

buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_future_passing_gpu_gloo

Reviewed By: jiayisuse

Differential Revision: D24374282

fbshipit-source-id: c8dbdd764bca5b3fa247708f1218cb5ff3e321bb
2020-10-26 18:07:58 -07:00
adafd3d4b2 Support RRef.backward() for local RRefs. (#46568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46568

This PR adds support for an RRef.backward() API. This would be useful
in applications like pipeline parallelism as described here:
https://github.com/pytorch/pytorch/issues/44827

This PR only adds support for local RRefs; remote RRef support will be added in
a follow-up PR.
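
For illustration, a minimal sketch of the new API on a local RRef (assuming an RPC agent has already been initialized on this worker via `rpc.init_rpc`):

```python
import torch
import torch.distributed.rpc as rpc

# Assumes rpc.init_rpc(...) has already been called on this worker.
t = torch.rand(3, requires_grad=True)
rref = rpc.RRef(t.sum())  # a local RRef holding a scalar loss
rref.backward()           # runs autograd through the RRef's value
print(t.grad)
```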
ghstack-source-id: 115100729

Test Plan:
1) unit tests.
2) waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D24406311

fbshipit-source-id: fb0b4e185d9721bf57f4dea9847e0aaa66b3e513
2020-10-26 17:31:17 -07:00
7731370e71 CUDA BFloat16 gelu, hardswish, hardsigmoid (#44997)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44997

Reviewed By: izdeby

Differential Revision: D24547748

Pulled By: ngimel

fbshipit-source-id: 34639dfe6ca41c3f59fd2af861e5e3b1bb86757a
2020-10-26 16:01:22 -07:00
99cf3b1ce4 CUDA BFloat16 signal windows (#45155)
Summary:
Looks like this op was never tested for support of different dtypes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45155

Reviewed By: zou3519

Differential Revision: D24438839

Pulled By: ngimel

fbshipit-source-id: 103ff609e11811a0705d04520c2b97c456b623ef
2020-10-26 15:53:30 -07:00
13a5be571b Enable complex backward for torch.take() and tensor.fill_() (#46860)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46860

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D24544601

Pulled By: anjali411

fbshipit-source-id: 4e29d48da30da3630cb558ccee464d89780b1ab7
2020-10-26 15:46:08 -07:00
02dc52f25b vmap fallback: gracefully error out when vmap over dim of size 0 (#46846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46846

Previously, this would crash with a floating point error. If the user vmaps
over a dimension of size 0, ideally we would return a tensor with a
batch dim of size 0 and the correct output shape. However, this isn't
possible without a shape-checking API. This PR changes the vmap fallback
to error out gracefully if it sees vmap occurring over a dimension of
size 0.

If we want to support vmapping over dimension of size 0 for a specific
op, then the guidance is to implement a batching rule for that op that
handles 0-sized dims.
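
A minimal repro sketch of the failure mode described above (using the prototype `torch.vmap` entry point; the exact error text is an assumption):

```python
import torch

x = torch.randn(0, 3)  # the vmapped dimension has size 0
# Previously this could crash with a floating point error; with this
# change the fallback raises a RuntimeError instead.
torch.vmap(torch.sum)(x)
```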

Test Plan: - new test

Reviewed By: ezyang

Differential Revision: D24539315

Pulled By: zou3519

fbshipit-source-id: a19c049b46512d77c084cfee145720de8971f658
2020-10-26 15:32:22 -07:00
5e2f17d77a Add NCCL_ASYNC_ERROR_HANDLING to docs (#46856)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46856

Add reference to NCCL_ASYNC_ERROR_HANDLING in the pytorch docs,
similar to how NCCL_BLOCKING_WAIT is currently described.
ghstack-source-id: 115186877

Test Plan: CI, verifying docs change

Reviewed By: jiayisuse

Differential Revision: D24541822

fbshipit-source-id: a0b3e843bc6392d2787a4bb270118f2dfda5f4ec
2020-10-26 14:41:32 -07:00
57bf0b596a [docs] Changing the wording on quantization versioning and support (#46858)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46858

Test Plan: Imported from OSS

Reviewed By: dskhudia

Differential Revision: D24542598

Pulled By: z-a-f

fbshipit-source-id: 0eb7a2dcc8f8ad52954f2555cf41d5f7524cbc2c
2020-10-26 14:30:50 -07:00
58ed60c259 Added context manager enabling all futures returned by rpc_async and custom build rpc functions to be automatically waited on (#41807)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41807

Test Plan: Make sure ci tests pass, including newly written test

Reviewed By: mrshenli

Differential Revision: D22640839

Pulled By: osandoval-fb

fbshipit-source-id: 3ff98d8e8c6e6d08575e307f05b5e159442d7216
2020-10-26 12:53:35 -07:00
25db74bf5e Revert D24486972: [quant][graphmode][fx] Support sigmoid/hardsigmoid/tanh in qat
Test Plan: revert-hammer

Differential Revision:
D24486972 (e927b62e73)

Original commit changeset: c9f139bfdd54

fbshipit-source-id: 2a75f5ec93d55a62b40d1cdd49adcf65436058f7
2020-10-26 12:47:05 -07:00
0c74b43a3f Update TensorPipe submodule (#46842)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46842

Reviewed By: mrshenli

Differential Revision: D24539899

Pulled By: lw

fbshipit-source-id: 8731165c6ecd3c97433b4dfa469989f5b9019e36
2020-10-26 12:40:09 -07:00
56a3831bc6 [NVFuser]Benchmark minor update (#46778)
Summary:
This is a tiny PR for two minor fixes:

1. Added `torch._C._jit_set_texpr_fuser_enabled(False)` to enable shape inference on nv fuser runs.
2. Renamed dynamic benchmark modules to avoid multiple pattern matches, i.e. `simple_element` vs. `dynamic_simple_element`. I guess it'd be much easier if the pattern matching were based on `startswith`; I'd be happy to update that if agreed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46778

Reviewed By: zhangguanheng66

Differential Revision: D24516911

Pulled By: bertmaher

fbshipit-source-id: 839f9a3e058f9d7aca17b2e6eb8b558e0e48e8f4
2020-10-26 12:22:36 -07:00
e927b62e73 [quant][graphmode][fx] Support sigmoid/hardsigmoid/tanh in qat (#46738)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46738

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D24486972

fbshipit-source-id: c9f139bfdd54973da1a93a45e32937595dbe67fc
2020-10-26 12:04:42 -07:00
b5662ba0f0 [uhm][0/n] add cuda Mod Op (#46732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46732

as titled

Test Plan:
unittest

buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:mod_op_test

Reviewed By: xianjiec

Differential Revision: D24368100

fbshipit-source-id: 1232d22a67ac268986043911d548fa9d657470ec
2020-10-26 11:07:51 -07:00
5a2b537b54 Add error messages and workaround for RET failure of containers with a torch class type (#46543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46543

Add error messages and workaround for RET failure of containers with a torch class type.
 - Error case condition
  1) ins.op == RET
  2) input_type == TypeKind::ListType or TypeKind::DictType
  3) Any(input_type's element type) == TypeKind::ClassType
ghstack-source-id: 114618426

Test Plan:
buck test mode/dev caffe2/test:mobile -- 'test'

    Summary
       Pass: 13
       ListingSuccess: 1
    Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/7318349417617713

Reviewed By: iseeyuan

Differential Revision: D24388483

fbshipit-source-id: 7d30f6684a999054d0163e691422797cb818bb6a
2020-10-26 10:46:07 -07:00
3e606da0af Upgrading lcov install to install v1.15 to be compatible with GCC9 (#46847)
Summary:
According to [this issue](https://github.com/linux-test-project/lcov/issues/58), LCOV 1.14 and below are not compatible with GCC9 when gathering coverage.

Instead of installing `lcov` with `apt-get`, which installs version 1.13, this PR would install v1.15 from source onto the ubuntu Docker images.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46847

Reviewed By: seemethere

Differential Revision: D24540444

Pulled By: janeyx99

fbshipit-source-id: 0ac2a37241d94cdd8fea2fded7984c495a64cedc
2020-10-26 10:11:46 -07:00
83d358da7c Fix LAPACK functionality detection from static OpenBLAS (#46710)
Summary:
BLAS `sgemm_` only depends on pthreads, but LAPACK `cheev_` also depends on libm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46710

Reviewed By: walterddr

Differential Revision: D24476082

Pulled By: malfet

fbshipit-source-id: e0b91116f18bbcdabb1f99c2ec9d98283df4393f
2020-10-26 08:34:28 -07:00
b61671ccd2 Enable dtype arg for torch.linalg.norm with order 'fro' and 'nuc' (#46637)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46255
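
A minimal sketch of the newly accepted combination:

```python
import torch

A = torch.randn(4, 4)
# dtype can now be passed together with the 'fro' and 'nuc' matrix orders
print(torch.linalg.norm(A, ord='fro', dtype=torch.float64))
print(torch.linalg.norm(A, ord='nuc', dtype=torch.float64))
```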

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46637

Reviewed By: gchanan

Differential Revision: D24459097

Pulled By: mruberry

fbshipit-source-id: 7f207a23de902c27f8313ee80f452687a97e8f6f
2020-10-26 02:59:00 -07:00
d94bd998ec Update backward formulas (Re #44444) (#46275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46275

Re #44444

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D24285785

Pulled By: anjali411

fbshipit-source-id: c60ecd4fe4f144132085f2c91d3b950e92b2a491
2020-10-25 19:40:59 -07:00
edbc84aa4a Fix hash type (#46769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46769

The value to be hashed is an `int64_t`, but the hash is an `int`. This results in a down-conversion that throws out bits which would otherwise be hashed.

Test Plan: Standard pre-commit test rig

Reviewed By: malfet

Differential Revision: D24480962

fbshipit-source-id: 497b1d8bc3f6d2119a6ba16e6ae92911bd34b916
2020-10-24 16:14:41 -07:00
fa8cd06a5c Perform explicit cast (#46771)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46771

`std::ceil` returns a `float` which is cast to `size_t` by the `max` operation.

We convert to `int64_t` to suppress the warning while matching the type of `newDims[0]`.

Since the types match, we don't need an explicit template type for `max`. This allows `max` to take `int64_t` as its values, matching the type of `newCapacity`.

Test Plan: Standard pre-commit test rig.

Reviewed By: malfet

Differential Revision: D24481684

fbshipit-source-id: aed7cabc1e9d395b2662cb633f3ace19c279ab4c
2020-10-24 16:10:02 -07:00
9cbdd84e15 Fix compiler warning
Summary: `sizeof` returns an unsigned, so comparison against `-1` is a warning. This fixes that.

Test Plan: Standard pre-commit test rig.

Reviewed By: bhosmer

Differential Revision: D24506390

fbshipit-source-id: cdb2887d319c6730a90b9f8d74a248527dd6c2ab
2020-10-24 14:23:43 -07:00
f9b9430152 Support doc_string for TorchBind custom classes (#46576)
Summary:
With this PR, users can optionally provide a "doc_string" to describe a class or its method. doc_strings for TorchBind classes and methods are stored as `doc_string` properties on `Function` and `ScriptClass`. These `doc_string` properties are then exposed in the Python layer via PyBind for doc generation.

Fixes https://github.com/pytorch/pytorch/issues/46047

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46576

Reviewed By: wanchaol

Differential Revision: D24440636

Pulled By: gmagogsfm

fbshipit-source-id: bfa9b270a6c2d8bc769a88fad6be939cc6310412
2020-10-24 12:51:35 -07:00
7d4c1a5ab0 Fix type warning (#46770)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46770

Test Plan: Standard pre-commit test rig.

Reviewed By: malfet

Differential Revision: D24480898

fbshipit-source-id: a5031f1e20f4b1ea5954e7cabd54502300d5a916
2020-10-24 12:37:24 -07:00
37dbc6117f [quant][eagermode] Add additional_fuser_method_mapping to config (#46355)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46355

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24319562

fbshipit-source-id: be9800723c0b3e36f26e73c25c0c6ae1d4344f45
2020-10-24 02:18:04 -07:00
13b7855f33 Support hashing of various data types by implementing generic hashing for IValues (#46441)
Summary:
It used to be that TorchScript only supported hashing of `int`, `float` and `str`. This PR adds hashing for many other types including `Tuple`, `bool`, `device` by implementing generic hashing on IValue.

* Tensor hashing follows eager behavior, which is identity-based (hash according to pointer address rather than tensor content).
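
A minimal sketch of the newly supported usage in TorchScript:

```python
from typing import Tuple

import torch

@torch.jit.script
def tuple_key(item: Tuple[int, bool]) -> int:
    # hash() over tuples and bools now compiles in TorchScript
    return hash(item)

print(tuple_key((3, True)))
```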

Fixes https://github.com/pytorch/pytorch/issues/44038

This is based on suo's https://github.com/pytorch/pytorch/issues/44047, with some cleanup, more tests, and a fix for the BC check issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46441

Reviewed By: robieta

Differential Revision: D24440713

Pulled By: gmagogsfm

fbshipit-source-id: 851f413f99b6f65084b551383ad21e558e7cabeb
2020-10-23 21:26:01 -07:00
789e935304 Annotate torch.nn.cpp (#46490)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46489

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46490

Reviewed By: zhangguanheng66

Differential Revision: D24509519

Pulled By: ezyang

fbshipit-source-id: edffd32ab2ac17ae4bbd44826b71f5cb9f1da1c5
2020-10-23 17:40:32 -07:00
c4892c8efe [pytorch][tensorexpr] Promote integer arguments to sin/cos/tan to float (#46776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46776

Following numpy and (now) eager mode

Fixes #46458

Test Plan: test_jit_fuser_te

Reviewed By: navahgar

Differential Revision: D24509884

fbshipit-source-id: c063030fc609ba4aefcd9abd25b50f082fef1548
2020-10-23 17:32:54 -07:00
343260a1cc [quant][graphmode][fx] Add support for additional_{fusion/quant}_pattern (#46346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46346

Allow user to provide additional fusion/quant patterns for fx graph mode

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24317437

fbshipit-source-id: 719927cce50c74dffa4f848bd5c98995c944a26a
2020-10-23 15:03:42 -07:00
74d81080a0 Use new_zeros in evenly_distribute_backward (#46674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46674

Summary
-------

This adds batched gradient support (i.e., vmap through the gradient
formulas) for Tensor.max(), Tensor.min(), Tensor.median()
that have evenly_distribute_backward as their backward formula.

Previously, the plan was to register incompatible gradient formulas as
backward operators (see #44052). However, it turns out that we can just use
`new_zeros` to get around some incompatible gradient formulas (see next
section for discussion).

Context: the vmap+inplace problem
---------------------------------

A lot of backwards functions are incompatible with BatchedTensor due to
using in-place operations. Sometimes we can allow the in-place
operations, but other times we can't. For example, consider select_backward:

```
Tensor select_backward(const Tensor& grad, IntArrayRef input_sizes,
                       int64_t dim, int64_t index) {
  auto grad_input = at::zeros(input_sizes, grad.options());
  grad_input.select(dim, index).copy_(grad);
  return grad_input;
}
```
and consider the following code:

```
x = torch.randn(5, requires_grad=True)
def select_grad(v):
  torch.autograd.grad(x[0], x, v)

vs = torch.randn(B0)
batched_grads = vmap(select_grad)(vs)
```

For the batched gradient use case, grad is a BatchedTensor.
The physical version of grad has size (B0,).
However, select_backward creates a grad_input of shape (5), and
tries to copy grad to a slice of it.

Up until now, the proposal to handle this has been to register these
backward formulas as operators so that vmap doesn’t actually see the
`copy_` calls (see #44052). However, it turns out we can actually just
use `new_zeros` to construct a new Tensor that has the same
"batched-ness" as grad:
```
auto grad_input = grad.new_zeros(input_sizes);
grad_input.select(dim, index).copy_(grad);
```
We should use this for simple backward functions. For more complicated
backward functions where this solution doesn't work, we should register
those as operators.

Alternatives
------------
Option 2: Register `evenly_distribute_backward` as an operator and have the
vmap fallback run it in a loop.
- This requires more LOC changes.
- Furthermore, we'd have to write an efficient batching rule for
`evenly_distribute_backward` in the future.
- If we use `new_zeros` instead, we don't need to write an efficient
batching rule for `evenly_distribute_backward` as long as the
constituents of `evenly_distribute_backward` have efficient batching rules.

Option 3: Have factory functions perform differently if they are called
inside vmap.
- For example, `at::zeros(3, 5)` could return a Tensor of shape
`(B0, B1, 3, 5)` if we are vmapping over two dimensions with size B0 and B1.
This requires maintaining some global and/or thread-local state about
the size of the dims being vmapped over which can be tricky.

And more...

Future
------
- I will undo some of the work I’ve done in the past to move backward
functions to being operators (#44052, #44408). The simpler backward
functions (like select backward) can just use Tensor.new_zeros.
I apologize for the thrashing.
- Include a NOTE about the vmap+inplace problem somewhere in the
codebase. I don't have a good idea of where to put it at the moment.

Test Plan
---------
- New tests

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D24456781

Pulled By: zou3519

fbshipit-source-id: 9c6c8ee2cb1a4e25afd779bdf0bdf5ab76b9bc20
2020-10-23 14:29:40 -07:00
aa828bf084 Support undefined grads in vmap fallback (#46671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46671

Previously, the vmap fallback would choke whenever it saw an undefined
tensor. For each sample in a batch, the fallback runs an operator
and then stacks together outputs to get the actual output.
Undefined tensors can occur as outputs while computing batched gradients
with vmap.

This PR updates the vmap fallback to handle undefined tensors which can
appear in backward formulas:
- if the output is undefined for every sample in the batch, the vmap
fallback returns an undefined tensor
- if the output is defined for every sample in the batch, the vmap
fallback stacks the defined tensors together
- if the output is defined for some samples but undefined for others,
we error out.

Test Plan: - new tests

Reviewed By: ezyang

Differential Revision: D24454909

Pulled By: zou3519

fbshipit-source-id: d225382fd17881f23c9833323b68834cfef351f3
2020-10-23 14:26:50 -07:00
85954164a4 fix minor bug, message variable does not exist (#46777)
Summary:
When run with `--continue-through-error`, the script ends with the following error:

```
Traceback (most recent call last):
  File "run_test.py", line 745, in <module>
    main()
  File "run_test.py", line 741, in main
    print_to_stderr(message)
NameError: name 'message' is not defined
make: *** [macos-compat] Error 1
```

This PR just changes `message` to `err`, which is the intended variable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46777

Reviewed By: seemethere

Differential Revision: D24510460

Pulled By: janeyx99

fbshipit-source-id: be1124b6fc72b178d62acc168d0cbc74962de52b
2020-10-23 14:20:23 -07:00
89f368bef8 Enable XNNPACK on Windows & Update XNNPACK (#45830)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44283.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45830

Reviewed By: zhangguanheng66

Differential Revision: D24504302

Pulled By: ezyang

fbshipit-source-id: ab28088a4fbb553a27ed7c8da87ec7b40c73c2f1
2020-10-23 14:17:45 -07:00
999f7ed3a1 Refactored ForeachFunctors.cuh (#46660)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46660

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D24453345

Pulled By: izdeby

fbshipit-source-id: 307839d40a358d9dda3eee6f62990b38b8274642
2020-10-23 13:58:45 -07:00
822efb7275 add workflow ID to report tags (#46725)
Summary:
Currently, CircleCI doesn't report the workflow ID as one of the dimensions. This causes statistics for failed/rerun CircleCI jobs to report overlapping results.

Fix this by adding a workflow ID tag.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46725

Reviewed By: seemethere, zhangguanheng66

Differential Revision: D24505006

Pulled By: walterddr

fbshipit-source-id: cc65bb8ebc0787e443a42584dfb0d2224e824e7d
2020-10-23 12:10:54 -07:00
ccb79f3ac7 Add option to log subprocess output to files in DDP launcher. (#33193)
Summary:
Closes https://github.com/pytorch/pytorch/issues/7134. This request is to add an option to log the subprocess output (each subprocess is training a network with DDP) to a file instead of the default stdout.

The reason for this is that if we have N processes all writing to stdout, it'll be hard to decipher the output, and it would be cleaner to log these to separate files.

To support this, we add an optional argument `--logdir` that sets each subprocess's stdout to a file of the format "node_rank_{}_local_rank_{}" in the logging directory. With this enabled, none of the training processes output to the parent process's stdout; they instead write to the aforementioned files. If a user accidentally passes in something that's not a directory, we fall back to ignoring this argument.

Tested by taking a training script at https://gist.github.com/rohan-varma/2ff1d6051440d2c18e96fe57904b55d9 and running `python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port="29500" --logdir test_logdir train.py`. This results in a directory `test_logdir` with files "node_0_local_rank_0" and "node_0_local_rank_1" being created with the training process stdout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/33193

Reviewed By: gchanan

Differential Revision: D24496013

Pulled By: rohan-varma

fbshipit-source-id: 1d3264cba242290d43db736073e841bbb5cb9e68
2020-10-23 11:22:57 -07:00
e519fcd1aa Remap net name inside arg.n for AsyncIf operator
Summary: Similar to If operator, AsyncIf also contains nets in args. It needs the same handling.

Test Plan:
New unit test test_control_op_remap
`buck test caffe2/caffe2/python:core_test`

Also it worked end to end in prototype of dist bulk eval workflow f226680903

Reviewed By: yyetim

Differential Revision: D24451775

fbshipit-source-id: 50594e2ab9bb457329ed8da7b035f7409461b5f6
2020-10-23 10:41:06 -07:00
3ea26b1424 [WIP] Push rocm to slow path for foreach APIs (#46733)
Summary:
Move ROCM to a slow path for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46733

Reviewed By: ngimel

Differential Revision: D24485012

Pulled By: izdeby

fbshipit-source-id: f0f4227cc594d8a87d44008cd5e27ebe100b6b22
2020-10-23 10:33:41 -07:00
c31ced4246 make torch.lu differentiable. (#46284)
Summary:
As per title. Limitations: only for batches of squared full-rank matrices.

CC albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46284

Reviewed By: zou3519

Differential Revision: D24448266

Pulled By: albanD

fbshipit-source-id: d98215166268553a648af6bdec5a32ad601b7814
2020-10-23 10:13:46 -07:00
52f8d320b3 [ONNX] Update ONNX doc for indexing export (#46349)
Summary:
Adding example code for supported cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46349

Reviewed By: gchanan

Differential Revision: D24459449

Pulled By: malfet

fbshipit-source-id: 65021a96cd12225615aa40af5d916e0cda56d107
2020-10-23 09:49:43 -07:00
f230245c06 Revert D24422354: [pytorch][PR] fix-process-group-counter
Test Plan: revert-hammer

Differential Revision:
D24422354 (caed29a069)

Original commit changeset: 32493cc2001d

fbshipit-source-id: 9b633f738ea555f45031056689f780dde8eda859
2020-10-23 08:04:37 -07:00
e0fd590ec9 Fix incorrect usage of CUDACachingAllocator (#46605)
Summary:
We need an object to hold the ownership of allocated memory in the scope, instead of directly using the raw pointer.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46605

Reviewed By: zou3519

Differential Revision: D24453548

Pulled By: ezyang

fbshipit-source-id: d29e5a69afa6c0d9e519849910e04524667d0a26
2020-10-23 07:36:39 -07:00
6c5f634657 Fix grammar and spelling errors (#46713)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46713

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Differential Revision: D24477771

Pulled By: ansley

fbshipit-source-id: bc39b63ab2158a5233e48b89bfaa97a4cfb1f7a1
2020-10-23 01:31:17 -07:00
4fd2cce9fa Check support_as_strided before using empty_strided. (#46746)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46746

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24492468

Pulled By: ailzhang

fbshipit-source-id: 25f869e64cf8628e41661edca9823e95170ae1ed
2020-10-22 21:56:12 -07:00
129279a374 [FBGEMM][Transposed Conv] add transposed conv support for fbgemm backend for 1d, 2d, 3d (#46607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46607

Wire the fbgemm backend of transposed conv for the 1d, 2d, and 3d cases in qconv, although there is no official frontend API for the 3d case.
ghstack-source-id: 114896586

Test Plan: https://www.internalfb.com/intern/testinfra/testconsole/testrun/6755399464206048/

Reviewed By: z-a-f

Differential Revision: D24323802

fbshipit-source-id: 1c7d2fbb703018fd15f5c85edcfa6c9deac9662e
2020-10-22 20:55:52 -07:00
8558c0e612 Eliminate narrowing conversion (#46730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46730

A narrowing conversion on `last_idx` raises a compiler warning. This fixes that.

Test Plan: Standard pre-commit test rig.

Reviewed By: EscapeZero

Differential Revision: D24481497

fbshipit-source-id: f3e913b586738add59c422c3cf65035d87fc9e34
2020-10-22 20:08:59 -07:00
511f89eaa9 Add nvtx.range() context manager (#42925)
Summary:
Small quality-of-life improvement to NVTX Python bindings, that we're using internally and that would be useful to other folks using NVTX annotations via PyTorch. (And my first potential PyTorch contribution.)

Instead of needing to be careful with try/finally to make sure all your range_push'es are range_pop'ed:

```
nvtx.range_push("Some event")
try:
    # Code here...
finally:
    nvtx.range_pop()
```

you can simply do:

```
with nvtx.range("Some event"):
    # Code here...
```

or even use it as a decorator:

```
class MyModel(nn.Module):

    # Other methods here...

    nvtx.range("MyModel.forward()")
    def forward(self, *input):
        # Forward pass code here...
```

A couple small open questions:

1. I also added the ability to call `msg.format()` inside `range()`, with the intention that, if there is nothing listening to NVTX events, we should skip the string formatting, to lower the overhead in that case. If you like that idea, I could add the actual "skip string formatting if nobody is listening to events" parts. We can also just leave it as is. Or I can remove that if you folks don't like it. (In the first two cases, should we add that to `range_push()` and `mark()` too?) Just let me know which one it is, and I'll update the pull request.

2. I don't think there are many places for bugs to hide in that function, but I can certainly add a quick test, if you folks want.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42925

Reviewed By: gchanan

Differential Revision: D24476977

Pulled By: ezyang

fbshipit-source-id: 874882818d958e167e624052e42d52fae3c4abf1
2020-10-22 19:46:16 -07:00
88e94da580 Enable softmax and tiny norm FP16 tests on ROCm (#46363)
Summary:
This pull request enables the following tests on ROCm:
* TestCuda.test_tiny_half_norm_
* TestNNDeviceTypeCUDA.test_softmax_cuda_float16
* TestNNDeviceTypeCUDA.test_softmax_cuda_float32
* TestNNDeviceTypeCUDA.test_softmax_results_cuda_float16
* TestNNDeviceTypeCUDA.test_softmax_results_cuda_float32

The earlier failures, because of which the tests were skipped, were because of a precision issue for FP16 compute on MI25 hardware with ROCm 3.7 and older. The fix was delivered in the compiler in ROCm 3.8.

The pull request fixes https://github.com/pytorch/pytorch/issues/37493

cc: jeffdaily ezyang malfet mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46363

Reviewed By: heitorschueroff

Differential Revision: D24325639

Pulled By: ezyang

fbshipit-source-id: a7dbb238cf38c04b6592baad40b4d71725a358c9
2020-10-22 19:40:00 -07:00
6ae0a7c919 Add ReplaceNaN benchmark as baseline (#46685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46685

as title

Test Plan:
caffe2

```
./buck-out/gen/caffe2/benchmarks/operator_benchmark/c2/replace_nan_test.par

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking Caffe2: replace_nan
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1022 10:09:48.508246 1887813 init.h:137] Caffe2 GlobalInit should be run before any other API calls.
# Name: replace_nan_M16_N16_dtypefloat
# Input: M: 16, N: 16, dtype: float
Forward Execution Time (us) : 30.742

# Benchmarking Caffe2: replace_nan
# Name: replace_nan_M16_N16_dtypedouble
# Input: M: 16, N: 16, dtype: double
Forward Execution Time (us) : 29.135

# Benchmarking Caffe2: replace_nan
# Name: replace_nan_M64_N64_dtypefloat
# Input: M: 64, N: 64, dtype: float
Forward Execution Time (us) : 94.059

# Benchmarking Caffe2: replace_nan
# Name: replace_nan_M64_N64_dtypedouble
# Input: M: 64, N: 64, dtype: double
Forward Execution Time (us) : 93.569
```

Reviewed By: qizzzh, houseroad

Differential Revision: D24448483

fbshipit-source-id: 51574ca0eca6dba5828dfdc754193dba5a62954f
2020-10-22 19:12:14 -07:00
27e2ea4cea Make add_relu an internal function (#46676)
Summary:
Cleanup for 1.7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46676

Reviewed By: gchanan

Differential Revision: D24458565

Pulled By: albanD

fbshipit-source-id: b1e4b4630233d3f1a4bac20e3077411d1ae17f7b
2020-10-22 18:08:15 -07:00
870a5a0d6d Enable DataParallel to run zero input Module (#46565)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46565

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D24405275

Pulled By: glaringlee

fbshipit-source-id: a8baaf4cf227f7f21fc3b080a446f92f0effe18e
2020-10-22 18:04:33 -07:00
842494af77 [quant][fx] EmbeddingBag quantization support (#46678)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46678

Test Plan:
python test/test_quantization.py TestQuantzeFxOps.test_qembedding_bag_module

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D24463306

fbshipit-source-id: 175e77f4450344fbf63409be35338b0c29afd585
2020-10-22 18:04:31 -07:00
e34c825b77 [quant][fx] Embedding quantization support (#46677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46677

Add support for weight only embedding quantization

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_qembedding_module

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D24463305

fbshipit-source-id: 2dba49d8a77cf237a8e6da2efdd83b1ebdc432d6
2020-10-22 17:59:52 -07:00
fe6fb7753e Clean up use of Flake8 in GitHub CI (#46740)
Summary:
[Previously](https://github.com/pytorch/pytorch/runs/1293724033) Flake8 was run using `flake8-mypy`, which didn't change the actual lint output, and undesirably resulted in this noisy message being printed many times:
```
/opt/hostedtoolcache/Python/3.9.0/x64/lib/python3.9/site-packages is in the MYPYPATH. Please remove it.
See https://mypy.readthedocs.io/en/latest/running_mypy.html#how-mypy-handles-imports for more info
```
Since `mypy` is already run in other test scripts, this PR simply removes it from the Flake8 setup. This PR also removes the `--exit-zero` flag from Flake8, because currently Flake8 gives no error output, so it would be valuable to know if it ever does happen to return error output.

(This doesn't strike me as a perfect solution since now it's a bit harder to reproduce the Flake8 behavior when running locally with `flake8-mypy` installed, but it's the easiest way to fix it in CI specifically.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46740

Reviewed By: janeyx99

Differential Revision: D24487904

Pulled By: samestep

fbshipit-source-id: d534fdeb18e32d3bc61406462c1cf955080a688f
2020-10-22 17:08:16 -07:00
bf1ea14fbc [CI][IOS] Add a arm64 ios job for Metal (#46646)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46646

Test Plan: Imported from OSS

Reviewed By: seemethere, linbinyu

Differential Revision: D24459597

Pulled By: xta0

fbshipit-source-id: e93a3a26897614c66768804c71658928cd26ede7
2020-10-22 16:54:46 -07:00
344abd56f9 [CI][IOS] Rename the IOS_VERSION (#46645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46645

### Summary

The IOS_VERSION should be renamed to XCODE_VERSION

### Test

- CircleCI

Test Plan: Imported from OSS

Reviewed By: seemethere, linbinyu

Differential Revision: D24459598

Pulled By: xta0

fbshipit-source-id: 9dcba973cc57aa44f8fd4151daf5d89c8da61c67
2020-10-22 16:49:22 -07:00
ce5bca5502 ProcessGroupNCCL::alltoall_base needs to call recordStream (#46603)
Summary:
For similar reasons as documented in the `[Sync Streams]` note.  For a current example, `ProcessGroupNCCL::allgather` must also call `recordStream` and does so already.

The output tensor is created on the default stream (by the application).  NCCL/RCCL internally uses another stream (i.e., ncclStream).  If we do not record the output tensor on the ncclStream, there is a chance that the output tensor might be deallocated while NCCL/RCCL is using it.

The application is not aware of the ncclStream since it's internal to ProcessGroupNCCL.  So, the application cannot record the output tensor on the ncclStream.
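
For illustration, the same pattern expressed with the public Python API (a sketch only; the actual fix lives inside `ProcessGroupNCCL::alltoall_base` in C++):

```python
import torch

side = torch.cuda.Stream()              # stands in for the internal ncclStream
out = torch.empty(1024, device="cuda")  # allocated on the default stream

with torch.cuda.stream(side):
    out.fill_(1.0)                      # the tensor is used on the side stream
# Tell the caching allocator that `out` is in use on `side`, so its memory
# is not reclaimed/reused until the queued work completes.
out.record_stream(side)
```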

Patch originally developed by sarunyap.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46603

Reviewed By: srinivas212

Differential Revision: D24458530

fbshipit-source-id: b02e74d1c3a176ea1b9bbdd7dc671b221fcadaef
2020-10-22 15:53:19 -07:00
bd90379df5 [quant][graphmode][fx] Add support for additional_fuse_method_mapping (#46345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46345

Allow user to add more fusion mappings

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24317439

fbshipit-source-id: 3b144bbc305e41efbdf3e9fb25dbbeaad9e86c6a
2020-10-22 15:15:31 -07:00
d6519d4e9f [pt][static_runtime] Add option enable_out_variant (#46690)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46690

- Add option enable_out_variant to Static Runtime
- Add gflags --pt_cleanup_activations and --pt_enable_out_variant to the benchmark script

Reviewed By: yinghai, houseroad

Differential Revision: D24438107

fbshipit-source-id: c1185c0fee93edc0118542b2faa8bc4ffdd19075
2020-10-22 15:00:23 -07:00
f326f6a8a0 Remove dilation restriction on cuDNN ConvTranspose2d (#46290)
Summary:
Close https://github.com/pytorch/pytorch/issues/31690

I have verified the functionality of ConvTranspose2d (with this PR) on roughly 32,000 random shapes on V100, A100, using cuDNN 8.0.4 and CUDA 11.1. The 32,000 shapes contain 4x8,000 of (fp16, fp32) x (nchw, nhwc) each.

The random shapes are sampled from
```jsonc
{
    "batch_size": {"low": 1, "high": 8},
    "in_channels": {"low": 16, "high": 128},
    "out_channels": {"low": 16, "high": 128},
    "height": {"low": 16, "high": 224},
    "stride": {"set": [[1, 1], [2, 2]]},
    "padding": {"set": [[0, 0]]},
    "output_padding": {"set": [[0, 0], [1, 1], [0, 1], [1, 0]]},
    "kernel_size": {"set": [[3, 3], [1, 1], [1, 3], [3, 1], [2, 2]]},
    "dilation": {"set": [[1, 1]]},
    "deterministic": {"set": [true, false]},
    "benchmark": {"set": [true, false]},
    "allow_tf32": {"set": [true, false]},
    "groups": {"set": [1, IN_CHANNELS]}
}
```
- Input `width` is the same as `height`.
- `groups` can be either 1, or the same as `in_channels` (grouped convolution). When `groups` is 1, `out_channels` is random; when `groups` is the same as `in_channels`, `out_channels` is also the same as `in_channels`

All of the checked shapes can be found in csv files here https://github.com/xwang233/code-snippet/tree/master/convtranspose2d-dilation/functionality-check-cudnn8.0.4.
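
A minimal sketch of a dilated case that is now allowed to take the cuDNN path:

```python
import torch

conv = torch.nn.ConvTranspose2d(16, 32, kernel_size=3, dilation=2).cuda()
x = torch.randn(2, 16, 24, 24, device="cuda")
y = conv(x)  # dilation > 1 no longer triggers the cuDNN restriction
print(y.shape)
```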

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46290

Reviewed By: mruberry

Differential Revision: D24422091

Pulled By: ngimel

fbshipit-source-id: 9f0120f2995ae1575c0502f1b2742390d7937b24
2020-10-22 13:42:03 -07:00
53dff784e2 [caffe2] Fix inplace ops in onnx::SsaRewrite (#46134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46134

Make sure in-place ops stay in-place after SsaRewrite. This seems to break the premise of SSA, but it's necessary to ensure correctness. Note that we only preserve the in-place ops that enforce in-place; ops like `Relu` don't enforce in-place, they merely allow it.

(Note: this ignores all push blocking failures!)

Reviewed By: yinghai

Differential Revision: D24234957

fbshipit-source-id: 274bd3ad6227fce6a98e615aad7e57cd2696aec3
2020-10-22 13:26:31 -07:00
51bf7bed84 [caffe2] Allow memonger to optimize nets with inplace(enforced) ops (#46560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46560

Follow-up for D24236604 (16c52d918b).

For nets that pass the schema check, memonger actually makes sure to preserve the inplaceness of operators if they are already inplace. So we can safely enable it for correct input nets.

(Note: this ignores all push blocking failures!)

Differential Revision: D24402482

fbshipit-source-id: a7e95cb0e3eb87adeac79b9b69eef207957b0bd5
2020-10-22 13:23:33 -07:00
23fad9111e [quant][graphmode][fx] Add additional_qat_module_mapping (#46344)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46344

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24317438

fbshipit-source-id: f9e73aeb4c7a107c8df0bae8319464e7d5d7275b
2020-10-22 13:11:26 -07:00
982fa07ccb torch.nn.Unfold accepts 0-dim for batch size (#40689)
Summary:
In partial completion of https://github.com/pytorch/pytorch/issues/12013

Allows specifying a tensor with 0-dim batch size for `torch.nn.Unfold()`.
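
A minimal sketch of the newly supported case:

```python
import torch

unfold = torch.nn.Unfold(kernel_size=(2, 2))
x = torch.empty(0, 3, 8, 8)  # zero-sized batch dimension
out = unfold(x)
print(out.shape)             # torch.Size([0, 12, 49])
```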

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40689

Reviewed By: zou3519

Differential Revision: D24441164

Pulled By: ngimel

fbshipit-source-id: 49cd53b9b23f2e221aecdb4b5fed19a234038063
2020-10-22 13:05:24 -07:00
c57c560744 Revert "Push rocm to slow path (#46216)" (#46728)
Summary:
This reverts commit bc1ce584512a860c15cb991460d8c98debd62b26.

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46728

Reviewed By: cpuhrsch

Differential Revision: D24482783

Pulled By: izdeby

fbshipit-source-id: 619b710a8e790b9878e7317f672b4947e7b88145
2020-10-22 12:04:29 -07:00
9ccf85b7b4 [FX] Make wrapped functions traceable (#46692)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46692

Test Plan: Imported from OSS

Reviewed By: dzhulgakov

Differential Revision: D24465958

Pulled By: jamesr66a

fbshipit-source-id: 8c04aa3f59d1371d730ded7abd8f0c6c047e76b6
2020-10-22 12:00:02 -07:00
2700932ef2 [FX] Fix recursion depth issue on Graph deepcopy (#46669)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46669

Make `Graph`'s deepcopy behavior iterative rather than recursive. This prevents stack overflow issues with very large `Graph`s.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D24455120

Pulled By: jamesr66a

fbshipit-source-id: 5c37db5acabe313b9a7a464bebe2a82c59e4e2e9
2020-10-22 11:55:23 -07:00
18d80501a6 Batching rules for: new_zeros, new_empty (#46606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46606

Note that new_empty uses `m.impl_UNBOXED` because the operator doesn't
go through the c10 dispatcher due to #43572.

Test Plan: - new tests

Reviewed By: ezyang

Differential Revision: D24428106

Pulled By: zou3519

fbshipit-source-id: 5e10f87a967fb27c9c3065f3d5b577db61aeb20e
2020-10-22 11:40:51 -07:00
c44300884e Clarify timing of GetDeviceProperty() (#46715)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46715

Test Plan: N/A

Reviewed By: ezyang

Differential Revision: D24455538

fbshipit-source-id: 1770807d178f618ef6338e28f669f09e4cbd2009
2020-10-22 11:29:31 -07:00
920ec6651f [OpBench] fix jit mode run of operator benchmark for ops with parameters (#46694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46694

For the op with parameters (e.g. conv), the jit mode run currently will raise an error of
`RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient`. After consulting https://www.fburl.com/vtkys6ug, we decided to turn off gradients for the parameters in the forward run. If we want ops with parameters to work in backward with jit mode, we probably need to turn `TorchBenchmarkBase` into a subclass of `nn.Module`.

Test Plan: ./buck-out/gen/caffe2/benchmarks/operator_benchmark/pt/conv_test.par  --use_jit

Reviewed By: mingzhe09088

Differential Revision: D24451206

fbshipit-source-id: 784eb60ca155b0152d745c92f6d0ce6b2c9014c6
2020-10-22 11:10:28 -07:00
06d50b5eb0 Pull in fairscale.nn.Pipe into PyTorch. (#44090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44090

This is an initial commit pulling in the torchgpipe fork at
https://github.com/facebookresearch/fairscale.

The purpose of this commit is to just pull in the code and ensure all tests and
builds work fine. We will slowly modify this to match our intended API
mentioned in https://fb.quip.com/txurAV3zIFox#RPZACAfAKMq. Follow up PRs would
address further changes needed on top of the initial commit..

We're pulling the code into the `torch.distributed._pipeline.sync` package. The
package is private on purpose since there is a lot of work (ex: docs, API
changes etc.) that needs to go in before we can actually officially support
this.
ghstack-source-id: 114864254

Test Plan:
1) waitforbuildbot
2) Ran all tests on my devgpu

Reviewed By: mrshenli

Differential Revision: D23493316

fbshipit-source-id: fe3c8b7dadeeb86abdc00e8a8652491b0b16743a
2020-10-22 10:59:02 -07:00
b63ddd6f57 [OSS][Metal] Support Resnet models
Summary:
This diff adds the missing ops to run the Resnet models from Torchvision. Moving the tensors to the GPU significantly improves perf, as shown below (iPhone 11).

Time running on CPU (ms):

```
forward took: 166.115
forward took: 150.722
forward took: 150.383
forward took: 150.345
forward took: 150.761
forward took: 150.533
forward took: 150.588
forward took: 150.812
forward took: 150.925
forward took: 150.25
```

Time running on GPU (ms):

```
forward took: 39.9355
forward took: 41.3531
forward took: 41.798
forward took: 40.4744
forward took: 39.5181
forward took: 42.6464
forward took: 41.2658
forward took: 40.0862
forward took: 42.3533
forward took: 41.9348
```

Discrepancy in results:

```
GPU:
    "(623, 4.6211)",
    "(111, 3.8809)",
    "(499, 3.8555)",
    "(596, 3.8047)",
    "(473, 3.7422)",
    "(846, 3.5762)",
    "(892, 3.5449)",
    "(813, 3.5098)",
    "(446, 3.5020)",
    "(902, 3.4980)"
CPU:
    "(623, 4.4229)",
    "(499, 3.8321)",
    "(596, 3.6192)",
    "(111, 3.5295)",
    "(813, 3.4848)",
    "(584, 3.3979)",
    "(418, 3.3357)",
    "(473, 3.2760)",
    "(846, 3.2745)",
    "(902, 3.2376)"
```

Test Plan: {F340824316}

Reviewed By: IvanKobzarev

Differential Revision: D24416294

fbshipit-source-id: 12c9199ade0b76a7aa8a3838eddc4c19c79b6f37
2020-10-22 10:49:51 -07:00
93719440b8 Replace map(lambda constructs (#46462)
Summary:
Follow-up of https://github.com/pytorch/pytorch/issues/46461 with a similar goal

Makes them more readable and possibly faster. Care has to be taken when substituting: a list comprehension `[f(x) for x in xs]` builds the whole list immediately, whereas a generator expression `(f(x) for x in xs)` (like `map` itself in Python 3) is evaluated lazily. The lazy forms are a benefit where the list of values never needs to exist in memory (e.g. when the result is passed straight to `tuple`, `extend`, or `join`). See the example below.
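
For example (illustrative only):

```python
xs = [1, 2, 3]

# Before: an opaque lambda inside map()
labels = list(map(lambda x: "item_" + str(x), xs))

# After: an equivalent, more readable list comprehension (eager)
labels = ["item_" + str(x) for x in xs]

# Or a generator expression (lazy): nothing is materialized until
# join() consumes it
joined = ", ".join("item_" + str(x) for x in xs)
```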

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46462

Reviewed By: zou3519

Differential Revision: D24422343

Pulled By: ezyang

fbshipit-source-id: 252e33499c92ac0b15238f2df32681dbbda2b237
2020-10-22 09:50:22 -07:00
25dc0056f2 [RPC] print exception message on workers that run python functions (#46372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46372

Currently, in `_run_function`, we catch an exception from the python
function which is run, and report it back to the master. However in some large
scale training jobs, it would be valuable to also log the error on the trainer
itself for faster debugging.

Test Plan: Added unittest.

Reviewed By: pritamdamania87

Differential Revision: D24324578

fbshipit-source-id: 88460d7599ea69d2c38fd9c10eb6471f7edd4100
2020-10-22 09:44:15 -07:00
3112e23428 [py][vulkan][reland] Add is_vulkan to py api, add vulkan to device type parsing (#46655)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46655

Test Plan: Imported from OSS

Pulled By: IvanKobzarev

Reviewed By: mrshenli

Differential Revision: D24448984

fbshipit-source-id: 5000846a06077f7a5a06dd51da422d2a42f70820
2020-10-22 09:35:50 -07:00
bc1ce58451 Push rocm to slow path (#46216)
Summary:
Push rocm to slow path

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46216

Reviewed By: bwasti

Differential Revision: D24263731

Pulled By: izdeby

fbshipit-source-id: 98ede2478b8f075ceed44a9e4f2aa292f523b8e2
2020-10-22 09:31:01 -07:00
3526b604b1 Add comment about running C++ executable lint locally (#46698)
Summary:
I got confused while locally running some of the `quick-checks` lints (still confused by `.jenkins/run-shellcheck.sh` but that's a separate matter) so I'm adding a comment to the "Ensure C++ source files are not executable" step in case someone in the future tries it and gets confused like I did.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46698

Reviewed By: walterddr

Differential Revision: D24470718

Pulled By: samestep

fbshipit-source-id: baacd8f414aa41b9b7b7aac765d938f21085eac5
2020-10-22 09:24:43 -07:00
52a970bac9 Minor cleaning of test_cuda.py (#46617)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46617

Sort includes, fix deprecated test warning

Test Plan:
```
buck run mode/dev-nosan //caffe2/test:cuda
```

Reviewed By: drdarshan

Differential Revision: D24429247

fbshipit-source-id: 65f53d7c904032e5c8f8ca45d1d2bb437358ffdd
2020-10-22 09:03:30 -07:00
aa9ca85bd0 Fix interval midpoint calculation (#46666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46666

Interval midpoint calculations can overflow (for integer types); this diff fixes such an instance.

Test Plan: Standard test rig

Reviewed By: xw285cornell

Differential Revision: D23997893

fbshipit-source-id: 788c1181031e0b71d3efb6f7090fbd4ba2aa3f86
2020-10-22 08:53:38 -07:00
7245d2c939 Avoid scatter for single-device case in DDP (#46304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46304

In the case that a single process operates only on one GPU, we can
avoid this scatter and instead replace it with a recursive version of `to`
which transfers the input tensors to the correct device.

The implementation of `_recursive_to` is modeled after `scatter` in https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/scatter_gather.py, in order to keep parity with the previous conventions (i.e. custom types not having their tensors moved).
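
A rough sketch of the idea (a hypothetical helper for illustration, not the actual `_recursive_to` implementation):

```python
import torch

def recursive_to(obj, device):
    # Move tensors to `device`, recursing through common containers.
    # Mirrors scatter's convention: tensors inside unknown custom types
    # are left untouched.
    if isinstance(obj, torch.Tensor):
        return obj.to(device)
    if isinstance(obj, (list, tuple)):
        return type(obj)(recursive_to(o, device) for o in obj)
    if isinstance(obj, dict):
        return {k: recursive_to(v, device) for k, v in obj.items()}
    return obj
```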
ghstack-source-id: 114896677

Test Plan: Added unittest, and CI

Reviewed By: pritamdamania87

Differential Revision: D24296377

fbshipit-source-id: 536242da05ecabfcd36dffe14168b1f2cf58ca1d
2020-10-22 08:29:37 -07:00
e5a2ba2ea1 Fix benchmark_caffe2
Summary: benchmark_caffe2 is broken due to refactoring that changed from eager test generation to registration only.

Test Plan:
`buck run caffe2/benchmarks/operator_benchmark/c2:add_test`

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking Caffe2: add
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1021 08:07:06.350742 390665 init.h:137] Caffe2 GlobalInit should be run before any other API calls.
# Name: add_M8_N16_K32_dtypeint
# Input: M: 8, N: 16, K: 32, dtype: int
Forward Execution Time (us) : 652.748

# Benchmarking Caffe2: add
# Name: add_M16_N16_K64_dtypefloat
# Input: M: 16, N: 16, K: 64, dtype: float
Forward Execution Time (us) : 63.570

# Benchmarking Caffe2: add
# Name: add_M64_N64_K128_dtypeint
# Input: M: 64, N: 64, K: 128, dtype: in
```

Reviewed By: qizzzh

Differential Revision: D24448374

fbshipit-source-id: 850fd375d194c20c385ea4433aea13066c7476e6
2020-10-22 08:09:06 -07:00
143d1fd9f5 Namespace cleanup for 1.7 Part 2 (#46673)
Summary:
make valgrind_toggle and valgrind_supported_platform private functions

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46673

Reviewed By: gchanan

Differential Revision: D24458133

Pulled By: albanD

fbshipit-source-id: 6f3fad9931d73223085edbd3cd3b7830c569570c
2020-10-22 07:57:51 -07:00
16c5b7b3f2 Avoid leaking has_torch_function and handle_torch_function in torch namespace (#46680)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46680

Reviewed By: zou3519

Differential Revision: D24459823

Pulled By: albanD

fbshipit-source-id: 4ff6925afcf14214dc45921bca0d2f33ca1944a1
2020-10-22 07:48:36 -07:00
905ed3c840 Revised sparse tensor documentation. (#45400)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44635.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45400

Reviewed By: ezyang

Differential Revision: D24359410

Pulled By: mruberry

fbshipit-source-id: 37c691a49a7b0042c7a298e0ed1226702b097c8b
2020-10-22 02:07:54 -07:00
8e13fe6c44 [numpy] torch.sin : support and promote integer inputs to float (#45733)
Summary:
References https://github.com/pytorch/pytorch/issues/42515

> Enable integer -> float unary type promotion for ops like sin

Will follow up on other such ops once this PR is merged.
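
A minimal sketch of the new behavior:

```python
import torch

x = torch.arange(4)  # int64 tensor
y = torch.sin(x)     # integer input is now promoted instead of erroring
print(y.dtype)       # torch.float32 (the default dtype)
```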

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45733

Reviewed By: zou3519

Differential Revision: D24431194

Pulled By: mruberry

fbshipit-source-id: db600bc5de0e535b538d2aa301c3526b7c75ed17
2020-10-22 01:58:57 -07:00
98aad933b6 [pytorch][PR] Record FutureNCCL callback stream on CUDA caching allocator (#45318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45318

When calling `then()` from WorkNCCL, record the input data pointers in futureNCCLCallbackStream_ before the execution of the input callback.

Note that the recording cannot be directly added to the lambda used by addCallback in ProcessGroupNCCL.hpp. This is because the type of the future value in that context is a pyobject rather than a TensorList, and type casting would require pybind and introduce a Python dependency, which should not be allowed in the c10d library.

I have considered creating a util function in a separate file to support this type casting, and then placing it under torch/csrc directory where python dependency is allowed. However, torch/csrc has a dependency on c10d, so this will create a circular dependency.

Finally, a `record_stream_cb_` member is added to FutureNCCL, and the default value is nullptr. A default `record_stream_cb_` implementation is added to `PythonFutureWrapper,` where Python dependency is allowed.

In addition, a few lines are reformatted by lint.
caffe2/torch/csrc/distributed/c10d/init.cpp is only reformatted.

#Closes: https://github.com/pytorch/pytorch/issues/44203

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- ProcessGroupNCCLTest
buck test mode/dev-nosan caffe2/test/distributed:c10d  -- test_accumulate_gradients_no_sync_allreduce_with_then_hook
buck test mode/dev-nosan caffe2/test/distributed:c10d  -- test_ddp_comm_hook_allreduce_with_then_hook_nccl

Reviewed By: pritamdamania87

Differential Revision: D23910257

fbshipit-source-id: 66920746c41f3a27a3689f22e2a2d9709d0faa15
2020-10-22 01:49:47 -07:00
ab28bd528d [quant][graphmode][fx] Support quantizing FloatFunctional (#46634)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46634

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24438227

fbshipit-source-id: f33439d51112e13f59ee4292e804495d38fa3899
2020-10-22 01:21:17 -07:00
9b5197b763 [mlf][efficiency] add tensor inference function to last-n collector op (#46693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46693

title

Test Plan: unit tests

Reviewed By: hx89

Differential Revision: D23946770

fbshipit-source-id: f7c3d4a1b4ef3b0e5f56e5a9a30f5003ce9f40b0
2020-10-22 01:15:00 -07:00
fe4f90c40b Cusolver inverse check info (#46625)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46557

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46625

Reviewed By: zou3519

Differential Revision: D24438577

Pulled By: ngimel

fbshipit-source-id: d00e6eb2eae4aa39ca6ecf5914fe9cf37c24b906
2020-10-21 21:46:33 -07:00
adffd8eb6b Add const to the first arg 'grad' of Reducer::copy_grad_to_bucket (#46501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46501

Gradients in this method will not be modified.
ghstack-source-id: 114851646

Test Plan: waitforbuildbot

Reviewed By: pritamdamania87

Differential Revision: D24374300

fbshipit-source-id: a2941891008f9f197a5234b50260218932d2d37d
2020-10-21 21:34:31 -07:00
db83ddcb86 small doc fix (#46599)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46599

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D24426181

Pulled By: bdhirsh

fbshipit-source-id: d0900d5c43574c80f1bf614824eafd21ba6a9caf
2020-10-21 20:17:31 -07:00
adbb50ea67 Enabling alias annotation checks for all operations during autograd tests (#46601)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46601

* except excluded tests and magic methods.

https://github.com/pytorch/pytorch/issues/38731

Previously, we'd only run these tests for inplace operations. Since this covers a lot more tests, the following issues that came up when running them were fixed -
- Updated schema of conj() to reflect existing behaviour.
- Updated deepEquals method in check_alias_annotation.cpp to re-use the overloaded == operator. Previous implementation did not cover all types of IValues.
- Corrected the order inputs are passed in during autograd testing of 'view' & 'reshape'.
- Subbed out aten::ger with the func it's aliased to, aten::outer, for testing. The alias annotation checking code doesn't handle aliased operators properly.
ghstack-source-id: 114830903

Test Plan: Ran all tests in test:jit and verified they pass.

Reviewed By: eellison

Differential Revision: D24424955

fbshipit-source-id: 382d7e2585911b81b1573f21fff1d54a5e9a2054
2020-10-21 20:01:57 -07:00
33e82c0269 Update error message to include link to readme. (#46613)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46613

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D24430852

fbshipit-source-id: 811e4d10508d47ef830d2b8445f11592f342461f
2020-10-21 19:38:19 -07:00
13decddae2 [reland][quant] Add FixedQParamsFakeQuantize module (#45538) (#46657)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46657

This is used to simulate fake quantize operation for ops with fixed quantization parameters
e.g. hardsigmoid

Test Plan:
Imported from OSS

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24451406

fbshipit-source-id: 26cc140c00f12bdec9a8f9dc880f4c425f4d4074
2020-10-21 16:47:11 -07:00
746febdeac [quant][graphmode][fx] Add additional_object_mapping argument to convert (#46338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46338

Should we merge quantized module and quantized operator configurations?

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24317435

fbshipit-source-id: 3575251fe9d80a6628b8c3243c2ed92ea5e921e3
2020-10-21 16:39:07 -07:00
8908f6ad8e [op-bench] modify import path of configs (#46679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46679

The current way of importing configs hits a runtime error when a single benchmark is launched directly with buck (e.g. `/buck-out/gen/caffe2/benchmarks/operator_benchmark/pt/conv_test.par`). This diff fixes that issue.
ghstack-source-id: 114857978

Test Plan: waitforsandcastle

Reviewed By: vkuzo

Differential Revision: D24459631

fbshipit-source-id: 29df17e66962a8604dbb7b8b9106713c3c19bed5
2020-10-21 16:15:11 -07:00
6011b36080 Fix type qualifiers ignored on return type warning (#46668)
Summary:
This fixes the following warning:
```
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:262:3: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
  262 |   const float operator[](int idx) const {
      |   ^~~~~
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46668

Reviewed By: seemethere, janeyx99

Differential Revision: D24454206

Pulled By: malfet

fbshipit-source-id: 8ba86a6d6c144f236a76bcef7ce794def7ea131f
2020-10-21 15:49:28 -07:00
e02a3e190e DOC: Building libtorch using CMake (#44196)
Summary:
I am adding documentation for building the C++-only libtorch.so without invoking Python in the build and install process.  This works on my Ubuntu 20.04 system and is designed to be operating system agnostic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44196

Reviewed By: zou3519

Differential Revision: D24421066

Pulled By: malfet

fbshipit-source-id: e77c222703353ff7f7383fb88f7bce705f88b7bf
2020-10-21 14:29:36 -07:00
ff0e20b384 Config inheritance was added for pytorch project (#46584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46584

The diff enables clang-tidy config inheritance for pytorch project.

Reviewed By: suo

Differential Revision: D24418191

fbshipit-source-id: 5cc0cf2d564236cedc4333af9324387d6d7a55cc
2020-10-21 14:06:35 -07:00
475b4e30e6 Allow for source code comments at any level of indentation (#46548)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46548

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D24434778

Pulled By: ansley

fbshipit-source-id: e24ed73d497381e02ef1155622641027ae34770a
2020-10-21 13:49:42 -07:00
e3b2bfa2a3 [pytorch] Early return in nn.EmbeddingBag when weight is empty (#46572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46572

When `num_samples == 0`, the grid size becomes zero. Although CUDA just silently proceeds, `cudaGetLastError()` will complain with `Error: invalid configuration argument`, so the failure actually surfaces at some later point, which makes it really hard to debug.

Reviewed By: jianyuh

Differential Revision: D24409874

fbshipit-source-id: ca54de13b1ab48204bbad265e3f55b56b94a1a2f
2020-10-21 13:44:56 -07:00
caed29a069 fix-process-group-counter (#46563)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46561

A minimal fix to issue https://github.com/pytorch/pytorch/issues/46561. Increment the global variable `_group_count` at the same time as the others so the global state remains consistent in case of a failure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46563

Reviewed By: zou3519

Differential Revision: D24422354

Pulled By: mrshenli

fbshipit-source-id: 32493cc2001d21ad366c396d16c303936959434e
2020-10-21 13:03:53 -07:00
ce04e527b4 Bump up windows cudnn version (#46436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46436

Reviewed By: zou3519

Differential Revision: D24421785

Pulled By: ezyang

fbshipit-source-id: 5aab2ae673e9ae07344a5f3bf0dc374a91dd12b2
2020-10-21 12:30:12 -07:00
c3c249aa0b Workaround to pay attention for CUDA version (#46535)
Summary:
Added a workaround for cases where NVCC tries to compile an object for the sm_30 GPU compute capability, to avoid the error message saying that the `__ldg` intrinsic is not defined.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46535

Reviewed By: zou3519

Differential Revision: D24422445

Pulled By: ezyang

fbshipit-source-id: 66e8eb1cbe42d848cfff46d78720d72100e628f8
2020-10-21 12:00:47 -07:00
09896eda14 Fix version comparisons for Python 3.6, 3.10 and 4 (#32389)
Summary:
There's some code which uses `six.PY3`, similar to:

```python
if six.PY3:
    print("Python 3+ code")
else:
    print "Python 2 code"
```

Where:

```python
PY3 = sys.version_info[0] == 3
```

When run on Python 4, this will run the Python 2 code! Instead, use `six.PY2` and avoid `six.PY3`.

 ---

Similarly, there's some `sys.version_info[0] == 3` checks, better done as `sys.version_info[0] >= 3`.

 ---

Also, it's better to avoid comparing the `sys.version` string, as it makes assumptions that each version component is exactly one character long, which will break in Python 3.10:

```pycon
>>> sys.version
'3.8.1 (v3.8.1:1b293b6006, Dec 18 2019, 14:08:53) \n[Clang 6.0 (clang-600.0.57)]'
>>> sys.version < "3.3"
False
>>> fake_v3_10 = '3.10.1 (v3.8.1:1b293b6006, Dec 18 2019, 14:08:53) \n[Clang 6.0 (clang-600.0.57)]'
>>> fake_v3_10 < "3.3"
True
```

 ---

Finally, I think the intention here is to skip when the Python version is < 3.6:

```python
unittest.skipIf(sys.version_info[0] < 3 and sys.version_info[1] < 6, "dict not ordered")
```

However, it will really skip for Python 0.0-0.5, 1.0-1.5 and 2.0-2.5. It's best to compare to the `sys.version_info` tuple and not `sys.version_info[1]`:

```python
    unittest.skipIf(sys.version_info < (3, 6), "dict not ordered")
```

 ---

Found using https://github.com/asottile/flake8-2020:
```console
$ pip install -U flake8-2020
$ flake8 --select YTT
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/32389

Reviewed By: zou3519

Differential Revision: D24424662

Pulled By: ezyang

fbshipit-source-id: 1266c4dbcc8ae4d2e2e9b1d7357cba854562177c
2020-10-21 11:52:50 -07:00
65da50c099 Apply hip vs hipcc compilation flags correctly for building extensions (#46273)
Summary:
Fixes issues when building certain PyTorch extensions where the cpp files do NOT compile if flags such as `__HIP_NO_HALF_CONVERSIONS__` are defined.
cc jeffdaily

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46273

Reviewed By: zou3519

Differential Revision: D24422463

Pulled By: ezyang

fbshipit-source-id: 7a43d1f7d59c95589963532ef3bd3c68cb8262be
2020-10-21 11:40:40 -07:00
ac4ee0ef5d Fix typo in docs for interpolate (#46589)
Summary:
Removes a spurious backtick in [the docs for `torch.nn.functional.interpolate`](https://pytorch.org/docs/stable/nn.functional.html?highlight=grid_sample#torch.nn.functional.interpolate)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46589

Reviewed By: zou3519

Differential Revision: D24422550

Pulled By: ezyang

fbshipit-source-id: c1e6b7de4584b2a3f68b458801a33b3fc71c1944
2020-10-21 11:31:53 -07:00
96bc7faa50 [ONNX] Export var, var_mean and std_mean ops (#45678)
Summary:
Adding export for var, var_mean and std_mean ops

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45678

Reviewed By: houseroad

Differential Revision: D24398811

Pulled By: bzinodev

fbshipit-source-id: bf51422a9e035d521156c0fa6e77898aac83a380
2020-10-21 11:23:54 -07:00
6de619e4a4 Allow converting parameters of nn.Module to complex dtypes (#44788)
Summary:
This PR makes it possible to cast the parameters of nn.Module to complex dtypes.
The following code works with the proposed changes.
```python
In [1]: import torch
In [2]: lin = torch.nn.Linear(5, 1).to(torch.complex64)
In [3]: lin(torch.zeros(3, 5, dtype=torch.complex64))
Out[3]:
tensor([[-0.1739+0.j],
        [-0.1739+0.j],
        [-0.1739+0.j]], grad_fn=<AddmmBackward>)
```
Fixes https://github.com/pytorch/pytorch/issues/43477.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44788

Reviewed By: zou3519

Differential Revision: D24307225

Pulled By: anjali411

fbshipit-source-id: dacc4f5c8c9a99303f74d1f5d807cd657b3b69b5
2020-10-21 08:54:59 -07:00
611f028168 Add Batch-Updating Parameter Server Example to CI Tests (#46510)
Summary:
Resolves one item in https://github.com/pytorch/pytorch/issues/46321

This PR sets up DistExamplesTest, which will be used as the class to implement future tests for examples. This class is run as part of CI tests. It also creates a dist_examples folder and includes the [batch parameter server example](https://github.com/pytorch/examples/blob/master/distributed/rpc/batch/parameter_server.py), which is slightly modified so that it can be tested.

Run test:
pytest test/distributed/rpc/test_tensorpipe_agent.py -k test_batch_updating_parameter_server -vs
pytest test/distributed/rpc/test_process_group_agent.py -k test_batch_updating_parameter_server -vs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46510

Reviewed By: mrshenli

Differential Revision: D24379296

Pulled By: H-Huang

fbshipit-source-id: 1c102041e338b022b7a659a51894422addc0e06f
2020-10-21 08:46:46 -07:00
cf3d7a2660 first cut of adding a dangling impl test. fix #45165 (#46484)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46484

Test Plan: Imported from OSS

Reviewed By: ezyang, izdeby

Differential Revision: D24392625

Pulled By: bdhirsh

fbshipit-source-id: a6ab9c53e3e580e5713e08b20682ee6f8ed3bd84
2020-10-21 08:39:40 -07:00
62e714c9d9 Delete CUDAUnaryOps.cpp (#46280)
Summary:
This file is no longer used

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46280

Reviewed By: ezyang

Differential Revision: D24392749

Pulled By: heitorschueroff

fbshipit-source-id: 677e1ba8664e3c53448a962f8a5d05e806961c2d
2020-10-21 08:31:34 -07:00
cebe87fe3a Revert D24379422: [py][vulkan] Add is_vulkan to py api, add vulkan to device type parsing
Test Plan: revert-hammer

Differential Revision:
D24379422 (e8fbe54cf5)

Original commit changeset: afab89bb9e17

fbshipit-source-id: 743c77e453239f10c155c67490cba5a42ab42f58
2020-10-21 08:23:05 -07:00
8328630315 avoid inlining kernel lambdas on mobile (#46249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46249

This saves 15kb of binary size on iOS and increases binary size on Android x86 by 30kb. It also reduces size a bit for Android ARM. I've talked to Martin and we should land this, since Android binary size is much less important because of Voltron.

ghstack-source-id: 114177627

Test Plan: bsb

Reviewed By: ezyang

Differential Revision: D23057150

fbshipit-source-id: 43bd62901b81daf08ed96de561d711357689178f
2020-10-21 03:27:21 -07:00
8357e2edc3 Back out "Revert D24269034: [fx] Refactor Tracer so that find_module and root args creation could be overridden by implementations" (#46573)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46573

Original commit changeset: 7dd709b585f8
ghstack-source-id: 114730143

Test Plan: Verified on circleci that previously broken test is fixed.

Reviewed By: zdevito

Differential Revision: D24413096

fbshipit-source-id: 439568c631c4556b8ed6af20fcaa4b1375e554cf
2020-10-20 22:17:36 -07:00
e8fbe54cf5 [py][vulkan] Add is_vulkan to py api, add vulkan to device type parsing (#46511)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46511

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D24379422

Pulled By: IvanKobzarev

fbshipit-source-id: afab89bb9e17c50934083598262bbe14ea82e893
2020-10-20 20:04:24 -07:00
a651b876a7 preserve non-dense or overlapping tensor's layout in *_like functions (#46046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46046

*_like functions are used in PyTorch to create a new tensor with the same shape as the input tensor, but we don't always preserve the layout permutation of the tensor. The current behavior is that, for a dense and non-overlapping tensor, its layout permutation is preserved. For example, passing a channels-last contiguous tensor t with shape/stride (2, 4, 3, 2)/(24, 1, 8, 4) to empty_like(t) will create a new tensor with exactly the same shape/stride as the input tensor t. However, if the input tensor is non-dense or overlapping, we simply create a contiguous tensor based on the input tensor's shape, so the tensor layout permutation is lost.

This PR preserves the layout permutation for non-dense or overlapping tensors. The strides propagation rule used in this PR is exactly the same as the one used in TensorIterator. The behavior changes are listed below:

| code                                                                                                                                                                                           | old                                                   | new                                                  |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------|------------------------------------------------------|
| #strided tensors<br>a=torch.randn(2,3,8)[:,:,::2].permute(2,0,1)<br>print(a.stride())<br>print(a.exp().stride())<br>print((a+a).stride())<br>out = torch.empty(0)<br>torch.add(a,a,out=out)<br>print(out.stride()) | (2, 24, 8) <br>(6, 3, 1) <br>(1, 12, 4) <br>(6, 3, 1) | (2, 24, 8)<br>(1, 12, 4)<br>(1, 12, 4)<br>(1, 12, 4) |
| #memory dense tensors<br>a=torch.randn(3,1,1).as_strided((3,1,1), (1,3,3))<br>print(a.stride(), (a+torch.randn(1)).stride())<br>a=torch.randn(2,3,4).permute(2,0,1)<br>print(a.stride())<br>print(a.exp().stride())<br>print((a+a).stride())<br>out = torch.empty(0)<br>torch.add(a,a,out=out)<br>print(out.stride())                                                                                                                                                                                               |  (1, 3, 3) (1, 1, 1)<br>(1, 12, 4)<br>(6, 3, 1)<br>(1, 12, 4)<br>(6, 3, 1)                                                       |  (1, 3, 3) (1, 3, 3)<br>(1, 12, 4)<br>(1, 12, 4)<br>(1, 12, 4)<br>(1, 12, 4) |

This is to solve the non-dense tensor layout problem in #45505

TODO:
- [x] Fix all the BC broken test cases in pytorch
- [ ] Investigate if any fb internal tests are broken

This change will cover all kinds of non-dense tensors.
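
A minimal sketch of the preserved-stride behavior, reproducing the channels-last case and the first row of the table above (the non-dense output strides assume this change is in place):

```python
import torch

# Dense channels-last tensor: the stride permutation was already preserved
t = torch.randn(2, 4, 3, 2).to(memory_format=torch.channels_last)
print(t.stride())                    # (24, 1, 8, 4)
print(torch.empty_like(t).stride())  # (24, 1, 8, 4)

# Non-dense (strided) tensor: with this change its permutation is
# preserved too, instead of falling back to a contiguous layout
a = torch.randn(2, 3, 8)[:, :, ::2].permute(2, 0, 1)
print(a.stride())        # (2, 24, 8)
print(a.exp().stride())  # (1, 12, 4)
```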

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24288970

Pulled By: glaringlee

fbshipit-source-id: 320fd4e0d1a810a12abfb1441472298c983a368d
2020-10-20 19:49:49 -07:00
2181449068 Revert D24004795: [quant] Add FixedQParamsFakeQuantize module
Test Plan: revert-hammer

Differential Revision:
D24004795 (253918ec55)

Original commit changeset: fc4797f80842

fbshipit-source-id: 663169e90a2f58e5a89e4d382291ae41c24d0fee
2020-10-20 19:40:21 -07:00
f47231bf0e [caffe2][dnnlowp] Remove openmp usage in quantize dnnlowp op
Summary: It creates CPU overload issues when OpenMP is enabled and OMP_NUM_THREADS=1 is not set.

Test Plan: buck test //caffe2/caffe2/quantization/server:quantize_dnnlowp_op_test

Reviewed By: jspark1105

Differential Revision: D24437305

fbshipit-source-id: 426209fc33ce0d4680c478f584716837ee62cb5e
2020-10-20 19:33:56 -07:00
6cd8b5e9a7 Provide CMake option to enable Vulkan API. (#46503)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46503

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D24379144

Pulled By: AshkanAliabadi

fbshipit-source-id: 8d8c57f96bbac2a44615828a3474c912704f3a85
2020-10-20 18:45:52 -07:00
3e041b503f Add Vulkan job dispatch and flush. (#46008)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46008

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D24291507

Pulled By: AshkanAliabadi

fbshipit-source-id: a3d02e76708a38e49398bb71e31bb2ad676d01af
2020-10-20 18:41:29 -07:00
cb3c1d17e4 Promote -Wcast-function-type to an error in builds. (#46356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46356

Adding the flag `-Werror=cast-function-type` to ensure we don't allow
any invalid casts (ex: PyCFunction casts).

For more details see: https://github.com/pytorch/pytorch/issues/45419
ghstack-source-id: 114632980

Test Plan: waitforbuildbot

Reviewed By: albanD

Differential Revision: D24319759

fbshipit-source-id: 26ce4650c220e8e9dd3550245f214c7e6c21a5dc
2020-10-20 18:09:06 -07:00
42a70dc5a8 Implement all communication APIs in DistributedC10d new frontend (#46053)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46053

Reviewed By: wanchaol

Differential Revision: D24300487

Pulled By: gmagogsfm

fbshipit-source-id: 0d0b01c4f9d9e1d59dd17d7606ce47d54d61951d
2020-10-20 17:52:07 -07:00
253918ec55 [quant] Add FixedQParamsFakeQuantize module (#45538)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45538

This is used to simulate fake quantize operation for ops with fixed quantization parameters
e.g. hardsigmoid

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24004795

fbshipit-source-id: fc4797f80842daacd3b3584c5b72035774634edd
2020-10-20 17:43:25 -07:00
f83cf2dab3 [JIT] adding torch.jit.isinstance support (#46062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46062

Adds support for torch.jit.isinstance in both eager and script mode

Example use:

```
import torch
from typing import Any, List

class TestModule(torch.nn.Module):
    def __init__(self):
        super(TestModule, self).__init__()

    def call(self, input1: str, input2: str) -> str:
        return input1

    def forward(self, input: Any) -> None:
        if torch.jit.isinstance(input, List[str]):
            for el in input:
                print(el)

TestModule().forward(["1","2"])
scripted_module = torch.jit.script(TestModule())
scripted_module(["1", "2"])
```

Test Plan: Imported from OSS

Reviewed By: bertmaher, zou3519

Differential Revision: D24264415

Pulled By: Lilyjjo

fbshipit-source-id: 039c95bddd854c414027ac8332832e6bc830b5b9
2020-10-20 16:47:49 -07:00
fdc5261a20 Support %-based string formatting (#45976)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45976
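
A minimal sketch of what this enables, assuming `%`-based formatting is now accepted by the TorchScript compiler as the title says:

```python
import torch

@torch.jit.script
def greet(name: str) -> str:
    # %-based string formatting now compiles in TorchScript
    return "hello, %s!" % name

print(greet("world"))  # hello, world!
```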

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D24374215

Pulled By: ansley

fbshipit-source-id: 2005fe7f09dc8d3c44c4bfdccab6b4dc46a5e517
2020-10-20 16:13:36 -07:00
f9446cb15a [quant][refactor] Remove register api and rename get_*_mapping to get_default_*_mapping (#46337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46337

We plan to pass around the mappings instead of using a global registration API, to keep
the mappings local to the transformations the user is performing.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24317436

fbshipit-source-id: 81569b88f05eeeaa9595447e482a12827aeb961f
2020-10-20 15:53:47 -07:00
4f5b55f722 Revert D24395956: [pytorch][PR] Replace flatten tensors with flatten loops.
Test Plan: revert-hammer

Differential Revision:
D24395956 (2f51ddb81f)

Original commit changeset: f3792903f206

fbshipit-source-id: ef70713f0f67f577b09674219631d22440ceec31
2020-10-20 15:42:23 -07:00
2b221a9599 Remove PyCFunction casts as much as possible. (#46227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46227

Follow up from https://github.com/pytorch/pytorch/issues/45419, in
this PR I've removed as many PyCFunction casts as I could from the codebase.

The only ones I didn't remove were the ones with `METH_VARARGS | METH_KEYWORDS`
which have 3 parameters instead of 2 and had to be cast. Example: `
{"copy_", (PyCFunction)(void(*)(void))THPStorage_(copy_), METH_VARARGS |
METH_KEYWORDS, nullptr},`
ghstack-source-id: 114632704

Test Plan: waitforbuildbot

Reviewed By: albanD

Differential Revision: D24269435

fbshipit-source-id: 025cfd43a9a2a3e59f6b2951c1a78749193d77cf
2020-10-20 15:01:51 -07:00
1a3ea46dbf [StaticRuntime] Threading model (#46219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46219

- Refactor StaticRuntime and group common data structures, the jit graph, and the script module into a separate struct `InferenceModule`:
```
struct InferenceModule {
  explicit InferenceModule(const torch::jit::Module& m);
  explicit InferenceModule(std::shared_ptr<torch::jit::Graph> g);
  torch::jit::Module module;
  std::shared_ptr<torch::jit::Graph> graph;
  std::unique_ptr<c10::FunctionSchema> schema;

  std::unordered_map<Value*, size_t> value_to_reg;
  std::vector<size_t> input_regs; // inputs to the graph
  std::vector<size_t> output_regs; // outputs of the graph
  std::vector<size_t> internals;
};
```
which is stored in the PyTorchPredictor, as well as the static runtime, and shared across threads. Then this is what's left inside the Static Runtime:
```
  mutable std::vector<IValue> reg_;
  // The nodes we need to run
  std::vector<ProcessedNode> nodes_;
```
`reg_` holds all the weights and activations, which differ across threads at runtime. `nodes_` holds the op nodes and input/output registers, and is the same across threads for now. We could potentially put other stateful data structures in it, so I kept it inside the static runtime. It could be easily moved into the `InferenceModule` if we decide not to put anything else into `ProcessedNode`.

- Added StaticRuntimeOptions so we can toggle certain optimizations on/off, for testing and benchmarking. `cleanup_activations` is an example.

- Integration with PyTorchPredictor. Added a lockfree stack in the PyTorchPredictor to hold all the static runtime instances. Benchmark shows that the `push` and `pop` combo takes about 80 ns, which is quite acceptable.

This diff focuses on threading model only. Benchmarks will be separate.

Reviewed By: bwasti

Differential Revision: D24237078

fbshipit-source-id: fd0d6347f02b4526ac17dec1f731db48424bade1
2020-10-20 14:37:30 -07:00
e18a8aba95 Add CUDA 11.1 docker build (#46283)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46283

Reviewed By: ezyang

Differential Revision: D24346026

Pulled By: malfet

fbshipit-source-id: f69558f35527833b867a7352c78b4e8ebc370db3
2020-10-20 13:35:31 -07:00
187e23397c Remove non-existent trusty image references (#46594)
Summary:
Simplifies some parts of build.sh and removes old references in the code to non-existent trusty images.

There are other parts of the code where trusty is referenced for travis (most of them in third party directories) and I did not touch those. https://github.com/pytorch/pytorch/search?q=trusty

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46594

Reviewed By: seemethere

Differential Revision: D24426796

Pulled By: janeyx99

fbshipit-source-id: 428c52893d2d35c1ddd1fd2e65a4b6575f260492
2020-10-20 12:54:45 -07:00
2f51ddb81f Replace flatten tensors with flatten loops. (#46539)
Summary:
This diff changes `TensorExprKernel::generateStmt` to use flatten loops instead of flatten tensors.

Checked all tests on CPU as well as CUDA.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46539

Reviewed By: nickgg

Differential Revision: D24395956

Pulled By: navahgar

fbshipit-source-id: f3792903f2069bda37b571c9f0a840e6fb02f189
2020-10-20 12:16:18 -07:00
9c02e2112e Automated submodule update: FBGEMM (#46578)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 23cb1db72b

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46578

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: YazhiGao

Differential Revision: D24415308

fbshipit-source-id: c353dcf86cfd833a571a509930a17d09277a73e4
2020-10-20 11:43:01 -07:00
e6ed887908 Add view test for tensor_split (#46427)
Summary:
Fulfills Mike's suggestion here: https://github.com/pytorch/pytorch/pull/44868#discussion_r505095018

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46427

Reviewed By: ezyang

Differential Revision: D24355107

Pulled By: mruberry

fbshipit-source-id: bddef2f9c2c41b5c5ac47a17d5ecdda580072e99
2020-10-20 09:56:37 -07:00
5003fd189c Add an option to getWriteableTensorData to avoid copy CUDA tensor to CPU (#46524)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46524

Test Plan: Imported from OSS

Reviewed By: wanchaol

Differential Revision: D24392794

Pulled By: mrshenli

fbshipit-source-id: 21bf81dfc6c1d81689f8278d81f4c8776bc76ec1
2020-10-20 08:54:58 -07:00
5e0bfd7455 [Build] [CMake] [ROCm] find hsa-runtime64 properly (#45550)
Summary:
Properly Fixes https://github.com/pytorch/pytorch/issues/44384
similar in vein to https://github.com/pytorch/pytorch/issues/42064

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45550

Reviewed By: ezyang

Differential Revision: D24412674

Pulled By: malfet

fbshipit-source-id: f3d056c7069cb9d8a7d4174b604b9e3fbb14180b
2020-10-20 08:38:32 -07:00
35a35c3498 Move Open MPI installation to Ubuntu CUDA Docker images (#46569)
Summary:
Instead of installing Open MPI for build and test jobs with environment *-xenial-cuda*, install Open MPI into the relevant Docker images. This would save time and remove duplication in our scripts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46569

Reviewed By: walterddr

Differential Revision: D24409534

Pulled By: janeyx99

fbshipit-source-id: 6152f2f5daf63744d907dd234bc12d2a5ec58f3d
2020-10-20 08:31:35 -07:00
0d4590c279 renaming env var IN_CIRCLECI to a broader name of IN_CI (#46567)
Summary:
The `IN_CIRCLECI` variable is a misnomer since the flag really indicates when we enable XML reporting because we want to run the test in CI. Since this doesn't necessarily mean CircleCI in particular, IN_CI is more accurate and general.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46567

Reviewed By: walterddr

Differential Revision: D24407642

Pulled By: janeyx99

fbshipit-source-id: 5e141a0571b914310a174a58ac0fde58e9521c6b
2020-10-20 08:25:39 -07:00
1c8d0d8cc9 Allow vmap to accept nested python data structures as inputs (#46289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46289

Previously, vmap had the restriction that any Tensors in the inputs must
not be a part of a nested python collection. This PR relaxes that
restriction. We can also do the same thing for vmap outputs, but I'll
leave that for future work.

The mechanism behind vmap is to convert any Tensor inputs (that have
been specified via in_dims) into BatchedTensor. Using a pytree
implementation, that logic becomes:
- flatten inputs
- broadcast in_dims to inputs and unflatten it
- use the flat inputs and flat in_dims to construct BatchedTensors
- unflatten the BatchedTensors into the same structure as the original
inputs.
- Send the unflattened BatchedTensors into the desired function.
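
A minimal sketch of the new capability (assuming the prototype `torch.vmap` API of this period):

```python
import torch

def fn(pair, z):
    x, y = pair
    return x + y + z

x, y, z = torch.randn(5, 3), torch.randn(5, 3), torch.randn(5, 3)
# in_dims=0 is broadcast over the nested input structure ((0, 0), 0),
# so every tensor leaf is mapped over dimension 0
out = torch.vmap(fn, in_dims=0)((x, y), z)
print(out.shape)  # torch.Size([5, 3])
```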

Performance
-----------
Some benchmarking using
```
import torch
def foo(a, b, c, d):
    return a, b, c, d

x = torch.randn(2, 3)
foo_vmap = torch.vmap(foo)
%timeit foo_vmap(x, x, x, x)
```
shows a slowdown from 15us to 25us on my machine. The 10us overhead is
not a lot, especially since our vmap implementation is a "prototype". We
can work around the performance in the future by either moving part of
the pytree implementation into C++ or depending on a library that has a
performant pytree implementation.

Test Plan
---------
- New tests, also updated old tests.

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D24392892

Pulled By: zou3519

fbshipit-source-id: 072b21dcc6065ab43cfd341e84a01a5cc8ec3daf
2020-10-20 07:52:17 -07:00
6025f8148a Implement _broadcast_to_and_flatten(pytree, spec) (#46288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46288

This "broadcasts" `pytree` to have the same structure as `spec`
and then flattens it.
I find it hard to describe what that does in words, so here's an example:

- Broadcasting 1 to have the same structure as [0, [0, 0]] would
return [1, [1, 1]]. Further flattening it gives us [1, 1, 1].
- Broadcasting [1, 2] to have the same structure as [0, [0, 0]] would
return [1, [2, 2]]. Further flattening it gives us [1, 2, 2].
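
In code, the two examples above look roughly like this (the helper is private; the `torch.utils._pytree` module path is an assumption based on this stack):

```python
from torch.utils._pytree import tree_flatten, _broadcast_to_and_flatten

# Build the spec of the target structure [0, [0, 0]]
_, spec = tree_flatten([0, [0, 0]])
print(_broadcast_to_and_flatten(1, spec))       # [1, 1, 1]
print(_broadcast_to_and_flatten([1, 2], spec))  # [1, 2, 2]
```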

What is this used for?
----------------------
The next PR up in the stack uses this helper function to allow vmap to
accept nested data structures. `vmap(fn, in_dims)(*inputs)` allows the
user to specify in_dims with a tree structure that is a sub-graph of
that of `inputs` (where both contain the root of the tree).

For example, one can do `vmap(fn, in_dims=0)(x, y, z)`. `in_dims` is 0
and inputs is (x, y, z). We would like to broadcast in_dims up to the
structure of inputs to get (0, 0, 0).

Another example, is `vmap(fn, in_dims=(0, 1))(x, [y, z])`. `in_dims` is
(0, 1) and inputs is (x, [y, z]). We would like to broadcast in_dims up
to the structure of inputs to get (0, [1, 1]); this value of in_dims is
used to say "let's vmap over dim 0 for x and dim 1 for y and z".

Test Plan
---------
New tests.

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D24392891

Pulled By: zou3519

fbshipit-source-id: 6f494d8b6359582f1b4ab6b8dd6a956d8bfe8ed4
2020-10-20 07:52:14 -07:00
0285618a11 Add utilities to support handling of nested python data structures (#46287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46287

This adds a lightweight `pytree` implementation that is similar to and
inspired by JAX pytrees, tensorflow.nest, deepmind/tree,
TorchBeast's TensorNest, etc.

A *pytree* is Python nested data structure. It is a tree in the sense
that nodes are Python collections (e.g., list, tuple, dict) and the leaves
are Python values. Furthermore, a pytree should not contain reference
cycles.

This PR:
- adds support for flattening and unflattening nested Python list/dict/tuples
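
A small sketch of the flatten/unflatten round trip (the `torch.utils._pytree` module path is an assumption based on this stack):

```python
from torch.utils._pytree import tree_flatten, tree_unflatten

leaves, spec = tree_flatten({"a": 1, "b": [2, 3]})
print(leaves)  # [1, 2, 3]

# Map over the leaves, then rebuild the original structure from the spec
print(tree_unflatten([x * 10 for x in leaves], spec))  # {'a': 10, 'b': [20, 30]}
```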

Context: nested Tensor inputs for vmap
--------------------------------------
Right now, vmap is restricted to taking in flat lists of tensors. This
is because vmap needs to be able to convert every tensor in the input
that is being vmapped over into a BatchedTensor.

With a pytree library, we can simply flatten the input data structure
(returning the leaves), map all of the Tensors in the flat input to
BatchedTensors, and unflatten the flat list of BatchedTensors into a new
input. Or equivalently, with a `tree_map` function, we can map a nested
python data structure containing Tensors into one containing
BatchedTensors.

Future work
-----------
In some future PRs, we'll add nested input support for vmap. The
prerequisites for that are:
- a `broadcast_to(small, big)` that broadcasts `small` up to `big`.
  This is for handling the in_dims to vmap: the in_dims structure must
  be compatible with the structure of the inputs.

Test Plan
---------
- New tests in test/test_pytree.py

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D24392890

Pulled By: zou3519

fbshipit-source-id: 7daf7430c5a38354e7d203a72882bd7a9b24cfb1
2020-10-20 07:45:45 -07:00
75322dbeb4 [PyTorch] [BUCK] Replace pt_deps.bzl with a YAML operator dependency file which is generated by the code analyser (#46057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46057

The code analyser (that uses LLVM and runs in the OSS PyTorch git repo) already produces a YAML file which contains base operator names and the operators that they depend on. Currently, this operator dependency graph is converted into a python dictionary to be imported in BUCK and used there. However, it is mostly fed into other executables by serializing it to JSON, and the consumers piece this JSON back together by concatenating each argument. This seems unnecessary. Instead, this diff retains the original YAML file and makes all consumers consume that same YAML file.
ghstack-source-id: 114641582

Test Plan: Build Lite Predictor + sandcastle.

Reviewed By: iseeyuan

Differential Revision: D24186303

fbshipit-source-id: eecf41bf673d90b960c3efe7a1271249f0a4867f
2020-10-20 02:00:36 -07:00
e5ed037529 [StaticRuntime] Add a 'speed of light' benchmark. (#46308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46308

This PR adds a hand-optimized version of the DeepAndWide model with the goal
of estimating the overheads of the static runtime. While the static runtime is
currently much faster than the existing JIT interpreter, it would be
useful to understand how close we are to an absolutely 0-overhead
system. Currently, this "ideal" implementation is 2x faster than the
static runtime on batchsize=1.

Full benchmark results:
```
Running build/bin/static_runtime_bench
Run on (24 X 2394.71 MHz CPU s)
CPU Caches:
  L1 Data 32K (x24)
  L1 Instruction 32K (x24)
  L2 Unified 4096K (x24)
  L3 Unified 16384K (x24)
------------------------------------------------------------------------------
Benchmark                                       Time           CPU Iterations
------------------------------------------------------------------------------
BM_deep_wide_base/1                         59518 ns      59500 ns      10909
BM_deep_wide_base/8                         74635 ns      74632 ns       9317
BM_deep_wide_base/20                        82186 ns      82147 ns       9119
BM_deep_wide_fast/1                         13851 ns      13851 ns      49825 << new
BM_deep_wide_fast/8                         22497 ns      22497 ns      32089 << new
BM_deep_wide_fast/20                        23868 ns      23841 ns      31184 << new
BM_deep_wide_jit_graph_executor/1           62786 ns      62786 ns      10835
BM_deep_wide_jit_graph_executor/8           76730 ns      76718 ns       7529
BM_deep_wide_jit_graph_executor/20          78886 ns      78883 ns       8769
BM_deep_wide_jit_profiling_executor/1       69504 ns      69490 ns      10309
BM_deep_wide_jit_profiling_executor/8       75718 ns      75715 ns       9199
BM_deep_wide_jit_profiling_executor/20      75364 ns      75364 ns       9010
BM_deep_wide_static/1                       40324 ns      40318 ns      17232
BM_deep_wide_static/8                       50327 ns      50319 ns      13335
BM_deep_wide_static/20                      53075 ns      53071 ns      12855
BM_deep_wide_static_threaded/threads:8       6258 ns      49873 ns      14008
```

PS: The implementation could probably be optimized even more.

Differential Revision: D24300702

Test Plan: Imported from OSS

Reviewed By: dzhulgakov

Pulled By: ZolotukhinM

fbshipit-source-id: 7870bdef127c39d11bcaa4f03a60eb80a46be58e
2020-10-19 23:35:55 -07:00
17f8c329df [NNC] IRSimplifier rules for Compare and Mod (#46412)
Summary:
Adds new rules to the NNC IRSimplifier to take care of the following cases:

* Comparisons which are symbolic but have a constant difference. E.g. this is most useful in cases like `if (x > x + 4) ...` which we can now eliminate.

* Simplification of `Mod` nodes, including simple rules such as `0 % x` and `x % 1`, but also factorization of both sides to find common symbolic multiples. E.g. `(x * y) % x` can be cancelled out to `0`.

See tests for many more examples!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46412

Reviewed By: navahgar

Differential Revision: D24396151

Pulled By: nickgg

fbshipit-source-id: abb954dc930867d62010dcbcd8a4701430733715
2020-10-19 19:37:09 -07:00
a06b95b2ba [quant][graphmode][fx] Support non_traceable_module/module_class (#46298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46298

Allow the user to specify a list of qualified names for non-traceable submodules,
or the types of the non-traceable submodules.
See quantize_fx.py for the API

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24294210

fbshipit-source-id: eb1e309065e3dfbf31e63507aaed73587f0dae29
2020-10-19 18:50:08 -07:00
5b0f400488 Replace list(map(...)) constructs by list comprehensions (#46461)
Summary:
As discussed in https://github.com/pytorch/pytorch/issues/46392 this makes the code more readable and possibly more performant.
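
For illustration, the kind of rewrite applied throughout:

```python
# Before: an extra lambda and an intermediate map object to read through
squares = list(map(lambda x: x * x, range(10)))

# After: the equivalent list comprehension
squares = [x * x for x in range(10)]
```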

It also fixes a bug, detected in the process, where the argument order of `map` was confused: 030a24906e (diff-5bb26bd3a23ee3bb540aeadcc0385df2a4e48de39f87ed9ea76b21990738fe98L1537-R1537)

Fixes https://github.com/pytorch/pytorch/issues/46392

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46461

Reviewed By: ailzhang

Differential Revision: D24367015

Pulled By: ezyang

fbshipit-source-id: d55a67933cc22346b00544c9671f09982ad920e7
2020-10-19 18:42:49 -07:00
3d421b3137 [pytorch] rewrite of the python binding codegen with the v2 API (#46244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46244

- What does the generated binding code do?

The Python binding codegen produces code that takes the input list of
PyObjects, finds the matching ATen C++ function using PythonArgParser,
converts the PyObjects into C++ types and calls the ATen C++ function:

```
+--------+  parsing   +------------------------+  binding   +-----------------------+
| PyObjs | ---------> | PythonArgParser Output | ---------> | Cpp Function Dispatch |
+--------+            +------------------------+            +-----------------------+
```

- Are Python arguments 1-1 mapped to C++ arguments?

Python arguments might be reordered, packed, unpacked when binding to
C++ arguments, as illustrated below:

```
// Binding - Reorder & Packing
// aten::empty.names(int[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None,
                     Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor

            Python Args               Cpp Args
-----------------------------------------------------------
         0: size                      size
         1: names                     names
         2: memory_format -------+
         3: dtype         -----+-|--> options
         4: layout            /  |
         5: device           /   +--> memory_format
         6: pin_memory      /
         7: requires_grad -+

// Binding - Unpacking
// aten::max.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices)

            Python Args               Cpp Args
-----------------------------------------------------------
                               +----> max
                              /-----> max_values
         0: input            /        self
         1: dim             /         dim
         2: keepdim        /          keepdim
         3: out      -----+
```

- Why do we want to rewrite the python binding codegen?

The old codegen takes Declarations.yaml as input. It doesn't distinguish
between Python arguments and C++ arguments - they are all mixed together
as a bag of non-typed dict objects. Different methods process these arg
objects and add new attributes for various different purposes. It's not
obvious how to figure out the semantics of these attributes. The complicated
binding logic happens implicitly, scattered across the code.

```
+--------------------+
|  Native Functions  |
+--------------------+
  |
  |
  v
+--------------------+
|   Cpp Signatures   |
+--------------------+
  |
  |
  v
+--------------------+
| Declarations.yaml  |
+--------------------+
  |                        +-------------------------------------+
  |              +-------> |       PythonArgParser Schema        |
  |              |         +-------------------------------------+
  |              |                            .
  |              |                            .
  v              |                            .
+--------------------+     +-------------------------------------+
| NonTyped Args Objs | --> | PythonArgParser -> Cpp Args Binding |
+--------------------+     +-------------------------------------+
                 |                            .
                 |                            .
                 |                            .
                 |         +-------------------------------------+
                 +-------> |        Cpp Function Dispatch        |
                           +-------------------------------------+
```

This PR leverages the new immutable data models introduced in the new
aten codegen. It introduces dedicated data models for python schema.
This way, we can not only avoid subtle Declaration.yaml conversions but
also decouple the generation of python schema, python to c++ binding and
c++ function call.

The ultimate state will be like the following diagram:

```
            +-------------------+     +-------------------------------------+
  +-------> | Python Signatures | --> |       PythonArgParser Schema        |
  |         +-------------------+     +-------------------------------------+
  |                         |                            .
  |                         |                            .
  |                         |                            .
+------------------+        |         +-------------------------------------+
| Native Functions |        +-------> | PythonArgParser -> Cpp Args Binding |
+------------------+        |         +-------------------------------------+
  |                         |                            .
  |                         |                            .
  |                         |                            .
  |         +-------------------+     +-------------------------------------+
  +-------> |  Cpp Signatures   | --> |        Cpp Function Dispatch        |
            +-------------------+     +-------------------------------------+
```

This PR has migrated the core binding logic from
tools/autograd/gen_python_functions.py to tools/codegen/api/python.py.

It produces the byte-for-byte same results (tested with #46243).

Will migrate the rest of gen_python_functions.py in subsequent PRs.

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24388874

Pulled By: ljk53

fbshipit-source-id: f88b6df4e917cf90d868a2bbae2d5ffb680d1841
2020-10-19 17:36:45 -07:00
8f12c0e786 Revert D24269034: [fx] Refactor Tracer so that find_module and root args creation could be overridden by implementations
Test Plan: revert-hammer

Differential Revision:
D24269034 (7b2e8bec85)

Original commit changeset: d7b67f2349dd

fbshipit-source-id: 7dd709b585f82d52d9b9973508137e36d5b5871e
2020-10-19 17:29:18 -07:00
cda88e8e4b Fix interval midpoint calculation in register_op_utils
Summary: Interval midpoint calculations can overflow (integers). This fixes such an instance.
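
The actual fix is in C++; below is a sketch of the classic overflow pattern, using numpy's fixed-width integers to emulate C integer arithmetic:

```python
import numpy as np

lo, hi = np.int32(2_000_000_000), np.int32(2_100_000_000)
mid_bad = (lo + hi) // 2        # lo + hi wraps around in 32-bit arithmetic
mid_good = lo + (hi - lo) // 2  # the difference always fits, so no overflow
print(mid_bad, mid_good)        # a negative value vs. 2050000000
```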

Test Plan: Standard test rigs.

Reviewed By: iseeyuan

Differential Revision: D24392608

fbshipit-source-id: 0face1133d99cea342abbf8884b14262d50b0826
2020-10-19 16:11:22 -07:00
ac146c4820 [nvFuser] Switching to CudaFusionGuard from BailOut for nvfuser - update 2 (#46452)
Summary:
1. Added CudaFusionGuard as the custom TypeCheck for nvfuser; enabled dynamic shape support with profiling executor;
2. dropped support for legacy fuser;
3. re-enabled nvfuser tests;
4. added registration for profiling record to allow profiling on user specified nodes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46452

Reviewed By: zou3519, anjali411

Differential Revision: D24364642

Pulled By: ngimel

fbshipit-source-id: daf53a9a6b6636e1ede420a3a6d0397d4a8b450b
2020-10-19 15:44:31 -07:00
30d687522d [reland][quant][eagermode] Move custom_module registration to prepare/convert_custom_config_dict (#46293) (#46364)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46364

Test Plan:
Imported from OSS

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24322747

fbshipit-source-id: 4801ba1835fc805bf767fe9810b9edfa2ceeefb4
2020-10-19 15:21:00 -07:00
f0f10f82f4 Automated submodule update: FBGEMM (#46443)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: f20d61e119

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46443

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D24355446

fbshipit-source-id: dc2900367d5de37e67efa963cb2c417e29fe7a88
2020-10-19 14:23:50 -07:00
7b2e8bec85 [fx] Refactor Tracer so that find_module and root args creation could be overridden by implementations (#46493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46493

This will allow us to override the two following methods of Tracer:
-- get_module_qualified_name: finds the qualified name of a module. In the default implementation, it looks for the module in the registered modules and from there gets to the name, but in some scenarios the module being called might not be the exact same module that was registered.
-- create_args_for_root: allows creating and passing custom structured input (like a dictionary with specific keys) to the main module, rather than pure proxy objects. This also allows us to let proxy objects represent only tensors when they are passed to modules.
ghstack-source-id: 114609258

Test Plan: Unit tests passed

Reviewed By: zdevito, bradleyhd

Differential Revision: D24269034

fbshipit-source-id: d7b67f2349dd516b6f7678e41601d6899403d9de
2020-10-19 13:55:31 -07:00
6dc763df30 PyTorch: add API usage logging to numeric suite (#46504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46504

As titled, so we can start seeing who is using this.

Test Plan: CI

Reviewed By: hx89

Differential Revision: D24375254

fbshipit-source-id: ff7b5560d0a6a175cecbf546eefc910759296dbb
2020-10-19 13:17:02 -07:00
d38a71d579 torch.nn.modules.LazyModuleMixin and torch.nn.LazyLinear (Shape Inference II) (#44538)
Summary:
Retake on https://github.com/pytorch/pytorch/issues/40493 after all the feedback from albanD

This PR implements the generic Lazy mechanism and a sample `LazyLinear` layer with the `UninitializedParameter`.

There are two main differences from the previous PR:
Now `torch.nn.Module` remains untouched.
We don't require an explicit initialization or a dummy forward pass before starting training or inference of the actual module, making this much simpler to use from the user side.

As we discussed offline, there was the suggestion of not using a mixin, but changing the `__class__` attribute of `LazyLinear` to become `Linear` once it's completely initialized. While this can be useful, for the time being we need `LazyLinear` to be a `torch.nn.Module` subclass, since there are many checks that rely on the modules being instances of `torch.nn.Module`.
This can cause problems when we create complex modules such as
```
class MyNetwork(torch.nn.Module):
    def __init__(self):
        super(MyNetwork, self).__init__()
        self.conv = torch.nn.Conv2d(20, 4, 2)
        self.linear = torch.nn.LazyLinear(10)
    def forward(self, x):
        y = self.conv(x).clamp(min=0)
        return self.linear(y)
```
Here, when the __setattr__ function is called at the time LazyLinear is registered, it won't be added to the child modules of `MyNetwork`, so we have to do it manually later, but currently there is no way to do such a thing, as we can't access the parent module from LazyLinear once it becomes the Linear module. (We can add a workaround to this if needed.)
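
A minimal usage sketch of the proposed module (assuming the API shape described above, where the first forward pass infers the input features):

```python
import torch

lazy = torch.nn.LazyLinear(out_features=10)  # in_features is not known yet
out = lazy(torch.randn(4, 25))               # first forward materializes the parameters
print(lazy.weight.shape)                     # torch.Size([10, 25])
```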

TODO:

Add convolutions once the design is OK
Fix docstrings

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44538

Reviewed By: ngimel

Differential Revision: D24162854

Pulled By: albanD

fbshipit-source-id: 6d58dfe5d43bfb05b6ee506e266db3cf4b885f0c
2020-10-19 13:13:54 -07:00
7f8b02f5b7 [ONNX] Add test for Batchnorm (#45633)
Summary:
Add test for Batchnorm in training mode

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45633

Reviewed By: VitalyFedyunin

Differential Revision: D24117026

Pulled By: bzinodev

fbshipit-source-id: 2728d8732e856390a2a00c3e8425b9c312c00650
2020-10-19 13:07:40 -07:00
172ed51a17 Mark parts of spectral tests as slow (#46509)
Summary:
According to https://app.circleci.com/pipelines/github/pytorch/pytorch/228154/workflows/31951076-b633-4391-bd0d-b2953c940876/jobs/8290059
TestFFTCUDA.test_fftn_backward_cuda_complex128 takes 242 seconds to finish, with most of the time spent checking the 2nd gradient

- Refactor the common part of test_fft_backward and test_fftn_backward into _fft_grad_check_helper
- Introduce the `slowAwareTest` decorator
- Split the test into fast and slow parts by checking the 2nd-degree gradient only during the slow part

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46509

Reviewed By: walterddr

Differential Revision: D24378901

Pulled By: malfet

fbshipit-source-id: 606670c2078480219905f63b9b278b835e760a66
2020-10-19 10:11:46 -07:00
e7564b076c Refactor scalar list APIs to use overloads (#45673)
Summary:
Refactor foreach APIs to use overloads in case of scalar list inputs.
Tested via unit tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45673

Reviewed By: heitorschueroff

Differential Revision: D24053424

Pulled By: izdeby

fbshipit-source-id: 35976cc50b4acfe228a32ed26cede579d5621cde
2020-10-19 09:28:49 -07:00
f06ea4bcac Add myself as owner of C++ APIs related folder (#46487)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46487

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D24370164

Pulled By: glaringlee

fbshipit-source-id: 7b25b15eb906917376d2e5290782572a3cd48d3c
2020-10-19 09:16:02 -07:00
eadc59df55 Enable TP_USE_CUDA and TP_ENABLE_CUDA_IPC (#46523)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46523

Test Plan: Imported from OSS

Reviewed By: beauby

Differential Revision: D24385830

Pulled By: mrshenli

fbshipit-source-id: 59a40843b4dc1585e176062476da9ab74c84179b
2020-10-19 09:05:00 -07:00
00c779a92b detect inplace modifications of views earlier (fix #21875) (#46204)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46204

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D24259500

Pulled By: bdhirsh

fbshipit-source-id: 223f8a07da4e4121009fc0a8b6760d90eef089b3
2020-10-19 08:58:33 -07:00
0c5cd8c2b9 [RFC] Switch PyTorch Selective Build (Custom Build) to use the SelectiveBuilder abstraction (#45722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45722

This diff does a bunch of things:

1. Introduces some abstractions as detailed in https://fb.quip.com/2oEzAR5MKqbD to help with selective build related codegen in multiple files.
2. Adds helper methods to combine operators, debug info, operator lists, etc...
3. Currently, the selective build machinery queries `op_registration_whitelist` directly at various places in the code. `op_registration_whitelist` is a list of allowed operator names (without overload name). We want to move to a world where the overload names are also included so that we can be more selective about which operators we include. To that effect, it makes sense to hide the checking logic in a separate abstraction and have the build use that abstraction instead of putting all this selective-build-specific logic in the code generator itself. This change is attempting to do just that.
4. Updates generate_code, unboxing-wrapper codegen, and autograd codegen to accept the operator selector paradigm as opposed to a selected operator list.
5. Update `tools/code_analyzer/gen_op_registration_allowlist.py` to expose providing an actual structured operator dependency graph in addition to a serialized string.

There are a bunch of structural changes as well:

1. `root_op_list.yaml` and `combined_op_list.yaml` are now actual YAML files (not a space separated list of operator names)
2. `generate_code.py` accepts only paths to operator list YAML files (both old style as well as new style) and not list of operator names on the command line as arguments
3. `gen.py` optionally also accepts a custom build related operators YAML path (this file has information about which operators to register in the generated library).

ghstack-source-id: 114578753

(Note: this ignores all push blocking failures!)

Test Plan:
`buck test caffe2/test:selective_build`

Generated YAML files after the change:

{P143981979}

{P143982025}

{P143982056}

Ensure that the generated files are same before and after the change:

```
[dhruvbird@devvm2490 /tmp/TypeDefault.cpp] find -name "*.cpp" | xargs md5sum
d72c3d125baa7b77e4c5581bbc7110d2  ./after_change/gen_aten/TypeDefault.cpp
42353036c83ebc7620a7159235b9647f  ./after_change/lite_predictor_lib_aten/TypeDefault.cpp
d72c3d125baa7b77e4c5581bbc7110d2  ./before_change/gen_aten/TypeDefault.cpp
42353036c83ebc7620a7159235b9647f  ./before_change/lite_predictor_lib_aten/TypeDefault.cpp
```

`VariableTypes_N.cpp` are generated the same both before and after the change:

```
[dhruvbird@devvm2490 /tmp/VariableType] find -name "*.cpp" | xargs -n 1 md5sum | sort
3be89f63fd098291f01935077a60b677  ./after/VariableType_2.cpp
3be89f63fd098291f01935077a60b677  ./before/VariableType_2.cpp
40a3e59d64e9dbe86024cf314f127fd6  ./after/VariableType_4.cpp
40a3e59d64e9dbe86024cf314f127fd6  ./before/VariableType_4.cpp
a4911699ceda3c3a430f08c64e8243fd  ./after/VariableType_1.cpp
a4911699ceda3c3a430f08c64e8243fd  ./before/VariableType_1.cpp
ca9aa611fcb2a573a8cba4e269468c99  ./after/VariableType_0.cpp
ca9aa611fcb2a573a8cba4e269468c99  ./before/VariableType_0.cpp
e18f639ed23d802dc4a31cdba40df570  ./after/VariableType_3.cpp
e18f639ed23d802dc4a31cdba40df570  ./before/VariableType_3.cpp
```

Reviewed By: ljk53

Differential Revision: D23837010

fbshipit-source-id: ad06b1756af5be25baa39fd801dfdf09bc565442
2020-10-18 15:10:42 -07:00
bcd68dfa5f Change codecov comment style to be less verbose (#46499)
Summary:
Do not post comments if there are no changes in coverage and post only diff stats

Partially addresses an issue raised in https://github.com/pytorch/pytorch/issues/44187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46499

Reviewed By: walterddr

Differential Revision: D24373480

Pulled By: malfet

fbshipit-source-id: a49d7e661507ad98d5222c119b2f3f3550c1a949
2020-10-18 14:10:41 -07:00
351670f004 Delete libtorch test jobs (#46508)
Summary:
As they have been no-ops for 90+ days but still take 10 minutes from start to finish:
58f14d3a28/.jenkins/pytorch/test.sh (L376-L374)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46508

Reviewed By: walterddr

Differential Revision: D24378877

Pulled By: malfet

fbshipit-source-id: 2e9990a8e1524ef39e891bb5ea874447eef34b31
2020-10-18 14:05:44 -07:00
c3466dabaa Disable profiling when getGraphExecutorOptimize is unset (#46479)
Summary:
`getGraphExecutorOptimize` mandates we don't do any optimizations beyond what's required to run graphs. In this scenario, we don't want to do any profiling as profiling information will not be used.
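
A minimal sketch of the toggle involved (`_set_graph_executor_optimize` is an internal, unstable binding for the flag discussed above):

```python
import torch

# Tell the executor to skip optimizations; with this change, profiling
# runs are skipped as well, since their output would never be used.
torch._C._set_graph_executor_optimize(False)

@torch.jit.script
def f(x):
    return x * 2

f(torch.randn(3))  # executes without collecting profiling information
```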

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46479

Reviewed By: ZolotukhinM

Differential Revision: D24368292

Pulled By: Krovatkin

fbshipit-source-id: a2c7618d459efca9cb0700c4d64d829b352792a8
2020-10-17 22:30:05 -07:00
6a2f40dc66 Expose script_if_tracing as public API (#46494)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45921

`torch.jit._script_if_tracing` is still kept for BC
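
A minimal usage sketch of the now-public API:

```python
import torch

@torch.jit.script_if_tracing
def helper(x: torch.Tensor) -> torch.Tensor:
    # Compiled only when reached from inside torch.jit.trace;
    # runs eagerly otherwise.
    return x.relu() + 1

traced = torch.jit.trace(lambda x: helper(x) * 2, torch.randn(3))
```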

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46494

Reviewed By: ZolotukhinM

Differential Revision: D24381621

Pulled By: gmagogsfm

fbshipit-source-id: 35d9f2da38c591039ba95cd95ef186e6c7e47586
2020-10-17 17:31:57 -07:00
da95eec613 torch.fft: Two dimensional FFT functions (#45164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45164

This PR implements `fft2`, `ifft2`, `rfft2` and `irfft2`. These are the last functions required for `torch.fft` to match `numpy.fft`. If you look at either NumPy or SciPy you'll see that the 2-dimensional variants are identical to `*fftn` in every way, except for the default value of `axes`. In fact you can even use `fft2` to do general n-dimensional transforms.
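
A quick sketch of the equivalence described above:

```python
import torch

x = torch.randn(4, 5)
# fft2 is fftn restricted to the last two dimensions by default
assert torch.allclose(torch.fft.fft2(x), torch.fft.fftn(x, dim=(-2, -1)))
```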

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D24363639

Pulled By: mruberry

fbshipit-source-id: 95191b51a0f0b8e8e301b2c20672ed4304d02a57
2020-10-17 16:23:06 -07:00
495070b388 [Metal] Add the Python binding for optimize_for_mobile (#46456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46456

Add the python binding in CMake. The general workflow is

- Build pytorch -  `USE_PYTORCH_METAL=ON python setup.py install --cmake`
- Run optimize_for_mobile

```
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

scripted_model = torch.jit.load('./mobilenetv2.pt')
optimized_model = optimize_for_mobile(scripted_model, backend='metal')
torch.jit.export_opnames(optimized_model)
torch.jit.save(optimized_model, './mobilenetv2_metal.bc')
```
The exported ops are

```
['aten::adaptive_avg_pool2d', 'aten::add.Tensor', 'aten::addmm', 'aten::reshape', 'aten::size.int', 'metal::copy_to_host', 'metal_prepack::conv2d_run']
```
ghstack-source-id: 114559878

Test Plan:
- Sandcastle CI
- Circle CI

Reviewed By: kimishpatel

Differential Revision: D24356768

fbshipit-source-id: fb5c4c4b6316347b67edb4132da044a81470ddfd
2020-10-17 10:26:25 -07:00
e8ff0f6c5c [c10] add operator= of intrusive_ptr to weak_intrusive_ptr (#44045)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44045

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23632281

Pulled By: wanchaol

fbshipit-source-id: ea50427fc261f0c77ddaac2e73032827320d7077
2020-10-17 03:35:44 -07:00
cc471c6daf [Metal] Enable optimize_for_mobile on Linux (#46384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46384

Currently, the optimize_for_mobile binary only works on macOS, which is not very convenient to use. This diff introduces a new buck target that separates out the Objective-C code. The goal here is to be able to export models for Metal on Linux machines.
ghstack-source-id: 114499418

Test Plan:
- set `enable_mpscnn` to 1 in pt_defs.bzl
- buck build //xplat/caffe2:optimize_for_mobile --show-full-output
- ./optimize_for_mobile --model="./model.pt" --backend="metal"
- CI

```
[taox@devvm2780.vll0 ~/mobilenetv2] ./optimize_for_mobile --model="./model.pt" --backend="metal"

pt_operator_library(
	name = "old_op_library",
	ops = [
		"aten::Int.Tensor",
		"aten::_convolution.deprecated",
		"aten::adaptive_avg_pool2d",
		"aten::add.Tensor",
		"aten::addmm",
		"aten::batch_norm",
		"aten::dropout",
		"aten::hardtanh_",
		"aten::reshape",
		"aten::size.int",
		"aten::t",
		"prim::NumToTensor.Scalar",
	],
)

find output:
%411 : Tensor = aten::addmm(%self.classifier.1.bias, %input0.1, %694, %26, %26) # /Users/taox/anaconda/lib/python3.7/site-packages/torch/nn/functional.py:1674:0
insert: %817 : Tensor = metal::copy_to_host(%411)

pt_operator_library(
	name = "new_op_library",
	ops = [
		"aten::adaptive_avg_pool2d",
		"aten::add.Tensor",
		"aten::addmm",
		"aten::reshape",
		"aten::size.int",
		"metal::copy_to_host",
		"metal_prepack::conv2d_run",
	],
)
```

Reviewed By: linbinyu

Differential Revision: D24322017

fbshipit-source-id: 2b8062b069659bfec78ca4f9f9a5bb9dfbd693d2
2020-10-16 23:46:02 -07:00
5233ff9f15 [TensorExpr] Re-enable test for torch.cat, add a test for torch.cat being a single node in a fusion group. (#46447)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46447

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D24356017

Pulled By: ZolotukhinM

fbshipit-source-id: 847c1d9c4f0f77f53ea3412a5663d486e78bccad
2020-10-16 20:26:48 -07:00
d6de9d573a [TensorExpr] Properly handle input types promotion and special case of empty inputs for aten::cat. (#46500)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46500

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D24373671

Pulled By: ZolotukhinM

fbshipit-source-id: b3be73a89a9ab6654212cb7094f32bf1c445e876
2020-10-16 20:26:46 -07:00
0f668d95b6 [TensorExpr] Fix shape inference logic for aten::cat. (#46482)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46482

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D24366778

Pulled By: ZolotukhinM

fbshipit-source-id: 000ff363b11599ba3827cdf2db3d4793878b84ab
2020-10-16 20:24:30 -07:00
58f14d3a28 Remove catchAllKernel_. (#46354)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46354

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24319418

Pulled By: ailzhang

fbshipit-source-id: 295f0439087722b5cb60e43f2bca1ba8bd56a817
2020-10-16 19:11:17 -07:00
04e5fcc0ed [GPU] Introduce USE_PYTORCH_METAL (#46383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46383

The old `USE_METAL` is actually being used by Caffe2. Here we introduce a new macro to enable Metal in PyTorch.
ghstack-source-id: 114499392

Test Plan:
- Circle CI
- The Person Segmentation model works

Reviewed By: linbinyu

Differential Revision: D24322018

fbshipit-source-id: 4e5548afba426b49f314366d89b18ba0c7e745ca
2020-10-16 18:19:32 -07:00
fa108bd264 Add flatten loops transformation (#46365)
Summary:
This diff removes the dependency of flattening on tensors by performing flattening on loops instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46365

Reviewed By: ailzhang

Differential Revision: D24366347

Pulled By: navahgar

fbshipit-source-id: 4ba182f37212b6e4033cae13f8e75bc5144389f4
2020-10-16 17:05:26 -07:00
5da4a08675 [GPU] Add metal to DispatchKeySet (#46455)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46455

After this PR(https://github.com/pytorch/pytorch/pull/46236 ) landed, the `aten::copy_` can no longer be dispatched to Metal kernels.
ghstack-source-id: 114499399

Test Plan:
- Sandcastle CI
- Circle CI

Reviewed By: IvanKobzarev, ailzhang

Differential Revision: D24356769

fbshipit-source-id: 8660ca5be663fdc8985d9eb710ddaadbb43b0ddd
2020-10-16 16:33:26 -07:00
8c629ecc9a [WIP] Move catchAll to Math (#45939)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45939

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24165890

Pulled By: ailzhang

fbshipit-source-id: 72fe71ea95a738251b2fafc9eea4ab3831cf426b
2020-10-16 16:17:16 -07:00
d1ca7ef33e [Gradient Compression] Rename the first arg of pybinding of _register_comm_hook: ddp_model -> reducer. (#46498)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46498

The name of the first arg "ddp_model" is misleading, because it is actually the reducer of the DDP model rather than the entire model.

This method is called in the file caffe2/torch/nn/parallel/distributed.py:
`dist._register_comm_hook(self.reducer, state, hook)`
ghstack-source-id: 114531188

(Note: this ignores all push blocking failures!)

Test Plan: waitforbuildbot

Reviewed By: pritamdamania87

Differential Revision: D24372827

fbshipit-source-id: dacb5a59e87400d93a2f35da43560a591ebc5499
2020-10-16 16:12:42 -07:00
0c9787c758 caffe2: use at::mt19937 instead of std::mt19937 (10x speedup) (#43987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43987

This replaces the caffe2 CPU random number generator (std::mt19937) with at::mt19937, which is the one currently used in PyTorch. The ATen RNG is 10x faster than the std one and appears to be more robust, given bugs in the std implementation (https://fburl.com/diffusion/uhro7lqb)

For large embedding tables (10GB+) we see UniformFillOp taking upwards of 10 minutes, as we're bottlenecked on the single-threaded RNG. Swapping to at::mt19937 cuts that time to 10% of the current time.

Test Plan: Ran all relevant tests + CI. This doesn't introduce new features (+ is a core change) so existing tests+CI should be sufficient to catch regressions.

Reviewed By: dzhulgakov

Differential Revision: D23219710

fbshipit-source-id: bd16ed6415b2933e047bcb283a013d47fb395814
2020-10-16 16:08:35 -07:00
e6e83bf802 [hotfix] remove test.pypi.org (#46492)
Summary:
fix CI: https://app.circleci.com/pipelines/github/pytorch/pytorch/227894/workflows/67d2ded3-82eb-4a5d-be2c-dfccb8ed9133/jobs/8275321

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46492

Reviewed By: janeyx99

Differential Revision: D24371755

Pulled By: walterddr

fbshipit-source-id: ae7e96f22920f85f04acdccc999774510a60cfa9
2020-10-16 16:01:20 -07:00
11cc7f143d Run __setstate__ when cloning modules (#45858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45858

When cloning a module that has `__setstate__` and `__getstate__` methods, we need to run these methods to initialize the cloned module.
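
For reference, a hedged sketch of the pattern this affects (the module and its state layout are made up):

```
from typing import Tuple

import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = 2.0

    @torch.jit.export
    def __getstate__(self):
        return (self.scale, self.training)

    @torch.jit.export
    def __setstate__(self, state: Tuple[float, bool]):
        # Rebuilds the attributes; cloning now runs this on the clone.
        self.scale = state[0]
        self.training = state[1]

    def forward(self, x):
        return x * self.scale

m = torch.jit.script(M())
```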

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D24116524

Pulled By: bzinodev

fbshipit-source-id: a5111638e2dc903781f6468838c000850d1f9a74
2020-10-16 15:55:31 -07:00
478fa180ee Removing hack so that NCCL is not removed in base Docker commands (#46495)
Summary:
This hack was introduced in 2018 and should be able to be removed now. Previously, all Docker images removed the NCCL installation to allow some distributed tests to pass: https://github.com/pytorch/pytorch/issues/5877

This should no longer be the case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46495

Reviewed By: malfet

Differential Revision: D24372198

Pulled By: janeyx99

fbshipit-source-id: 6285db77734367f0b8dd363bfd6e9f61a0b58084
2020-10-16 15:42:23 -07:00
89108ba6ea type check for torch.quantization.stubs (#46475)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42973

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46475

Reviewed By: malfet

Differential Revision: D24368088

Pulled By: walterddr

fbshipit-source-id: 7a0ccb4fa66b28d4ac59923d727e632351a02b3f
2020-10-16 15:34:23 -07:00
997e672a27 Move NCCL installation for xenial-cuda10.1 to Docker image instead of for every job (#46476)
Summary:
Instead of installing NCCL for every build and test job with environment `*-xenial-cuda10.1-*`, install NCCL into the Docker image. This would save time and remove duplication in our scripts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46476

Reviewed By: ailzhang

Differential Revision: D24369397

Pulled By: janeyx99

fbshipit-source-id: c107aac9845024d2621eb967ca83c5fc8127a950
2020-10-16 14:17:40 -07:00
28f8372bf4 Avoid mat1 references in mm_mat1_backward (#45777)
Summary:
Avoiding references to `mat1` in `mm_mat1_backward` is a first step to solving issue https://github.com/pytorch/pytorch/issues/42371

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45777

Reviewed By: malfet

Differential Revision: D24347967

Pulled By: albanD

fbshipit-source-id: f09a8149d9795481b5ed5b48fdd0e598ba027d0b
2020-10-16 13:52:44 -07:00
dd169ca17c caffe2/plan_executor: propagate exceptions from reporter substeps (#46424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46424

Currently if an exception occurs in a reporter thread the process is killed via std::terminate. This adds support for handling the reporter exception if FLAGS_caffe2_handle_executor_threads_exceptions is set to true.

Test Plan: buck test mode/opt -c python.package_style=inplace //caffe2/caffe2/python:hypothesis_test //caffe2/caffe2:caffe2_test_cpu -- --stress-runs 100

Reviewed By: dahsh

Differential Revision: D24345027

fbshipit-source-id: 0659495c9e27680ebae41fe5a3cf26ce2f455cb3
2020-10-16 12:28:57 -07:00
24ca2763e1 [fx] allow custom behavior for args, kwargs, and bool (#45193)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45193

This change makes it possible to subclass the tracer to add additional
behavior when you know something about the shape of the Proxy objects,
by overriding the defaults for how the tracer tries to make something iterable,
looks for keys for **kwargs, or tries to convert to a boolean.

An example test shows how this can be used to tag inputs with shapes.
It can also be used, combined with create_node, to do type propagation during
tracing to fulfill requests like `iter`.
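
A hedged sketch of the kind of override this enables (the "conditions are always true" policy is made up purely for illustration):

```
import torch
import torch.fx as fx

class BoolFriendlyTracer(fx.Tracer):
    def to_bool(self, obj: fx.Proxy) -> bool:
        # The default raises during symbolic tracing; assume True instead.
        return True

def f(x):
    if x.sum() > 0:      # bool() on a Proxy is routed through to_bool
        return x + 1
    return x - 1

graph = BoolFriendlyTracer().trace(f)
print(graph)             # records only the x + 1 branch
```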

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D24258993

Pulled By: zdevito

fbshipit-source-id: 6ece686bec292e51707bbc7860a1003d0c1321e8
2020-10-16 11:19:12 -07:00
2e2fe8cf3b [NCCL] Modularize ncclCommWatchdog (#46051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46051

Creates a subroutine for aborting timed out collectives. This should help modularize the ncclCommWatchdog a bit, since it is growing too large.
ghstack-source-id: 114398496

Test Plan:
Successful Flow Run:
f225037915
f217609101

Reviewed By: jiayisuse

Differential Revision: D23607535

fbshipit-source-id: 0b1c9483bcd3a41847fc8c0bf6b22cdba01fb1e6
2020-10-16 11:06:40 -07:00
be0c431874 Fix implicit cast in custom_function (#46445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46445

Fix an instance in which a truncated integer prevents downstream type safety checks.

Test Plan: I'm not sure what's appropriate here.

Reviewed By: albanD

Differential Revision: D24339292

fbshipit-source-id: 15748ec64446344ff1a8344005385906d3484d7c
2020-10-16 10:58:02 -07:00
9300a27702 Make torch.lu support complex input on CUDA. (#45898)
Summary:
As per title. LU decomposition is used for computing determinants, and I need this functionality to implement the matrix square root. Next PR on my list is to enable `torch.det` on CUDA with complex input.
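
A minimal usage sketch (assumes a CUDA build with this change):

```
import torch

A = torch.randn(4, 4, dtype=torch.complex64, device="cuda")
LU, pivots = torch.lu(A)   # packed LU factorization plus pivot indices
# torch.det for complex CUDA input is the follow-up PR mentioned above.
```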

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45898

Reviewed By: heitorschueroff

Differential Revision: D24306951

Pulled By: anjali411

fbshipit-source-id: 168f578fe65ae1b978617a66741aa27e72b2172b
2020-10-16 10:29:39 -07:00
5c5484c889 [Flaky tests] Fix test_all_gather_timeout test (#45989)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45989

This test was failing internally for the Thrift-based RPC agent, since
it has a different error regex. Use `self.get_timeout_error_regex` which gets
the timeout error string for each backend to fix this.
ghstack-source-id: 114463458

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D24170394

fbshipit-source-id: 9b30945e3e30f36472268d042173f8175ad88098
2020-10-16 09:02:46 -07:00
c37baa9177 [caffe2] add concat benchmark (#46457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46457

Wanted to see if using a CopyMatrix specialized for float that uses mkl_somatcopy could be faster, but it wasn't. Still want to check in the benchmark so it can be used later.

Test Plan: .

Reviewed By: dskhudia

Differential Revision: D24345901

fbshipit-source-id: d3e68dbb560e3138fda11c55789cd41bc0715c6d
2020-10-16 08:48:42 -07:00
b5702e2350 Fix out-of-bounds access for caching allocator calls (#46439)
Summary:
In assertValidDevice() compare device index to `caching_allocator.device_allocator` rather than to `device_no`

Fixes potential crashes when caching allocator is accessed before being initialized, for example by calling something like:
`python -c "import torch;print(torch.cuda.memory_stats(0))"`

Fixes https://github.com/pytorch/pytorch/issues/46437

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46439

Reviewed By: ngimel

Differential Revision: D24350717

Pulled By: malfet

fbshipit-source-id: 714e6e74f7c2367a9830b0292478270192f07a7f
2020-10-16 08:24:46 -07:00
d6e6073016 install lcov in Docker image if coverage is specified (#46404)
Summary:
As a step to enable C++ code coverage in PyTorch, we want to install `lcov` in the Docker image so that lcov is available to use later on in the process. `lcov` is now installed in all non-rocm non-cuda ubuntu images.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46404

Reviewed By: seemethere

Differential Revision: D24343450

Pulled By: janeyx99

fbshipit-source-id: 3fdab0c0f78c004488e115505740620417f76646
2020-10-16 08:15:57 -07:00
7b788d113e Fix deprecated warnings for nan_to_num (#46309)
Summary:
Related to https://github.com/pytorch/pytorch/issues/44592

This PR is to fix the deprecated warnings for the nan_to_num function.

Below is the warning message when building the latest code.
```
../aten/src/ATen/native/UnaryOps.cpp: In function ‘at::Tensor& at::native::nan_to_num_out(at::Tensor&,
const at::Tensor&, c10::optional<double>, c10::optional<double>, c10::optional<double>)’:
../aten/src/ATen/native/UnaryOps.cpp:397:45: warning: ‘bool c10::isIntegralType(c10::ScalarType)’
is deprecated: isIntegralType is deprecated.
Please use the overload with 'includeBool' parameter instead. [-Wdeprecated-declarations]
   if (c10::isIntegralType(self.scalar_type())) {
```

The deprecated warning is defined in `ScalarType.h`.
d790ec6de0/c10/core/ScalarType.h (L255-L260)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46309

Reviewed By: mrshenli

Differential Revision: D24310248

Pulled By: heitorschueroff

fbshipit-source-id: 0f9f2ad304eb5a2da9d2b415343f2fc9029037af
2020-10-16 06:07:14 -07:00
ecf63351bc [mlf][efficiency] modify equalization scale operator to return single output (#46449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46449

modifies `ComputeEqualizationScale` to have a single output `S`

Test Plan:
```
buck test caffe2/caffe2/quantization/server:compute_equalization_scale_test
```

plus e2e tests

Reviewed By: hx89

Differential Revision: D23946768

fbshipit-source-id: 137c2d7a58bb858db411248606a5784b8066ab23
2020-10-16 01:22:37 -07:00
4359c5e036 [TensorExpr] Correctly handle negative dimensions in aten::cat when lowering to tensor expressions. (#46446)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46446

Fixes #46440.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D24356016

Pulled By: ZolotukhinM

fbshipit-source-id: b759760bb8c765aeb128eb94d18af20cddd888a2
2020-10-16 01:13:14 -07:00
ec5f81f9d3 Remove variable_excluded_from_dispatch() check for factory functions. (#46371)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46371

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24324545

Pulled By: ailzhang

fbshipit-source-id: 78038054690dff14883df711073be4c2da4e1f8b
2020-10-15 21:15:41 -07:00
d1745c36dc fix type check for torch.quantization._numeric_suite (#46330)
Summary:
fix https://github.com/pytorch/pytorch/issues/42977

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46330

Reviewed By: malfet

Differential Revision: D24320449

Pulled By: walterddr

fbshipit-source-id: f892b5c83cb932aee53245d6c825568b3e05f3c6
2020-10-15 20:45:07 -07:00
92921c82bb Add named tuple's error message and workaround for RET failure (#46347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46347

Added the named tuple's error messages & workarounds for when it is returned from a function of a class in PyTorch Mobile.

To identify the error cases (returning a NamedTuple type), I used the following conditions:
1) ins.op == RET  (for returning)
2) type->kind() == TypeKind::TupleType  (for pruning non-tuple types)
3) type->cast<TupleType>()->name()  (for pruning the plain Tuple type)
  - I could use the type's str (str() or repr_str()) directly, but I used whether it has the "name" attribute instead. Please comment on this choice.

[Information of Tuple and NamedTuple types]
1. Tuple
type->str(): (int, int)
type->repr_str(): Tuple[int, int]
type->kind():  TypeKind::TupleType         # different from other types
type->cast<NamedType>(): True
type->cast<NamedType>()->name(): False     # different from NamedTuple

2. NamedTuple
type->str():  __torch__.myNamedTuple
type->repr_str(): __torch__.myNamedTuple
type->kind():  TypeKind::TupleType         # different from other types
type->cast<NamedType>(): True
type->cast<TupleType>()->name(): True      # different from Tuple

(In the next diff, I will handle the other error cases: 1) returning List<module class> and Dict<module class>, and 2) accessing Module class member functions.)
ghstack-source-id: 114361762

Test Plan:
[Added test results]
  buck test mode/dev caffe2/test:mobile -- 'test_unsupported_return'

  Summary
    Pass: 2
    ListingSuccess: 1
    Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/7036874440497926

[Whole test results]
  buck test mode/dev caffe2/test:mobile -- 'test'

  Summary
    Pass: 11
    ListingSuccess: 1
    Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4503599664074084

Reviewed By: iseeyuan

Differential Revision: D24291962

fbshipit-source-id: a1a9e24e41a5f1e067738f59f1eae34d07cba31a
2020-10-15 17:41:06 -07:00
d278e83e69 Update VariableTypeManual.cpp to not use catchAllKernel. (#46353)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46353

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24319416

Pulled By: ailzhang

fbshipit-source-id: e6ca74919949f757112a35e8fce74bded45dcfde
2020-10-15 17:10:28 -07:00
dc7cd97402 Fixes bug in sspaddmm (#45113) (#45963)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45113

Description:
- Fixed bug in sspaddmm by calling contiguous on indices.
- Added tests

We have to make indices contiguous as we use `indices.data_ptr` in `_to_csr` which assumes row-contiguous storage:
be45c3401a/aten/src/ATen/native/sparse/SparseTensorMath.cpp (L1087-L1090)

> Part 1 of fixing this is probably to document sspaddmm. Part 2 may be to rewrite it using other ops. (https://github.com/pytorch/pytorch/issues/45113#issuecomment-700166809)

- Docs will be written here: https://github.com/pytorch/pytorch/pull/45400

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45963

Reviewed By: malfet

Differential Revision: D24335599

Pulled By: ngimel

fbshipit-source-id: 8278c73a1b4cccc5e22c6f3818dd222588c46b45
2020-10-15 16:50:16 -07:00
dda95e6914 More Timer refinement (#46023)
Summary:
This PR just adds more polish to the benchmark utils:

1) `common.py`, `timer.py`, and `valgrind_wrapper/timer_interface.py` are now MyPy strict compliant. (except for three violations due to external deps.) Compare and Fuzzer will be covered in a future PR.
2) `CallgrindStats` now uses `TaskSpec` rather than accepting the individual fields which brings it closer to `Measurement`.
3) Some `__repr__` logic has been moved into `TaskSpec` (which `Measurement` and `CallgrindStats` use in their own `__repr__`s) for a more unified feel and less horrible f-string hacking, and the repr's have been given a cleanup pass.
4) `Tuple[FunctionCount, ...]` has been formalized as the `FunctionCounts` class, which has a much nicer `__repr__` than just the raw tuple, as well as some convenience methods (`__add__`, `__sub__`, `filter`, `transform`) for easier DIY stat exploration. (I find myself using the latter two a lot now.) My personal experience is that manipulating `FunctionCounts` is massively more pleasant than the raw tuples of `FunctionCount`. (Though it's still possible to get at the raw data if you want; see the sketch after this list.)
5) Better support for multi-line `stmt` and `setup`.
6) Compare now also supports rowwise coloring, which is often the more natural layout for A/B testing.
7) Limited support for `globals` in `collect_callgrind`. This should make it easier to benchmark JIT models. (CC ZolotukhinM)
8) More unit tests, including extensive tests for the Callgrind stats manipulation APIs.
9) Mitigate issue with `MKL_THREADING_LAYER` when run in Jupyter. (https://github.com/pytorch/pytorch/issues/37377)
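
A hedged sketch of the item-4 workflow (the stmt/setup and the filter predicate are made up; `collect_callgrind` requires valgrind to be installed):

```
from torch.utils.benchmark import Timer

t = Timer(
    stmt="y = x + x",
    setup="x = torch.ones(128)",  # torch is importable in stmt/setup by default
)
print(t.timeit(100))              # wall-clock Measurement

stats = t.collect_callgrind()     # CallgrindStats
counts = stats.stats(inclusive=False)               # FunctionCounts
aten_only = counts.filter(lambda name: "at::" in name)
print(aten_only)                  # much nicer repr than a raw tuple
```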

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46023

Test Plan: changes should be covered by existing and new unit tests.

Reviewed By: navahgar, malfet

Differential Revision: D24313911

Pulled By: robieta

fbshipit-source-id: 835d4b5cde336fb7ff0adef3c0fd614d64df0f77
2020-10-15 16:32:53 -07:00
757173a4da Add Sigmoid operator from Caffe2 (#46286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46286

commonize fp16 unary operators

Reviewed By: hyuen

Differential Revision: D24199660

fbshipit-source-id: 99bffa24dc3fa459561a7a2743b1a4dce4be5d58
2020-10-15 16:13:37 -07:00
16c52d918b [caffe2] Bypass memonger for in-place ops (#46378)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46378

Reviewed By: dzhulgakov

Differential Revision: D24236604

fbshipit-source-id: 9f599687467ea969e89243482f8e2a41f7db0a23
2020-10-15 16:03:52 -07:00
faf03bd226 Update default output extension in optimize_for_mobile.cc (#45598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45598

.bc is causing issues on Android.  Let's switch to .ptl.

Test Plan: CI

Reviewed By: kimishpatel

Differential Revision: D24026180

fbshipit-source-id: 9f252f3652d748bccb19dc61a783d693e171b2c6
2020-10-15 15:34:34 -07:00
f96cb9de79 vmap: added fallback for in-place operators (#46191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46191

This PR adds a fallback for in-place operators to vmap. We define an
in-place operator to be an operator that operates in-place on its first
argument and returns that argument.

The "iteration over batch" logic is mostly copied from the out-of-place
vmap fallback. I wanted to try to not copy this but the iteration logic
is pretty entangled with the rest of the logic; one alternative was to
use if/else statements inside batchedTensorForLoopFallback but then
there are ~3-4 different sites where we would need that.

When in-place operations are not possible
=========================================
Sometimes, an in-place operation inside of vmap is not possible. For
example, `vmap(Tensor.add_, (None, 0))(torch.rand(3), torch.rand(B0, 3))`
is not possible because the tensor being written to in-place has size
[3] and the other tensor has size [B0, 3].

We detect if this is the case and error out inside the in-place
fallback.

Test Plan
=========
Added some new tests to `test_vmap.py`.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24335240

Pulled By: zou3519

fbshipit-source-id: 1f60346059040dc226f0aeb80a64d9458208fd3e
2020-10-15 15:21:25 -07:00
401c85b4d3 Rename createLevelsBitset -> createVmapLevelsBitset; move it (#46190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46190

Moved `createLevelsBitset` to BatchedTensorImpl.h and renamed it to
`createVmapLevelsBitset`. I moved it there because I want to use it in
another file in a future diff.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24335241

Pulled By: zou3519

fbshipit-source-id: 1c225a00b6da9c69dc7fbc61516bf7257298355c
2020-10-15 15:19:53 -07:00
849bc77ee4 Add quick fix for view/inplace issue with DDP (#46406)
Summary:
As per title, temporary mitigation for https://github.com/pytorch/pytorch/issues/46242 for which https://github.com/pytorch/pytorch/pull/46296 will be a proper fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46406

Reviewed By: malfet

Differential Revision: D24339689

Pulled By: albanD

fbshipit-source-id: 0726e5abe4608d8ffcd7846cbaaffbb8564b04ab
2020-10-15 15:13:11 -07:00
ba92920a28 Remove codegen for old RegistrationDeclarations.h (#46370)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46370

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24324546

Pulled By: ailzhang

fbshipit-source-id: 6dea28c0c7ab5d00f8887735c32304f1b68bf923
2020-10-15 14:02:15 -07:00
03c7d5be6b Add operator benchmark for 4bit/8bit embedding lookups
Summary: Add operator benchmark for 4bit/8bit embedding lookups in `aibench`.

Test Plan:
```
buck build //caffe2/benchmarks/operator_benchmark/pt:qembedding_bag_lookups_test
aibench-cli adhoc -c 'buck run //caffe2/benchmarks/operator_benchmark/pt:qembedding_bag_lookups_test'
````

The run was successful in aibench: https://www.internalfb.com/intern/aibench/details/738300474
https://www.internalfb.com/intern/aibench/details/346463246

Reviewed By: radkris-git

Differential Revision: D24268413

fbshipit-source-id: 7fb4ff75da47f8f327edab562c5d29bb69e00b8d
2020-10-15 13:51:32 -07:00
c99378af1b Fixing pow for special case between cuda tensors and cpu tensors and reframed test cases a tiny bit (#46320)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46037

I have now isolated the special case to apply only between CUDA tensor bases and CPU tensor exponents. My previous fix was not a complete one: it fixed some cases but broke others. The current fix is more complete:
```
In [1]: import torch
In [2]: a=torch.randn(3)
In [3]: b=torch.tensor(2, device="cuda")
In [4]: torch.pow(a,b) #should not work and throws exception now!

In [5]: a=torch.tensor(3, device="cuda")
In [6]: b=torch.tensor(2)
In [7]: torch.pow(a,b) #should work, and now does

In [8]: a=torch.randn(3, device="cuda")
In [9]: torch.pow(a,b) # yeah, that one is fixed and still works
```

To add a test case to reflect the change, I had to modify the existing setup a little bit. I think it is an improvement but would appreciate any tips on how to make it better!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46320

Reviewed By: malfet

Differential Revision: D24306610

Pulled By: janeyx99

fbshipit-source-id: cc74c61373d1adc2892a7a31226f38895b83066a
2020-10-15 13:43:47 -07:00
7d6d5f4be0 Migrate CPU_tensor_apply to TensorIterator (#44242)
Summary:
Closes https://github.com/pytorch/pytorch/issues/24487
Closes https://github.com/pytorch/pytorch/issues/24488
Closes https://github.com/pytorch/pytorch/issues/24489
Closes https://github.com/pytorch/pytorch/issues/24490
Closes https://github.com/pytorch/pytorch/issues/24491
Closes https://github.com/pytorch/pytorch/issues/24492
Closes https://github.com/pytorch/pytorch/issues/24493
Closes https://github.com/pytorch/pytorch/issues/24494
Closes https://github.com/pytorch/pytorch/issues/24495
Closes https://github.com/pytorch/pytorch/issues/24496

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44242

Reviewed By: mruberry

Differential Revision: D24212123

Pulled By: VitalyFedyunin

fbshipit-source-id: feca9a0bf3b25be76409e8c83f90e7c5c91dce1a
2020-10-15 13:22:41 -07:00
8f61fa653f use @mode/ndk_libcxx instead of mode/gnustl
Summary: as title

Test Plan:
run ai bench

buck run aibench:run_bench -- -b aibench/specifications/models/caffe2/squeezenet/squeezenet.json --remote --devices s9f

INFO 2020-10-15 12:24:21 everstore.py: 177: Everstore: Try internal upload
INFO 2020-10-15 12:24:21 subprocess_with_logger.py:  56: Running: clowder put --fbtype=24665 /tmp/aibenchj5w4craj/build.sh
INFO 2020-10-15 12:24:25 subprocess_with_logger.py:  39: Process Succeeded: clowder put --fbtype=24665 /tmp/aibenchj5w4craj/build.sh
INFO 2020-10-15 12:24:25 everstore.py: 185: Uploading /tmp/aibenchj5w4craj/build.sh
everstore handle: GIxbhAQoAOdQNqwCAEam42Z-pfQrbllgAAAP
url:
INFO 2020-10-15 12:24:25 run_remote.py: 182: program: //everstore/GIxbhAQoAOdQNqwCAEam42Z-pfQrbllgAAAP/build.sh
INFO 2020-10-15 12:24:25 everstore.py: 177: Everstore: Try internal upload
INFO 2020-10-15 12:24:25 subprocess_with_logger.py:  56: Running: clowder put --fbtype=24665 /tmp/aibenchj5w4craj/program
INFO 2020-10-15 12:24:32 subprocess_with_logger.py:  39: Process Succeeded: clowder put --fbtype=24665 /tmp/aibenchj5w4craj/program
INFO 2020-10-15 12:24:32 everstore.py: 185: Uploading /tmp/aibenchj5w4craj/program
everstore handle: GICWmAA66cD0qNMCAJh4vKJzTU8-bllgAAAP
url:
INFO 2020-10-15 12:24:32 run_remote.py: 182: program: //everstore/GICWmAA66cD0qNMCAJh4vKJzTU8-bllgAAAP/program
Scuba => https://fburl.com/scuba/caffe2_benchmarking/w6tvinl0
Job status for SM-G960F-8.0.0-26 (sm-g960f-26,galaxy-s9f,s9f) is changed to CLAIMED
Job status for SM-G960F-8.0.0-26 (sm-g960f-26,galaxy-s9f,s9f) is changed to RUNNING
Job status for SM-G960F-8.0.0-26 (sm-g960f-26,galaxy-s9f,s9f) is changed to DONE
ID:0	NET latency: 82192.9
You can find more info via https://our.intern.facebook.com/intern/aibench/details/915762192256210

Reviewed By: smeenai

Differential Revision: D24340103

fbshipit-source-id: cea15bc866361b397b4e1b001e609eb7e9f9aa47
2020-10-15 13:09:26 -07:00
e69f2f82ab Automated submodule update: FBGEMM (#46395)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: abc56f644d

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46395

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: malfet

Differential Revision: D24333846

fbshipit-source-id: 307bbb7e857cd9e472d03374d3d3941128d807b5
2020-10-15 13:06:26 -07:00
c1141b6f68 Added support for complex torch.pinverse (#45819)
Summary:
This PR adds support for complex-valued input for `torch.pinverse`.
Fixed cuda SVD implementation to return singular values with real dtype.
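
A small usage sketch (the shape is arbitrary):

```
import torch

A = torch.randn(3, 5, dtype=torch.complex128)
A_pinv = torch.pinverse(A)   # (5, 3) complex pseudo-inverse

# Moore-Penrose identity A @ A+ @ A == A, up to numerical tolerance.
assert torch.allclose(A @ A_pinv @ A, A)
```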

Fixes https://github.com/pytorch/pytorch/issues/45385.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45819

Reviewed By: heitorschueroff

Differential Revision: D24306539

Pulled By: anjali411

fbshipit-source-id: 2fe19bc630de528e0643132689e1bc5ffeaa162a
2020-10-15 12:28:22 -07:00
5ce46fbbca BFloat16 support for torch.sign (#45244)
Summary:
Added BF16 support for torch.sign on CUDA
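
A one-line usage sketch (assumes a CUDA device):

```
import torch

x = torch.tensor([-2.0, 0.0, 3.0], dtype=torch.bfloat16, device="cuda")
print(torch.sign(x))  # tensor([-1., 0., 1.], device='cuda:0', dtype=torch.bfloat16)
```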

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45244

Reviewed By: zou3519

Differential Revision: D23932304

Pulled By: izdeby

fbshipit-source-id: e50b9510ecf2337ec0288392d6950046116b2599
2020-10-15 12:23:14 -07:00
8c26111adb Add fence. (#45148)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45148

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23906774

Pulled By: AshkanAliabadi

fbshipit-source-id: 93fe923bbd59d6e8bf3f13372217bd998856e8d7
2020-10-15 12:15:03 -07:00
e3eef0cd7a Add image sampler. (#45037)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45037

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23820824

Pulled By: AshkanAliabadi

fbshipit-source-id: 2b71f24fb590ad87a963d00a4e380b4d990a11ef
2020-10-15 12:14:59 -07:00
50f833248d Redo Vulkan command and descriptor pools. (#44496)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44496

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23820829

Pulled By: AshkanAliabadi

fbshipit-source-id: 3e114a3adcb2df01fb151c0536ce1a2e3f9dfbc1
2020-10-15 12:10:35 -07:00
1e654a4b7f Fix error message for scatter reduction (#46397)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/41377 to update the error message to match the removed arguments

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46397

Reviewed By: malfet

Differential Revision: D24336009

Pulled By: albanD

fbshipit-source-id: b9bf2f9ef7fd2ae622c4079384afc93e9c473f47
2020-10-15 11:34:59 -07:00
528158af47 Updated derivatives for complex mm, mv, ger, bmm, triangular_solve (#45737)
Summary:
This PR updates derivatives for a few functions so that `gradgradcheck` for `torch.cholesky` is passed ([ref](https://github.com/pytorch/pytorch/pull/45267#discussion_r494439967)).

Some tests (that call `bmm_cuda`) fail with `RuntimeError: _th_bmm_out not supported on CUDAType for ComplexDouble`
until PR https://github.com/pytorch/pytorch/issues/42553 is merged.

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45737

Reviewed By: bdhirsh

Differential Revision: D24279917

Pulled By: anjali411

fbshipit-source-id: 7b696d2cfc2ef714332c2e3e5d207e257be67744
2020-10-15 11:27:30 -07:00
7f458e16ba Allow Undefined to get kernel from Math/DefaultBackend. (#46352)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46352

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24319417

Pulled By: ailzhang

fbshipit-source-id: de2d7db2cb931b0dcf2fbabd7d292e22cfc5e7b7
2020-10-15 11:17:08 -07:00
908c23579d [JIT] Revert Freezing shared type PR (#46285)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45902 by reverting https://github.com/pytorch/pytorch/pull/42457

The test case introduced by https://github.com/pytorch/pytorch/pull/42457 was fixed by https://github.com/pytorch/pytorch/pull/46250, which I'm assuming is the real source of the bug.

In the future it would be good to provide repros for freezing issues without including a quantization dependency; there was another issue in freezing (see: https://github.com/pytorch/pytorch/pull/46054) whose root cause was the same quantization issue https://github.com/pytorch/pytorch/pull/46250.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46285

Reviewed By: bdhirsh

Differential Revision: D24288739

Pulled By: eellison

fbshipit-source-id: b69ee8c713f749cd93d5eba370c3eafed86568bb
2020-10-15 10:57:30 -07:00
b5479737d7 Add windows JNI support (#44257)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32516

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44257

Reviewed By: malfet

Differential Revision: D24332820

Pulled By: ezyang

fbshipit-source-id: 1dd97e9c8140129a02a9078623b190b33f30d5b0
2020-10-15 10:48:45 -07:00
bd449334b8 Fix formatting issues in torch.tensor_split documentation (#46328)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46328

Reviewed By: heitorschueroff

Differential Revision: D24318003

Pulled By: mruberry

fbshipit-source-id: 140d391dd927ff3374dd6c4c6e2da7cb67417b31
2020-10-15 10:08:38 -07:00
75809626fb Stop running clang-tidy on torch/csrc/generic/*.cpp. (#46335)
Summary:
Those files are never directly built, only included in other files.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46335

Reviewed By: albanD

Differential Revision: D24316737

Pulled By: gchanan

fbshipit-source-id: 67bb95e7f4450e3bbd0cd54f15fde9b6ff177479
2020-10-15 08:28:28 -07:00
e366591dc8 Fix incorrect signatures in get_testing_overrides, and add test for incorrect signatures (#45983)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45494

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45983

Reviewed By: agolynski

Differential Revision: D24220048

Pulled By: ezyang

fbshipit-source-id: 67826efdb203d849e028467829f7b5ad4559ec67
2020-10-15 07:48:20 -07:00
2d6fd22e24 Rationalize inlining of kernels into the unboxing wrapper (#42845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42845

- In server builds, always allow the compiler to inline the kernel into the unboxing wrapper, i.e. optimize for perf.
- In mobile builds, never inline the kernel into the unboxing wrapper, i.e. optimize for binary size.

Note that this only applies for registration API calls where we can actually inline it, i.e. calls with `TORCH_FN` or some of the old API calls.
Registrations that give the registration API a runtime function pointer can't inline and won't do so on server either.

Note also that in server builds, all we do is **allow** the compiler to inline. We don't force inlining.

ghstack-source-id: 114177591

Test Plan:
waitforsandcastle

https://www.internalfb.com/intern/fblearner/details/225217260/

Reviewed By: ezyang

Differential Revision: D23045772

fbshipit-source-id: f74fd600eaa3f5cfdf0da47ea080801a03db7917
2020-10-15 04:02:51 -07:00
053c252c66 Update COMPILE_TIME_MAX_DEVICE_TYPES to 12 (#46327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46327

### Summary

Update the COMPILE_TIME_MAX_DEVICE_TYPES to 12 as we landed a new Metal backend.

### Test Plan

- Circle CI

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D24309189

Pulled By: xta0

fbshipit-source-id: eec076b7e4fc94bab11840318821aa554447e541
2020-10-15 02:09:17 -07:00
38c97fb6f0 [shape inference] add shape inference support
Summary:
* To make the pruning op compatible with shape inference, we introduced a new quantile argument (as in D23463390) to differentiate dynamic/fixed pruning.

* The fixed pruning op has well-defined output shapes. However, the input shapes are not determined, so we want to bypass input shape checking for the two pruning ops, as implemented in this diff.

Test Plan:
buck test caffe2/caffe2/opt:bound_shape_inference_test

```
Started reporting to test run: https://our.intern.facebook.com/intern/testinfra/testrun/844425102187909
    ✓ ListingSuccess: caffe2/caffe2/opt:bound_shape_inference_test - main (1.973)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.FC3D (2.604)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.SparseLengthsSumFused4BitRowwise (2.635)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.FC (2.690)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.Int8QuantizeInferInputBackwards (2.705)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.SparseLengthsSum (2.729)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.Reshape (2.754)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.ConcatMissingInput (2.770)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.ElementwiseOp (2.770)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.Tile (2.785)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.Bucketize (2.789)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.SparseLengthsSumFused8BitRowwise (2.807)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.SparseLengthsSum8BitRowwiseSparse (2.841)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.Split (2.863)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.ConcatInferInputBackwards (2.894)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.ElementwiseInferInputBackwards (2.898)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.Combo0 (2.902)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.LengthsRangeFill (2.964)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.Quantization (2.964)
Summary
  Pass: 18
  ListingSuccess: 1
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/844425102187909
```

buck test caffe2/caffe2/fb/opt:bound_shape_inference_net_test

```
 Started reporting to test run: https://our.intern.facebook.com/intern/testinfra/testrun/3096224780078093
    ✓ ListingSuccess: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - main (14.092)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.ClipLengths (15.508)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.DPER3IdListFeaturePreProcessing (15.521)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.ClipRanges (16.198)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.RowwisePrune (16.302)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - FbBoundShapeInferencerTest.GatherRanges1 (16.585)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.Combo3 (16.865)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.DPER3IdListFeaturePreProcessingWithCast (16.907)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.GatherRanges2 (16.921)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - FbBoundShapeInferencerTest.LengthsRangeFill (17.157)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.ClipRangesAndGatherRanges (17.277)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.DPER3IdScoreListFeaturePreProcessing (17.274)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.ClipRangesGatherSigridHash (17.554)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.Combo1 (17.645)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.DPER3IdScoreListFeaturePreProcessingDEFAULT (17.887)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.DPER3IdListFeaturePreProcessingDEFAULT (17.929)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.f97293388_0 (19.343)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - FbBoundShapeInferencerTest.GatherRangesToDense1 (19.489)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.DPER3IdScoreListFeaturePreProcessingWithCast (19.887)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.xray_v11 (19.905)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - FbBoundShapeInferencerTest.SigridTransforms (20.080)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.Combo2 (20.086)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.vanillaSparseNN (59.847)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.gather (97.822)
Summary
  Pass: 23
  ListingSuccess: 1
```

## Workflow testing

===
* non-DI/fixed quantile/user side/non-self-binning
f224250571

*  non-DI/fixed quantile/user+ad side/non-self-binning
f224250610

* DI/fixed quantile/user side/self-binning
f224250637

* DI/fixed quantile/user+ad side/self-binning
f224250662

*  non-DI/dynamic quantile/user+ad side/non-self-binning
f224250705

* DI/dynamic quantile/user+ad side/self-binning
f224250760

Reviewed By: ChunliF

Differential Revision: D23647390

fbshipit-source-id: 3ec1c0eaea53bd4d5eda4a0436577216f7fa8ead
2020-10-15 00:46:06 -07:00
a87a1c1103 Fix performance issue of GroupNorm on CUDA when feature map is small. (#46170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46170

Fix performance issue of GroupNorm on CUDA when the feature map is small.

Benchmark script:

```
import torch
import torch.nn.functional as F

from timeit import Timer

norm = torch.nn.GroupNorm(8, 512).cuda()

num = 5000

sizes = [(1024, 512, 14, 14), (1024, 512, 7, 7), (1024, 512)]

def forward(x):
    _ = norm(x)
    torch.cuda.synchronize()

def backward(y, grad):
    y.backward(grad, retain_graph=True)
    torch.cuda.synchronize()

if __name__ == "__main__":
    # warm up
    x = torch.rand(*(sizes[0]), dtype=torch.float,
                   device="cuda", requires_grad=True)
    for _ in range(100):
        forward(x)

    for size in sizes:
        x = torch.rand(*size, dtype=torch.float,
                       device="cuda", requires_grad=True)
        t = Timer("forward(x)", "from __main__ import forward, x")
        print(f"size = {size}:")
        t1 = t.timeit(num) / num * 1e6
        print(f"avg_forward_time =  {t1}us")

        y = norm(x)
        grad = torch.randn_like(y)
        t = Timer("backward(y, grad)", "from __main__ import backward, y, grad")
        t2 = t.timeit(num) / num * 1e6
        print(f"avg_backward_time = {t2}us")
```
Benchmark result before this Diff:
```
size = (1024, 512, 14, 14):
avg_forward_time =  1636.729855206795us
avg_backward_time = 5488.682465581223us
size = (1024, 512, 7, 7):
avg_forward_time =  465.88476160541177us
avg_backward_time = 3129.9425506033003us
size = (1024, 512):
avg_forward_time =  96.90486900508404us
avg_backward_time = 2319.4099438143894us
```

Benchmark result after this Diff:
```
size = (1024, 512, 14, 14):
avg_forward_time =  1635.6191572034732us
avg_backward_time = 4140.7730475999415us
size = (1024, 512, 7, 7):
avg_forward_time =  463.6513736099005us
avg_backward_time = 1641.7451039887965us
size = (1024, 512):
avg_forward_time =  66.59087920561433us
avg_backward_time = 128.6882139975205us

```

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "GroupNorm"

Reviewed By: hl475, houseroad

Differential Revision: D24242738

fbshipit-source-id: b52c82d7b6e47855c48fa8ceacd0c55d03bb92d5
2020-10-14 23:34:33 -07:00
75bf5f2b59 [JIT] Improve class type annotation inference (#45940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45940

**Summary**
In `try_ann_to_type`, if an annotation has an attribute named
`__torch_script_class__`, it is assumed to be a TorchScript class that
has already been scripted. However, if it is a class that extends
another class, this code path causes a crash because it looks up the
JIT type for the class by name in the compilation unit. This JIT type
obviously cannot exist because inheritance is not supported.

This commit fixes this by looking up the qualified name of a class
in torch.jit._state._script_class in order to ascertain whether it has
already been scripted (instead of looking for a `__torch_script_class__`
attribute on the class object).

**Test Plan**
This commit adds a unit test consisting of the code sample from the
issue that reported this problem.

**Fixes**
This commit fixes #45860.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D24310027

Pulled By: SplitInfinity

fbshipit-source-id: 9f8225f3316fd50738d98e3544bf5562b16425b6
2020-10-14 23:28:47 -07:00
86abc8cd48 [JIT] Make InsertInstruction overflow check a warning instead of fatal (#46369)
Summary:
This diff restores the previous behavior of silently allowing overflow when inserting instructions. The behavior was changed recently in https://github.com/pytorch/pytorch/issues/45382, but that started to break some existing use cases that have overflow problems.

Restoring the original behavior, but throwing a warning, to unblock existing use cases where overflow happens.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46369

Reviewed By: kwanmacher, wanchaol, fbhuba

Differential Revision: D24324345

Pulled By: gmagogsfm

fbshipit-source-id: 1c0fac421d4de38f070e21059bbdc1b788575bdf
2020-10-14 23:09:53 -07:00
5393588e11 Add guideline about which dispatch keyword to use in native_functions.yaml. (#46126)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46126

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D24233887

Pulled By: ailzhang

fbshipit-source-id: 640543494e0d5211f2f910a75fa2e9bdf558f7ce
2020-10-14 22:53:56 -07:00
4aaad88790 Bug fixes in profiling allocator (#45993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45993

Some bugs were exposed via updated test and validation code.
Also enabled this test to run on CI instead of only as a mobile test.

Test Plan:
cpu_profiling_allocator_test

Imported from OSS

Reviewed By: dzhulgakov

Differential Revision: D24172599

fbshipit-source-id: da0d2e1d1dec87b476bf39a1c2a2ffa0e4b5df66
2020-10-14 22:45:04 -07:00
419dafe791 [Reland] Update native_functions.yaml to add DefaultBackend. (#46236)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46236

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D24273378

Pulled By: ailzhang

fbshipit-source-id: bed1d4c84c0bba88a7da4d9bd2ccaa58253cf91e
2020-10-14 22:37:28 -07:00
22f4a58a45 [pytorch] activation checkpointing: enable mixing tensor without requires_grad (#45934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45934

https://pytorch.org/docs/stable/checkpoint.html PyTorch checkpoint requires all inputs to the function being checkpointed to require grad, but this assumption does not necessarily hold. Consider the following two examples:

```
output = MultiheadedMaskedAtten(input, mask)

output = LSTM(input, seq_length)
```
both length and mask are tensors that won't require grad; currently, if you try to checkpoint, torch.autograd.backward will complain:

```
  File "/mnt/xarfuse/uid-124297/7d159c34-seed-nspid4026531836-ns-4026531840/torch/autograd/function.py
", line 87, in apply
    return self._forward_cls.backward(self, *args)
  File "/mnt/xarfuse/uid-124297/7d159c34-seed-nspid4026531836-ns-4026531840/torch/utils/checkpoint.py"
, line 99, in backward
    torch.autograd.backward(outputs, args)
  File "/mnt/xarfuse/uid-124297/7d159c34-seed-nspid4026531836-ns-4026531840/torch/autograd/__init__.py
", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: element 1 of tensors does not require grad and does not have a grad_fn
```

This diff allows skipping the non-grad-requiring tensors when running autograd.backward.

Added documentation for this feature as well.
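
A hedged sketch of the now-supported pattern (the function is made up; the point is that `mask` does not require grad):

```
import torch
from torch.utils.checkpoint import checkpoint

def masked_sum(x, mask):
    return (x * mask).sum()

x = torch.randn(4, requires_grad=True)
mask = torch.ones(4)                  # requires_grad=False on purpose
out = checkpoint(masked_sum, x, mask)
out.backward()                        # previously raised for the mask input
print(x.grad)
```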

Test Plan: added unit test to make sure partial tensor grads can be used in checkpoint().

Differential Revision: D24094764

fbshipit-source-id: 6557e8e74132d5a392526adc7b57b6998609ed12
2020-10-14 21:28:02 -07:00
103b100ddc Bazel build has warnings (#46233)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43212

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46233

Reviewed By: bdhirsh

Differential Revision: D24278560

Pulled By: ezyang

fbshipit-source-id: d7e1c7e97f57f6f0dcf2ff966b795a6d13b07e95
2020-10-14 20:05:34 -07:00
af8c75e211 [PyTorch] Stringize kernel tag names consistently during macro expansion and require all tag names to be a compile time character array (#46074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46074

I found 2 instances where the NAME parameter passed in to the dispatch macros is not a C++ string (constant) (i.e. double quoted compile time string).

In one instance it is a single quoted multi-character constant (I don't know what this resolves to in practice), and in the other instance, it is an unquoted identifier generated by concatenating 2 identifiers using the `##` operator.

In addition, I found 2 instances where the `NAME` of the tag passed in is not a constant character array, but an `std::string` variable instead. I am changing it to a constant character array with the same name as the earlier variable. For the purposes of any code using this data, everything remains the same, since the code was stringizing the value anyway using `#NAME`, so it would get the name of the variable and not the contents.
ghstack-source-id: 113928208

Test Plan: I have a commit (not part of this change set) that attempts to print the `NAME` argument passed in to the various dispatch macros to be able to do some analysis. These weren't expanding correctly for the use cases that are fixed in this diff.

Reviewed By: ezyang

Differential Revision: D24211393

fbshipit-source-id: 28953d9f859315b371a60ae34b19671720209c99
2020-10-14 18:13:59 -07:00
a69910868a Fix possible padding length overflow in DistributedSampler (#45329)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45324

This fix handles cases where `len(dataset) * 2 < num_replicas` in DistributedSampler (for which the previous code resulted in an error).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45329

Reviewed By: mruberry

Differential Revision: D24205035

Pulled By: rohan-varma

fbshipit-source-id: f94329d9c1e7deaee41e5af319e7c7d0c741910c
2020-10-14 17:19:44 -07:00
ff0af7242b Revert D24290811: [quant][eagermode] Move custom_module registration to prepare/convert_custom_config_dict
Test Plan: revert-hammer

Differential Revision:
D24290811 (3ad797c937)

Original commit changeset: 7d2aee98e194

fbshipit-source-id: 24013e92044f2a1b36b1a9f475bbaa6f17bdaa11
2020-10-14 16:42:55 -07:00
a38eeeff5c Make setup.py python 2 friendly (#46317)
Summary:
Import print_function to make setup.py, when invoked by Python 2, print a human-readable error:
```
% python2 setup.py
Python 2 has reached end-of-life and is no longer supported by PyTorch.
```
Also, remove `future` from the list of the PyTorch package install dependencies

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46317

Reviewed By: walterddr, bugra

Differential Revision: D24305004

Pulled By: malfet

fbshipit-source-id: 9181186170562384dd2c0e6a8ff0b1e93508f221
2020-10-14 16:37:06 -07:00
e7e919fc34 Add warning on ProcessGroup and ProcessGroup::Work APIs (#46220)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46220

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24294437

Pulled By: gmagogsfm

fbshipit-source-id: 198f8e5760beeb1d18740f971647d2537afb3dd6
2020-10-14 16:27:37 -07:00
fc1d6bf135 [fx] make sure args/kwargs are immutable (#46325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46325

Otherwise, mutating them would make the uses/users lists inaccurate.
You can still mutate the node by assigning a new value to .args or .kwargs.
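
A hedged sketch of the reassignment pattern (the traced function and the rewrite are made up):

```
import operator

import torch
import torch.fx as fx

def f(x):
    return x + 1

gm = fx.symbolic_trace(f)
for node in gm.graph.nodes:
    if node.op == "call_function" and node.target is operator.add:
        # node.args is immutable; build a fresh tuple and reassign it.
        node.args = (node.args[0], 2)
gm.recompile()
print(gm(torch.zeros(3)))  # tensor([2., 2., 2.])
```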

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D24308672

Pulled By: zdevito

fbshipit-source-id: a5305e1d82668b36e46876c3bc517f6f1d03dd78
2020-10-14 15:51:43 -07:00
2bc6caa9e4 Add three-phase option to OneCycleLR (#42715)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40362

The new `three_phase` option provides a way of constructing schedules according to the scheme recommended in [Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates](https://arxiv.org/abs/1708.07120).

Note that this change maintains backwards compatibility, and as a result the default behaviour of OneCycleLR remains quite counter-intuitive.
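
A hedged usage sketch (the model and step counts are placeholders):

```
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sched = torch.optim.lr_scheduler.OneCycleLR(
    opt, max_lr=0.1, total_steps=100,
    three_phase=True,  # use a third phase to annihilate the learning rate
)
for _ in range(100):
    opt.step()
    sched.step()
```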

vincentqb

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42715

Reviewed By: heitorschueroff

Differential Revision: D24289744

Pulled By: vincentqb

fbshipit-source-id: e4aad87880716bb14613c0aa8631e43b04a93e5c
2020-10-14 15:05:14 -07:00
635aebdfab [quant] Refactoring the mappings files (#44847)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44847

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23747007

Pulled By: z-a-f

fbshipit-source-id: 7d8fcc84a77454cc1479e5158f5a62eda5824a87
2020-10-14 13:15:34 -07:00
b28b5d3c68 [ONNX] Update squeeze test for opset 9 (#45369)
Summary:
Only under static axes does opset 9 support no-op squeeze when dim is not 1.
Updating the test case where it was setting dynamic axes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45369

Reviewed By: anjali411

Differential Revision: D24280180

Pulled By: bzinodev

fbshipit-source-id: d7cda88ab338a1c41a68052831dcebe739a3843c
2020-10-14 12:53:13 -07:00
6ca03aeb96 [ONNX] Fix flatten operator (#45632)
Summary:
Even when dim is None, there are cases where flatten can be exported.
Also enable test_densenet in scripting mode.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45632

Reviewed By: VitalyFedyunin

Differential Revision: D24116994

Pulled By: bzinodev

fbshipit-source-id: 76da6c073ddf79bba64397fd56b592de850034c4
2020-10-14 12:44:25 -07:00
d655341adb [Distributed] General Function for Parsing Environment Variable Flags in PG (#46045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46045

PG NCCL functionality differs based on certain binary environment variables such as NCCL_BLOCKING_WAIT and NCCL_ASYNC_ERROR_HANDLING. Previously we had separate helper functions to parse these env vars and set class variables accordingly. This PR introduces a single general-purpose function for parsing them.
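
The helper itself is C++ internal to ProcessGroupNCCL; the idea, sketched in Python for brevity (names are illustrative, not the actual implementation):

```python
import os

def parse_env_var_flag(name):
    # "1" enables the flag; unset or any other value leaves it disabled.
    return os.environ.get(name, "0") == "1"

blocking_wait = parse_env_var_flag("NCCL_BLOCKING_WAIT")
async_error_handling = parse_env_var_flag("NCCL_ASYNC_ERROR_HANDLING")
```
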
ghstack-source-id: 114209823

Test Plan:
Ran the following flow with NCCL_BLOCKING_WAIT set, and ensured the ProcessGroup constructor set blockingWait_ to true: f223454701

Reviewed By: jiayisuse

Differential Revision: D24173982

fbshipit-source-id: b84db2dda29fcf5d163ce8860e8499d5070f8818
2020-10-14 12:21:11 -07:00
3ad797c937 [quant][eagermode] Move custom_module registration to prepare/convert_custom_config_dict (#46293)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46293

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D24290811

fbshipit-source-id: 7d2aee98e1946c2a4268efb94443f1e5daaa793e
2020-10-14 12:10:37 -07:00
2ffb768607 [Distributed] deleteKey support for HashStore (#46049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46049

Adding support for the deleteKey API in the c10d HashStore.
ghstack-source-id: 113874207

Test Plan:
Added C++ tests to check that the deleteKey function works and that it raises an exception when attempting to delete non-existent keys.
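
A rough illustration of the API from Python, assuming Store bindings that mirror these C++ methods (at the time of this change the tests were C++-only, so the Python spelling is an assumption):

```python
import torch.distributed as dist

store = dist.HashStore()
store.set("alpha", "1")
store.set("beta", "2")

store.delete_key("alpha")  # removes the key
print(store.num_keys())    # -> 1 (num_keys comes from the companion change below)
```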

Reviewed By: jiayisuse

Differential Revision: D24067657

fbshipit-source-id: 4c58dab407c6ffe209585ca91aa430850261b29e
2020-10-14 12:04:42 -07:00
74f13a8b8f [Distributed] Adding getNumKeys support to the HashStore (#46048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46048

This PR adds support for the getNumKeys API for the HashStore
ghstack-source-id: 113874241

Test Plan: Added C++ tests for the HashStore::getNumKeys

Reviewed By: jiayisuse

Differential Revision: D24067658

fbshipit-source-id: 2db70a90f0ab8ddf0ff03cedda59b45ec987af07
2020-10-14 12:01:22 -07:00
5500b62f28 Enable zero batch conv tests for ROCm (#46305)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26669

This PR enables convolution tests for zero batch size implemented in https://github.com/pytorch/pytorch/pull/26214/.

jamesr66a jeffdaily sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46305

Reviewed By: navahgar

Differential Revision: D24307981

Pulled By: heitorschueroff

fbshipit-source-id: dfc595fa855ae084b60a693e209b0fdcc714221d
2020-10-14 11:36:30 -07:00
dec61f93f2 [ROCm] update GPG key URL in circleci Dockerfile (#46256)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46256

Reviewed By: mrshenli

Differential Revision: D24308563

Pulled By: heitorschueroff

fbshipit-source-id: 33ef6e5490bdd59e14db4851c03f6df6ce227358
2020-10-14 11:29:55 -07:00
53316e8b97 [quant] Remove prehook option (#46292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46292

since it is not needed

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D24290815

fbshipit-source-id: 5cc24a305dbdfee5de3419dc83a9c3794d949300
2020-10-14 11:08:38 -07:00
9d389b1dcc [ONNX] Preprocess index_put with bool inputs to masked_scatter/masked_fill (#45584)
Summary:
When the input to an indexing operation is a boolean, for example array[True] = value, the resulting index_put node needs to be converted to a masked_scatter or masked_fill node based on the type of the assigned value. If that value is a single scalar, we use masked_fill; if it is a tensor of the appropriate size, we use masked_scatter.
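
A small sketch of the two cases the pass distinguishes:

```python
import torch

x = torch.zeros(2, 3)
mask = torch.tensor([[True, False, True],
                     [False, True, False]])

x[mask] = 1.0                         # scalar value -> exported as masked_fill
x[mask] = torch.tensor([1., 2., 3.])  # tensor value -> exported as masked_scatter
```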

Fixes https://github.com/pytorch/pytorch/issues/34054

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45584

Reviewed By: VitalyFedyunin

Differential Revision: D24116921

Pulled By: bzinodev

fbshipit-source-id: ebd66e06d62e15f0d49c8191d9997f55edfa520e
2020-10-14 10:58:55 -07:00
49903a5cd5 [quant][graphmode][fx] Move custom_module_class config to prepare/convert_custom_config_dict (#46251)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46251

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D24290810

fbshipit-source-id: 7a96f04a0f33f0315943ac18ef2d08e4f5a5d1c0
2020-10-14 10:43:48 -07:00
e7dbaa252e Update optim.rst for better understanding (#45944)
Summary:
The variable `i` in `Line 272` is ambiguous; it should be named `epoch` instead.
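
For reference, the loop reads more clearly with the rename (the scheduler choice here is arbitrary):

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5)

for epoch in range(20):  # `epoch` instead of `i` makes the unit explicit
    optimizer.step()
    scheduler.step()
```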

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45944

Reviewed By: agolynski

Differential Revision: D24219486

Pulled By: vincentqb

fbshipit-source-id: 2af0408594613e82a1a1b63971650cabde2b576e
2020-10-14 09:36:06 -07:00
1f791c06f0 adding BAND/BOR/BXOR reduce ops to unsupported list for complex numbers. added tests (#46270)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46270

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D24284702

Pulled By: bdhirsh

fbshipit-source-id: 7e6c3fce83a4367808a638f0400999399b2c35b0
2020-10-14 08:48:14 -07:00
8a074af929 Added scalar lists APIs for addcdiv and addcmul (#45932)
Summary:
1) Added new APIs:
 _foreach_addcdiv(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, float[] scalars)
 _foreach_addcdiv_(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, float[] scalars)
 _foreach_addcmul(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, float[] scalars)
 _foreach_addcmul_(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, float[] scalars)

2) Updated optimizers to use new APIs

Tested via unit tests
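
A quick sketch of the new calls (shapes and scalar values are arbitrary):

```python
import torch

params = [torch.randn(3) for _ in range(2)]
t1s = [torch.randn(3) for _ in range(2)]
t2s = [torch.rand(3) + 0.1 for _ in range(2)]  # keep denominators nonzero

# Out-of-place: returns a new list of tensors.
out = torch._foreach_addcdiv(params, t1s, t2s, [0.1, 0.2])

# In-place: params[i] += scalars[i] * t1s[i] / t2s[i]
torch._foreach_addcdiv_(params, t1s, t2s, [0.1, 0.2])
```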

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45932

Reviewed By: navahgar

Differential Revision: D24150306

Pulled By: izdeby

fbshipit-source-id: c2e65dedc95d9d81a2fdd116e41df0accb0b6f26
2020-10-14 08:12:37 -07:00
f2e5ae4ba2 Undefine bool and vector after including altivec.h (#46179)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46179

Reviewed By: bdhirsh

Differential Revision: D24258470

Pulled By: glaringlee

fbshipit-source-id: f9d3589a30ed396cb88404d3471788aed8dea237
2020-10-14 07:52:51 -07:00
45de2ee3ac Remove Python version upper boundary check (#46315)
Summary:
This prevents setup.py from erroring out when Python-3.9 is used

Fixes https://github.com/pytorch/pytorch/issues/46314

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46315

Reviewed By: heitorschueroff

Differential Revision: D24304846

Pulled By: malfet

fbshipit-source-id: 573a88ea8c1572d7d8a9991539effb3c228bffc9
2020-10-14 07:36:55 -07:00
69e152e60b Fix device guard for c10-full ops (#46091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46091

ghstack-source-id: 114269274

Test Plan:
vs prev diff: https://www.internalfb.com/intern/fblearner/details/224487971/

vs D23328718 (6ba6ecb048) : https://www.internalfb.com/intern/fblearner/details/224488043/

Reviewed By: ezyang

Differential Revision: D24219943

fbshipit-source-id: bbabafb5c5b76ce0e93df4fdae2f08221354d9f7
2020-10-14 06:32:43 -07:00
4534bf5799 Fix NativeFunctions.h for c10-full ops (#46090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46090

ghstack-source-id: 114269272

Test Plan: vs base diff: https://www.internalfb.com/intern/fblearner/details/223884639/

Reviewed By: ezyang

Differential Revision: D24219942

fbshipit-source-id: 6f338c7c0dd5adfe2fba8b36ccc340032d3faef8
2020-10-14 06:32:36 -07:00
84771fc64f [caffe2] Add 10s deadline for all Caffe2 hypothesis fuzz tests
Test Plan: CI

Reviewed By: walterddr

Differential Revision: D24298118

fbshipit-source-id: 2286c1e37ed9c43f404b888386c0bd4b0b6a55c6
2020-10-14 06:30:09 -07:00
62d37b9f26 add size_based_partition final (#46282)
Summary:
Reopens the PR: https://github.com/pytorch/pytorch/pull/45837
This PR adds a new feature to the Partitioner() class called size_based_partition. Given a list of devices with the same memory size, this function distributes graph nodes across the devices. To implement this feature, several helper functions are created in Partitioner.py and GraphManipulation.py.
A unit test is also added in test/test_fx_experimental.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46282

Reviewed By: gcatron

Differential Revision: D24288470

Pulled By: scottxu0730

fbshipit-source-id: e81b1e0c56e34f61e497d868882126216eba7538
2020-10-14 03:44:05 -07:00
b64cf93f05 [jit] support tracing tensor __setitem__ with dynamic shape (#45828)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/43548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45828

Test Plan: buck test mode/dev-nosan //caffe2/test:jit -- 'test_trace_slice' --jobs 1

Reviewed By: bdhirsh

Differential Revision: D24106641

Pulled By: ppwwyyxx

fbshipit-source-id: 8036c9819c9816e040796dac8f9c98bd33ce80a8
2020-10-14 02:52:57 -07:00
38e64cf949 Revert D24232288: [fx] make sure args/kwargs are immutable
Test Plan: revert-hammer

Differential Revision:
D24232288 (61df99b78e)

Original commit changeset: c95b1a73ae55

fbshipit-source-id: b910a6618f76ef64caead20e8207997317bc2f5e
2020-10-14 01:39:33 -07:00
d790ec6de0 [JIT] Update comment in jit_log.h. (#46301)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46301

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D24295281

Pulled By: ZolotukhinM

fbshipit-source-id: a4f84c773029845065895a81f9d753a9c82a99e0
2020-10-13 23:42:28 -07:00
d22455128f [dispatcher] avoid autograd fixup step on non-backend keys (#46135)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46135

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D24235974

Pulled By: bhosmer

fbshipit-source-id: 21215b31146673caae904bb82395858419641633
2020-10-13 23:33:15 -07:00
61df99b78e [fx] make sure args/kwargs are immutable (#46121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46121

Otherwise, mutating them would make the uses/users lists inaccurate.
You can still mutate the node by assigning a new value to .args or .kwargs.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D24232288

Pulled By: zdevito

fbshipit-source-id: c95b1a73ae55ad9bdb922ca960c8f744ff732100
2020-10-13 21:33:19 -07:00
965046c445 [NCCL] Provide additional information about NCCL error codes. (#45950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45950

A pain point for debugging failed training jobs due to NCCL errors has
been understanding the source of the error, since NCCL does not itself report
too many details (usually just "unhandled {system, cuda, internal} error").

In this PR, we add some basic debug information about what went wrong. The information is collected by grepping the NCCL codebase for when these errors are thrown. For example, `ncclSystemError` is what is thrown when system calls such as malloc or munmap fail.

Tested by forcing `result = ncclSystemError` in the macro. The new error
message looks like:

```RuntimeError: NCCL error in:
caffe2/torch/lib/c10d/ProcessGroupNCCL.cpp:759, unhandled system error, NCCL
version 2.7.3
ncclSystemError: System call (socket, malloc, munmap, etc) failed.
```

The last line is what we have added to the message.

In the future, we will also evaluate setting NCCL_DEBUG=WARN, by which NCCL provides more details about errors as well.
ghstack-source-id: 114219288

Test Plan: CI

Reviewed By: mingzhe09088

Differential Revision: D24155894

fbshipit-source-id: 10810ddf94d6f8cd4989ddb3436ddc702533e1e1
2020-10-13 21:18:20 -07:00
f7398759b4 Only populate grad accumulator to var mapping for find_unused_parameters=True in DDP (#45942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45942

We only need to keep track of this for traversing the autograd graph
when find_unused_parameters=True. Without that, we populate and keep this
mapping in memory, which occupies sizeof(pointer) * number of grad accumulators
of extra memory.
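
The flag that triggers the extra bookkeeping (a single-process gloo setup, shown for illustration only):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(10, 10)
# The grad-accumulator-to-variable map is now built only in this case:
ddp = DDP(model, find_unused_parameters=True)
```
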
ghstack-source-id: 114219289

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D24154407

fbshipit-source-id: 220d723e262f36590a03a3fd2dab47cbfdb87d40
2020-10-13 21:12:59 -07:00
31bcd96395 Parallelize the quantization conversion operators (#45536)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45536

Quantization conversion/reverse-conversion operators will be used in the critical serving path.

The operators can make use of aten::parallel to parallelize the rowwise quantization of tensors.

Overall, I see a 20-25% improvement with the parallelization optimization added here.

The following results are from running the benchmark on my `devvm`. I have requested a dedicated machine and will post benchmark results again.

Easier view to compare results  https://our.intern.facebook.com/intern/diffing/?paste_number=143973933

Baseline results are based on D23675777 (677a59dcaa)
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 10.782

# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 17.443

# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 25.898

# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 13.903

# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 18.575

# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 30.650

# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 14.158

# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 19.818

# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 30.852

# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 47.596

# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 91.025

# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 131.425

# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 12.637

# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 20.856

# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 33.944

# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 21.181

# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 34.213

# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 59.622
```

Results with the parallelization

```
# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 8.852

# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 13.594

# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 20.120

# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 12.049

# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 20.710

# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 23.320

# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 11.998

# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 15.972

# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 23.619

# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 30.764

# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 50.969

# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 129.960

# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 10.797

# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 15.767

# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 27.032

# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 16.521

# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 26.050

# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 45.231
```

Test Plan:
1. buck test //caffe2/test:quantization -- 'test_embedding_bag*'  --print-passing-details

2. Ran benchmarks with ```buck build mode/opt caffe2/benchmarks/operator_benchmark/pt:qembedding_pack_test; ./buck-out/gen/caffe2/benchmarks/operator_benchmark/pt/qembedding_pack_test.par```

Reviewed By: qizzzh

Differential Revision: D24002456

fbshipit-source-id: 23b9b071b2ce944704b2582be40d0aaaaeceb298
2020-10-13 20:46:58 -07:00
d5ca53c955 Performance fix for torch.cat operator on ROCm (#46097)
Summary:
This pull request is a partial revert of https://github.com/pytorch/pytorch/pull/44833 for ROCm to fix the performance of the concatenate operator. The changes only affect execution on ROCm and are guarded by the define `__HIP_PLATFORM_HCC__`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46097

Test Plan:
Benchmark
`python -m pt.cat_test --tag_filter all --device cuda`

Results on ROCm before the PR:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1,1,1)_N2_dim0_cuda
# Input: sizes: (1, 1, 1), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 10828.314

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 11888.028

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(128,1024,2)_N2_dim1_cuda
# Input: sizes: (128, 1024, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 11898.945

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim0_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 11787.744

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1025,1023,2)_N2_dim1_cuda
# Input: sizes: (1025, 1023, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 11792.479

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim2_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 11769.718

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f989e5c2510>,111,65]_N5_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f989e5c2510>, 111, 65], N: 5, dim: 0, device: cuda
Forward Execution Time (us) : 11633.882

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[96,<function<lambda>at0x7f989e5c2620>,64]_N5_dim1_cuda
# Input: sizes: [96, <function <lambda> at 0x7f989e5c2620>, 64], N: 5, dim: 1, device: cuda
Forward Execution Time (us) : 11617.768

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[128,64,<function<lambda>at0x7f96eee4df28>]_N5_dim2_cuda
# Input: sizes: [128, 64, <function <lambda> at 0x7f96eee4df28>], N: 5, dim: 2, device: cuda
Forward Execution Time (us) : 11625.143

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f96ef874048>,32,64]_N50_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f96ef874048>, 32, 64], N: 50, dim: 0, device: cuda
Forward Execution Time (us) : 13079.204

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[32,<function<lambda>at0x7f96ef8740d0>,64]_N50_dim1_cuda
# Input: sizes: [32, <function <lambda> at 0x7f96ef8740d0>, 64], N: 50, dim: 1, device: cuda
Forward Execution Time (us) : 13095.620

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[33,65,<function<lambda>at0x7f96ef874158>]_N50_dim2_cuda
# Input: sizes: [33, 65, <function <lambda> at 0x7f96ef874158>], N: 50, dim: 2, device: cuda
Forward Execution Time (us) : 13403.086

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(64,32,4,16,32)_N2_dim2_cuda
# Input: sizes: (64, 32, 4, 16, 32), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 118.704

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(16,32,4,16,32)_N8_dim2_cuda
# Input: sizes: (16, 32, 4, 16, 32), N: 8, dim: 2, device: cuda
Forward Execution Time (us) : 263.273

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(9,31,5,15,33)_N17_dim4_cuda
# Input: sizes: (9, 31, 5, 15, 33), N: 17, dim: 4, device: cuda
Forward Execution Time (us) : 463.024

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f96ef8741e0>]_N100_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f96ef8741e0>], N: 100, dim: 0, device: cuda
Forward Execution Time (us) : 23818.032

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f96ef874268>]_N1000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f96ef874268>], N: 1000, dim: 0, device: cuda
Forward Execution Time (us) : 234778.296

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f96ef8742f0>]_N2000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f96ef8742f0>], N: 2000, dim: 0, device: cuda
Forward Execution Time (us) : 470288.132

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f96ef874378>]_N3000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f96ef874378>], N: 3000, dim: 0, device: cuda
Forward Execution Time (us) : 704361.221
```

Results on ROCm after the PR:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1,1,1)_N2_dim0_cuda
# Input: sizes: (1, 1, 1), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 29.292

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 46.320

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(128,1024,2)_N2_dim1_cuda
# Input: sizes: (128, 1024, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 36.969

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim0_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 92.816

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1025,1023,2)_N2_dim1_cuda
# Input: sizes: (1025, 1023, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 93.943

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim2_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 163.914

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1da3186510>,111,65]_N5_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1da3186510>, 111, 65], N: 5, dim: 0, device: cuda
Forward Execution Time (us) : 75.475

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[96,<function<lambda>at0x7f1da3186620>,64]_N5_dim1_cuda
# Input: sizes: [96, <function <lambda> at 0x7f1da3186620>, 64], N: 5, dim: 1, device: cuda
Forward Execution Time (us) : 68.880

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[128,64,<function<lambda>at0x7f1bf3c50f28>]_N5_dim2_cuda
# Input: sizes: [128, 64, <function <lambda> at 0x7f1bf3c50f28>], N: 5, dim: 2, device: cuda
Forward Execution Time (us) : 85.268

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bf4669048>,32,64]_N50_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bf4669048>, 32, 64], N: 50, dim: 0, device: cuda
Forward Execution Time (us) : 111.543

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[32,<function<lambda>at0x7f1bf46690d0>,64]_N50_dim1_cuda
# Input: sizes: [32, <function <lambda> at 0x7f1bf46690d0>, 64], N: 50, dim: 1, device: cuda
Forward Execution Time (us) : 110.644

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[33,65,<function<lambda>at0x7f1bf4669158>]_N50_dim2_cuda
# Input: sizes: [33, 65, <function <lambda> at 0x7f1bf4669158>], N: 50, dim: 2, device: cuda
Forward Execution Time (us) : 116.201

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(64,32,4,16,32)_N2_dim2_cuda
# Input: sizes: (64, 32, 4, 16, 32), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 117.708

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(16,32,4,16,32)_N8_dim2_cuda
# Input: sizes: (16, 32, 4, 16, 32), N: 8, dim: 2, device: cuda
Forward Execution Time (us) : 264.953

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(9,31,5,15,33)_N17_dim4_cuda
# Input: sizes: (9, 31, 5, 15, 33), N: 17, dim: 4, device: cuda
Forward Execution Time (us) : 480.304

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bf46691e0>]_N100_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bf46691e0>], N: 100, dim: 0, device: cuda
Forward Execution Time (us) : 116.385

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bf4669268>]_N1000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bf4669268>], N: 1000, dim: 0, device: cuda
Forward Execution Time (us) : 913.591

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bf46692f0>]_N2000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bf46692f0>], N: 2000, dim: 0, device: cuda
Forward Execution Time (us) : 2003.212

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bf4669378>]_N3000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bf4669378>], N: 3000, dim: 0, device: cuda
Forward Execution Time (us) : 3004.174
```

Reviewed By: bdhirsh

Differential Revision: D24286324

Pulled By: malfet

fbshipit-source-id: 291f3f3f80f9d2f9ba52a455a942f3fb0406e7d2
2020-10-13 19:22:35 -07:00
09842a44fa [FX] Allow tracing free functions (#46268)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46268

Test Plan: Imported from OSS

Reviewed By: dzhulgakov

Differential Revision: D24283019

Pulled By: jamesr66a

fbshipit-source-id: 938322e13a16386ac931a666f4eecfc4d9c68a5a
2020-10-13 19:18:04 -07:00
ac3f23deb0 Fixed usage of std::move function (#46199)
Summary:
Removed std::move in situations where a move wasn't actually possible (so std::move didn't move anything but created a copy instead).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46199

Reviewed By: bdhirsh

Differential Revision: D24287408

Pulled By: glaringlee

fbshipit-source-id: f88b9500e7bbaa709bff62b845966e2adc7fa588
2020-10-13 19:13:30 -07:00
173363f31a Use tensor's quantized properties directly in pickler (#46267)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46267

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D24283008

Pulled By: iseeyuan

fbshipit-source-id: 76c8410d428a5fc487381e65a9f3a789a9f04eb0
2020-10-13 19:05:52 -07:00
1fcec6e72b [caffe2] Add operator schema for FP16SparseNorm (#46300)
Summary:
Fixes regression introduced by https://github.com/pytorch/pytorch/pull/45551
Also fixes signed-unsigned comparison warnings in test/cpp/tensorexpr/test_train_impl.cpp.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46300

Reviewed By: walterddr

Differential Revision: D24294821

Pulled By: malfet

fbshipit-source-id: 16bffa71ec0d2d38208855223a3c5efb18414ab5
2020-10-13 18:58:23 -07:00
f89498f3f8 Allow RPC framework to use rank in addition to WorkerInfo and name. (#46221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46221

The RPC framework only allowed sending RPCs based on provided
WorkerInfo or name. When using RPC with DDP, sometimes it might just be easier
to refer to everything in terms of ranks since DDP doesn't support names yet.

As a result, it would be helpful for the `to` parameter in the RPC APIs to also accept a rank.
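
Usage sketch (assumes `rpc.init_rpc` has already been called on at least two workers, so rank 1 exists):

```python
import torch
import torch.distributed.rpc as rpc

# `to` may now be a rank, in addition to a worker name or WorkerInfo:
ret = rpc.rpc_sync(to=1, func=torch.add, args=(torch.ones(2), 3))
```
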
ghstack-source-id: 114207172

Test Plan:
1) waitforbuildbot
2) Unit Tests

Reviewed By: mrshenli

Differential Revision: D24264989

fbshipit-source-id: 5edf5d92e2bd2f213471dfe7c74eebfa9efc9f70
2020-10-13 17:52:54 -07:00
e1c9aa918a Reformat ivalue_inl.h and ivalue.h (#46174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46174

Want to separate the real changes in this file from noisy reformatting changes, so check in this reformatting first.

Test Plan: N/A

Reviewed By: pritamdamania87

Differential Revision: D24246841

fbshipit-source-id: 50bb671b0a2feab38acaa4fc171608e379fc92e9
2020-10-13 16:31:54 -07:00
952dc7ed87 [NCCL] Fix Hang in Async Error Handling due to Work logging (#46265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46265

tl;dr - we must remove tensor-related logging from the
WorkNCCL::operator<< function; otherwise, printing the work objects tracked in
the workMetaList_ will cause segfaults.

The Work objects we track in the workMetaList for the NCCL Async Error
Handling mechanism don't have any `outputs_`. As described in the workEnqueue
function, destructing the output tensors calls into autograd_meta, which
happens in the user thread, but our system destructs work objects in the
workCleanupThread, so this could lead to a deadlock scenario. We avoid this
problem by not tracking the tensors in the work objects in the workMetaList
(it's called work meta list because these work objects only track the metadata
and not the actual tensors), so when the WorkNCCL::operator<< function tried to
log tensor shapes for work objects from the watchdog thread, the async error
handling mechanism hung (in the desync test) or segfaulted (in the desync
flow). This PR removes the tensor-related logging from the operator<< function.
ghstack-source-id: 114192929

Test Plan: Verified that this fixes the desync test and desync flow.

Reviewed By: jiayisuse

Differential Revision: D24268204

fbshipit-source-id: 20ccb8800aa3d71a48bfa3cbb65e07ead42cd0dc
2020-10-13 16:23:56 -07:00
b1d24dded1 make a way to disable callgrind (#46116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46116

Ideally I would just use one of the existing preprocessor flags such as `FBCODE_CAFFE2`, but this implies a whole bunch of other things elsewhere, so it is not really a solution for ovrsource.

Test Plan: CI green, we are able to disable it internally with `-DNVALGRIND`

Reviewed By: malfet

Differential Revision: D24227360

fbshipit-source-id: 24a3b393cf46d6a16acca0a9ec52610d4bb8704f
2020-10-13 16:18:04 -07:00
6ef41953e6 [RFC] Generate generated_unboxing_wrappers_everything.cpp for unboxing wrappers codegen to aid debugging (#45872)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45872

`VariableType_N.cpp` is generated in a sharded manner to speed up compilation time. Same for `generated_unboxing_wrappers_N.cpp`. However, `VariableTypeEverything.cpp` exists, but `generated_unboxing_wrappers_everything.cpp` does not. These files have all the registration/implementation code in them for easier debugging of codegen logic.

This diff adds `generated_unboxing_wrappers_everything.cpp`.

ghstack-source-id: 113606771

Test Plan: Build + CI

Reviewed By: iseeyuan

Differential Revision: D24124405

fbshipit-source-id: 1f6c938105e17cd4b14502978483a1b178c777dd
2020-10-13 15:44:09 -07:00
5c67cc7a9e [caffe2] Enable fp16 for SparseNormalize op (#45551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45551

The FP16 version of SparseNormalize op in Caffe2 is missing. This Diff adds FP16 support to unblock MC process of adding FP16 to Dper3.

Check https://fb.quip.com/L0T2AXGwUY3n#EReACAeifk3 .

One open question is whether the pure-FP16 SparseNormalize op will affect accuracy; perhaps we should do the computation in the FP32 domain.
ghstack-source-id: 114184398

Test Plan:
```
 buck run mode/opt //caffe2/caffe2/python/operator_test:sparse_normalize_test
```

```
buck run mode/opt -c python.package_style=inplace mode/no-gpu //caffe2/caffe2/python/benchmarks:sparse_normalize_benchmark -- --fp16
```

Reviewed By: jspark1105

Differential Revision: D24005618

fbshipit-source-id: 8b918ec4063fdaafa444779b95206ba2b7b38537
2020-10-13 15:35:22 -07:00
2118d58d45 Add some more docs to expecttest. (#46263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46263

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: robieta

Differential Revision: D24281640

Pulled By: ezyang

fbshipit-source-id: 88c5b3bf091f47b69ce58aa321669158c5afda79
2020-10-13 15:17:11 -07:00
1c3e335c4b [pytorch][glow][NNPI] Using int32 as indices for embedding_bag operators (#45878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45878

Support int32 as indices and offsets for embedding_bag_byte|4bit_rowwise_offsets, to avoid costly casting operators such as `aten::to`.
Currently we don't make the assumption that indices and offsets should be the same type, which should not be a problem since downstream fbgemm supports either case.

Test Plan:
```
buck test mode/dev caffe2/test:quantization -- --stress-runs 100 test_embedding_bag
```

Reviewed By: radkris-git

Differential Revision: D23854367

fbshipit-source-id: 6758a4252b36a7fe2890f37d38d66f20651e850e
2020-10-13 15:08:39 -07:00
a37f2749cd Avoid computing AutogradKey if not needed. (#46252)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46252

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D24272744

fbshipit-source-id: 6cb66d13e6c910df1ad1a8badd43f990e7b55368
2020-10-13 15:01:55 -07:00
ac245f6b45 Complex autograd doc fix (#46258)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46258

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D24286512

Pulled By: anjali411

fbshipit-source-id: 60bc98d69336101c0d8fe5ab542b9757b5e7faac
2020-10-13 14:36:50 -07:00
67a0c0af27 [quant][fx][graphmode] Add prepare_custom_config_dict and convert_custom_config_dict (#46223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46223

Also move standalone module config to the prepare_custom_config_dict

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24266900

fbshipit-source-id: fe3ff5b8c657af3f377041e7881d400938e044f8
2020-10-13 14:19:49 -07:00
dac680721c Automated submodule update: FBGEMM (#46271)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: a570f94657

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46271

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D24285369

fbshipit-source-id: a5928251ec8386891d31d2f88193aa97e4ad715f
2020-10-13 13:15:40 -07:00
95ccf34fb9 [quant][graph][fix] Set type for GetAttr nodes in remapTypes (#46250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46250

Previously the type of GetAttr nodes was getting set incorrectly and wasn't matching the module type

Test Plan:
Existing quantization tests

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D24279872

fbshipit-source-id: 2b2e3027f6e9ad8ba9e9b7937bd5cc5daaf6e17c
2020-10-13 12:59:28 -07:00
7b7f2519d9 Use storage.cpu() for moving storage to CPU in serialization. (#46028)
Summary:
As reported in https://github.com/pytorch/pytorch/issues/46020, something seems to go wrong with the storage._write_file method used with a BytesIO and a GPU buffer.
Given that we were going to create the intermediate buffer (currently via BytesIO) anyway, we might as well use storage.cpu() to move the storage to the CPU. This appears to work better.

This is a hot fix; further investigation is highly desirable. In particular, I don't have a reproducing test to show.

Fixes https://github.com/pytorch/pytorch/issues/46020

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46028

Reviewed By: bdhirsh

Differential Revision: D24194370

Pulled By: gchanan

fbshipit-source-id: 99d463c4accb4f1764dfee42d7dc98e7040e9ed3
2020-10-13 12:51:10 -07:00
fc846db667 .circleci: Fix android publish snapshot job (#46266)
Summary:
The android publish snapshot job was failing since it wasn't utilizing
the new docker tagging system

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46266

Reviewed By: walterddr

Differential Revision: D24282375

Pulled By: seemethere

fbshipit-source-id: 58e6ca80bda0b81b09f8614b9ccec764a2f26b49
2020-10-13 11:35:30 -07:00
5604997b09 [quant][refactor] Alphabetize the entries in the quantized import (#46218)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46218

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24264414

Pulled By: z-a-f

fbshipit-source-id: 6d6fb8cc0e1ab28c64fa16dd343ff8f540ccf773
2020-10-13 11:24:38 -07:00
faa9c22a51 Support pytest for distribution testing (#45648)
Summary:
In response to https://github.com/pytorch/pytorch/issues/11578. This is a test run to see if CI (and other internal systems) work fine with pytest-style tests.
 - Creates a separate `distributions` directory within `test`.
 - For testing, this rewrites the `constraint` tests as parameterized tests in pytest. I don't plan to convert any other tests to pytest style, but only expose this option for adding new tests, if required.

If this is a success, we can move `EXAMPLES` in `test_distributions` into a separate file that can be imported by both pytest and unittest style tests. cc. fritzo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45648

Reviewed By: ezyang, colesbury

Differential Revision: D24080248

Pulled By: neerajprad

fbshipit-source-id: 1f2e7d169c3c291a3051d0cece17851560fe9ea9
2020-10-13 10:56:50 -07:00
ad376f1a62 trying to make pow work for tensor raised to the power of a scalar (#46185)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46037

I'm not sure this is the most performant solution, but this works:

torch.pow(cuda_tensor, 5) should work and worked before.
torch.pow(cuda_tensor, torch.tensor(5)) should work **and works now!**
torch.pow(cuda_tensor, torch.tensor((5,))) should NOT work; it should complain that the tensors are on different devices, and it indeed continues to complain.
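
The same three cases as code:

```python
import torch

t = torch.randn(4, device="cuda")

torch.pow(t, 5)                # worked before, still works
torch.pow(t, torch.tensor(5))  # 0-dim CPU exponent: works after this fix

try:
    torch.pow(t, torch.tensor((5,)))  # 1-element CPU tensor: different device
except RuntimeError as e:
    print(e)  # still complains, as intended
```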

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46185

Reviewed By: glaringlee, malfet

Differential Revision: D24257687

Pulled By: janeyx99

fbshipit-source-id: 2daf235d62ec5886d7c153da05445c2ec71dec98
2020-10-13 10:14:36 -07:00
1a57b390e8 Add torch._foreach_maximum(TensorList, TensorList) & torch._foreach_minimum(TensorList, TensorList) APIs (#45692)
Summary:
- Adding torch._foreach_maximum(TensorList, TensorList) API
- Adding torch._foreach_minimum(TensorList, TensorList) API
- Updated Adam/AdamW optimizers

Tested via unit tests
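
A quick sketch of the elementwise semantics of the new APIs:

```python
import torch

xs = [torch.tensor([1., 5.]), torch.tensor([2., 2.])]
ys = [torch.tensor([3., 3.]), torch.tensor([0., 4.])]

torch._foreach_maximum(xs, ys)  # [tensor([3., 5.]), tensor([2., 4.])]
torch._foreach_minimum(xs, ys)  # [tensor([1., 3.]), tensor([0., 2.])]
```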

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45692

Reviewed By: anjali411

Differential Revision: D24142464

Pulled By: izdeby

fbshipit-source-id: 6a4fc343a1613cb1e26c8398450ac9cea0a2eb51
2020-10-13 09:22:30 -07:00
5741de883a Define the record_stream method in native_functions.yaml (#44301)
Summary:
The record_stream method was hard-coded for the CUDA device. This defines record_stream in native_functions.yaml to enable dynamic dispatch to different backend devices.
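
For context, the (unchanged) user-facing call on CUDA looks like this; the change only affects how it dispatches:

```python
import torch

x = torch.randn(8, device="cuda")
s = torch.cuda.Stream()
with torch.cuda.stream(s):
    y = x * 2
# Tell the caching allocator that x is in use on stream s, so its
# memory is not reclaimed/reused until the work queued on s finishes.
x.record_stream(s)
```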

Fixes https://github.com/pytorch/pytorch/issues/36556

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44301

Reviewed By: glaringlee

Differential Revision: D23763954

Pulled By: ezyang

fbshipit-source-id: e6d24f5e7892b56101fa858a6cad2abc5cdc4293
2020-10-13 09:15:22 -07:00
d705083c2b Refactor dispatcher and native to use Signature structure. (#45990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45990

In #45890 we introduced the concept of a CppSignature, which bundled
up all of the information necessary to declare a C++ signature for
the cpp API.  This PR introduces analogous concepts for dispatcher
and native: DispatcherSignature and NativeSignature.

The three interfaces are not particularly well coupled right now,
but they do have some duck typing coincidences:

- defn() which renders the C++ definition "bool f(int x)"
- decl() which renders the C++ declaration "bool f(int x = 2)"
- type() which renders the C++ function type "bool(int)"

Maybe at some point we'll introduce a Protocol, or a supertype.
Many other methods (like arguments()) have varying types.  These
signatures also have some helper methods that forward back to real
implementations in the api modules.  Something to think about is
whether or not we should attempt to reduce boilerplate here or
not; I'm not too sure about it yet.

The net effect is we get to reduce the number of variables we
have to explicitly write out in the codegen, since now these are all
bundled together into a signature.  Something extra special happens
in BackendSelect, where we now dynamically select between dispatcher_sig
and native_sig as "how" the backend select is implemented.

A little bit of extra cleanup:
- Some places where we previously advertised Sequence, we now advertise
  a more informative Tuple.
- defn() may take an optional positional parameter overriding the entire
  name, or a kwarg-only prefix parameter to just add a prefix to the
  name.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D24223100

Pulled By: ezyang

fbshipit-source-id: f985eced08af4a60ba9641d125d0f260f8cda9eb
2020-10-13 08:34:48 -07:00
f086032676 Remove unnecessary byte-for-byte compatibility code that is not needed. (#45975)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45975

I reordered declarations in the faithful API reimplementation to
make sure the diffs lined up nicely; they're not necessary now.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D24223102

Pulled By: ezyang

fbshipit-source-id: 77c6ae40c9a3dac36bc184dd6647d6857c63a50c
2020-10-13 08:34:46 -07:00
8d5c899b19 Rename legacy_dispatcher to native. (#45974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45974

The term "legacy dispatcher" caused a bunch of confusion between
me and Sebastian when discussing what the intended semantics of
legacy dispatcher argument is.  Legacy dispatcher argument implies
that you ought NOT to use it when you have use_c10_dispatcher: full;
but that's not really what's going on; the legacy dispatcher API describes
the API that native:: functions (NativeFunctions.h) are written against.
Renaming it here makes this more clear.

I applied these seds:

```
git grep -l 'legacy_dispatcher' | xargs sed -i 's/legacy_dispatcher/native/g'
git grep -l 'legacydispatcher' | xargs sed -i 's/legacydispatcher/native/g'
git grep -l 'LegacyDispatcher' | xargs sed -i 's/LegacyDispatcher/Native/g'
```

and also grepped for "legacy" in tools/codegen and fixed documentation.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D24223101

Pulled By: ezyang

fbshipit-source-id: d1913b8b823b3b95e4546881bc0e876acfa881eb
2020-10-13 08:34:43 -07:00
527a8bee02 Reorder dispatcher/legacy_dispatcher types (#45973)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45973

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D24163527

Pulled By: ezyang

fbshipit-source-id: 2631a2ccd7ab525fe32fa56192ded4ff7ac3723f
2020-10-13 08:34:39 -07:00
944eb0e31d Add NativeFunctionGroup (#45918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45918

This groups together related native functions (functional, inplace, out)
into a single group.  It's not used by anything but Jiakai said this
would be useful for his stuff so I'm putting it in immediately.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D24163526

Pulled By: ezyang

fbshipit-source-id: 9979b0fe9249c78e4a64a50c5ed0e2ab99f499b9
2020-10-13 08:34:36 -07:00
9079aea1ac Rewrite implementation of faithful cpp signatures (#45890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45890

This rewrite is as per my comments at https://github.com/pytorch/pytorch/pull/44087#issuecomment-701664506
I did the rewrite by reverting #44087 and then reimplementing it on top.
You may find it easier to review by diffing against master with only #44087
reverted.

There are two main ideas.

First, we now factor cpp argument processing into two phases operating
on three representations of data:

1. `FunctionSchema` - this is the source from native_functions.yaml
2. `Union[Argument, ThisArgument, TensorOptionsArgument]` - this is
   the arguments after doing some basic semantic analysis to group
   them (for TensorOptions) or identify the this argument (if this
   is a method).  There is only ever one of these per functions.
3. `Union[CppArgument, CppThisArgument, CppTensorOptionsArgument]` -
   this is the arguments after we've elaborated them to C++.  There
   may be multiple of these per actual C++ signature.

You can think of (2) as common processing, whereas (3) bakes in specific
assumptions about whether or not you have a faithful or non-faithful
signature.

Second, we now have CppSignature and CppSignatureGroup representing
the *total* public C++ API signature.  So those dataclasses are what
know how to render definitions/declarations, and you no longer have
to manually type it out in the Functions/TensorMethods codegen.

Here is an exhaustive accounting of the changes.

tools.codegen.api.types

- CppSignature and CppSignatureGroup got moved to tools.codegen.api.types
- Add new CppThisArgument and CppTensorOptionsArguments (modeled off
  of ThisArgument and TensorOptionsArguments) so that we can retain
  high level semantic structure even after elaborating terms with C++
  API information.  Once this is done, we can refine
  CppArgument.argument to no longer contain a ThisArgument (ThisArgument
  is always translated to CppThisArgument.  Note that this doesn't
  apply to TensorOptionsArguments, as those may be expanded or not
  expanded, and so you could get a single CppArgument for 'options')
- Add no_default() functional mutator to easily remove default arguments
  from CppArgument and friends
- Add an explicit_arguments() method to CppArgument and friends to
  extract (flat) argument list that must be explicitly written in the signature.
  This is everything except (Cpp)ThisArgument, and is also convenient
  when you don't care about the extra structure of
  CppTensorOptionsArguments

tools.codegen.api.cpp

- group_arguments is back, and it doesn't send things directly to a
  CppSignatureGroup; instead, it moves us from representation (1) to (2)
  (perhaps it should live in model).  Here I changed my mind from my
  PR comment; I discovered it was not necessary to do classification at
  grouping time, and it was simpler and easier to do it later.
- argument got split into argument_not_this/argument/argument_faithful.
  argument and argument_faithful are obvious enough what they do,
  and I needed argument_not_this as a more refined version of argument
  so that I could get the types to work out on TensorOptionsArguments

tools.codegen.api.dispatcher

- Here we start seeing the payoff.  The old version of this code had a
  "scatter" mode and a "gather" mode.  We don't need that anymore:
  cppargument_exprs is 100% type-directed via the passed in cpp
  arguments.  I am able to write the functions without any reference
  to use_c10_dispatcher

tools.codegen.gen

- Instead of having exprs_str and types_str functions, I moved these to
  live directly on CppSignature, since it seemed pretty logical.
- The actual codegen for TensorMethods/Functions is greatly simplified,
  since (1) all of the heavy lifting is now happening in
  CppSignature(Group) construction, and (2) I don't need to proxy one
  way or another, the new dispatcher translation code is able to handle
  both cases no problem.  There is a little faffing about with ordering
  to reduce the old and new diff which could be removed afterwards.

Here are codegen diffs.  For use_c10_dispatcher: full:

```
+// aten::_cudnn_init_dropout_state(float dropout, bool train, int dropout_seed, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor
 Tensor _cudnn_init_dropout_state(double dropout, bool train, int64_t dropout_seed, const TensorOptions & options) {
-    return _cudnn_init_dropout_state(dropout, train, dropout_seed, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
+    static auto op = c10::Dispatcher::singleton()
+        .findSchemaOrThrow("aten::_cudnn_init_dropout_state", "")
+        .typed<Tensor (double, bool, int64_t, c10::optional<ScalarType>, c10::optional<Layout>, c10::optional<Device>, c10::optional<bool>)>();
+    return op.call(dropout, train, dropout_seed, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
 }
```

Otherwise:

```
+// aten::empty_meta(int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor
 Tensor empty_meta(IntArrayRef size, c10::optional<ScalarType> dtype, c10::optional<Layout> layout, c10::optional<Device> device, c10::optional<bool> pin_memory, c10::optional<MemoryFormat> memory_format) {
-    return empty_meta(size, TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory), memory_format);
+    static auto op = c10::Dispatcher::singleton()
+        .findSchemaOrThrow("aten::empty_meta", "")
+        .typed<Tensor (IntArrayRef, const TensorOptions &, c10::optional<MemoryFormat>)>();
+    return op.call(size, TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory), memory_format);
 }
```

Things that I probably did not get right:

- The Union[Argument, TensorOptionsArguments, ThisArgument] and
  the Cpp variants are starting to get a little unwieldy.  Not sure if
  this means I should add a supertype (or at the very least an
  alias); in some cases I do purposely omit one of these from the Union
- Code may not necessarily live in the most logical files.  There isn't
  very much rhyme or reason to it.
- The fields on CppSignature.  They're not very well constrained and
  it would be better if people didn't use them directly.
- Disambiguation.  We should do this properly in #44087 and we don't
  need special logic for deleting defaulting for faithful signatures;
  there is a more general story here.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D24144035

Pulled By: ezyang

fbshipit-source-id: a185f8bf9df8b44ca5718a7a44dac23cefd11c0a
2020-10-13 08:31:54 -07:00
a3caa719af fix #45552 - adding add_done_callback(fn) to torch.futures.Future (#45675)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45675
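
A minimal usage sketch of the new callback API (the values here are illustrative, not from the PR):

```
import torch

fut = torch.futures.Future()

def report(f):
    # the callback receives the completed Future itself
    print("result:", f.wait())

fut.add_done_callback(report)
fut.set_result(torch.ones(2))  # completing the future runs the callback
```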

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D24055353

Pulled By: bdhirsh

fbshipit-source-id: 9233c8e17acc878f0fecbe740a4397fb55cf722f
2020-10-13 07:47:36 -07:00
282f4ab947 Workaround for bug in DistributedDataParallel (#46186)
Summary:
Fix the DistributedDataParallelSingleProcessTest to work around a limitation in DistributedDataParallel where the batch_size must be evenly divisible by the number of GPUs used.
See https://github.com/pytorch/pytorch/issues/46175

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46186

Reviewed By: bdhirsh

Differential Revision: D24264664

Pulled By: mrshenli

fbshipit-source-id: 6cfd6d29e97f3e3420391d03b7f1a8ad49d75f48
2020-10-13 07:34:02 -07:00
a277c097ac [iOS][GPU] Add Metal/MPSCNN support on iOS (#46112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46112

### Summary

This PR adds support for running torchscript models on iOS GPUs via Metal (inference only). The feature is currently in a prototype state; API changes are expected. Tutorials and documentation will be added once it reaches beta.

allow-large-files

- Users API

```
  auto module = torch::jit::load(model);
  module.eval();
  at::Tensor input = at::ones({1,3,224,224}, at::ScalarType::Float).metal();
  auto output = module.forward({input}).toTensor().cpu();
```
- Supported Models
    - Person Segmentation v106 (FB Internal)
    - Mobilenetv2

- Supported Operators
    - aten::conv2d
    - aten::addmm
    - aten::add.Tensor
    - aten::sub.Tensor
    - aten::mul.Tensor
    - aten::relu
    - aten::hardtanh
    - aten::hardtanh_
    - aten::sigmoid
    - aten::max_pool2d
    - aten::adaptive_avg_pool2d
    - aten::reshape
    - aten::t
    - aten::view
    - aten::log_softmax.int
    - aten::upsample_nearest2d.vec

- Supported Devices
    - Apple A9 and above
    - iOS 10.2 and above

- CMake scripts
    - `IOS_ARCH=arm64 ./scripts/build_ios.sh -DUSE_METAL=ON`

### Test Plan

- Circle CI

ghstack-source-id: 114155638

Test Plan:
1. Sandcastle CI
2. Circle CI

Reviewed By: dreiss

Differential Revision: D23236555

fbshipit-source-id: 98ffc48b837e308bc678c37a9a5fd8ae72d11625
2020-10-13 01:46:56 -07:00
7f6a1b2bd5 [quant][fx][graphmode][api] Change API for custom module (#45920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45920

See docs for new way of defining custom modules

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24145856

fbshipit-source-id: 488673fba503e39e8e303ed5a776fe36899ea4e3
2020-10-12 23:42:27 -07:00
e6d30c89c1 Revert D24165889: Update native_functions.yaml to add DefaultBackend.
Test Plan: revert-hammer

Differential Revision:
D24165889 (1f9ddf64d2)

Original commit changeset: 7f3ccdb3499b

fbshipit-source-id: b5d0de57d918011f1e19c9ef6aafa89fefcb42d5
2020-10-12 23:17:06 -07:00
1f9ddf64d2 Update native_functions.yaml to add DefaultBackend. (#45938)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45938

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24165889

Pulled By: ailzhang

fbshipit-source-id: 7f3ccdb3499b40795bc34af716d0e63241ae8de3
2020-10-12 22:06:50 -07:00
ba1e0a88bb Use const-references in nodes_to_rewrite range loop
Test Plan: CI

Reviewed By: supriyar

Differential Revision: D24267389

fbshipit-source-id: c56d6bf1924b4c4c993fdf1328cfd5ab0d890869
2020-10-12 20:08:34 -07:00
4ad4715643 Fix JIT test config (#46230)
Summary:
Kill the jit_simple test config because it was no different from a regular diff config.
Fix pattern matching in test.sh for the `jit_legacy` config, as it was expecting `legacy_jit`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46230

Reviewed By: walterddr

Differential Revision: D24270144

Pulled By: malfet

fbshipit-source-id: 2e00dba288af1f1e904334b952033aa21062927a
2020-10-12 19:42:06 -07:00
66505b64a5 Fix incorrect CUDA torch.nn.Embedding result when max_norm is not None and indices are not sorted (#45248)
Summary:
Sorting indices before calling `thrust::unique` fixes the issue.
Fixes https://github.com/pytorch/pytorch/issues/44792
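
A sketch of the previously affected call pattern (unsorted, repeated indices with `max_norm` set; assumes a CUDA device is available):

```
import torch

emb = torch.nn.Embedding(10, 4, max_norm=1.0).cuda()
idx = torch.tensor([7, 1, 1, 3], device="cuda")  # unsorted, with repeats
out = emb(idx)  # rows are renormalized to max_norm before the lookup
```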

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45248

Reviewed By: mruberry

Differential Revision: D24194696

Pulled By: ngimel

fbshipit-source-id: ab59ef9d46b9917b1417bab25f80ce9780f0c930
2020-10-12 18:28:07 -07:00
88dcb95e22 [fx] use a linked list for nodes (#45708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45708

This makes it possible to define reasonable semantics for what happens
when a node in the list is deleted. In particular, iteration over nodes
will continue at the node that was after the deleted node _when it was deleted_.
If that node has also been deleted, we skip it and continue to the node after it.
Eventually we either reach a node still in the list or we reach the end of the list.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D24089516

Pulled By: zdevito

fbshipit-source-id: d01312d11fe381c8d910a83a08582a2219f47dda
2020-10-12 18:20:14 -07:00
31ee5d8d8b Adding information how to control randomness with DataLoader (#45749)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45749
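
A sketch of the kind of seeding pattern the added documentation covers (the toy dataset is hypothetical):

```
import random
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    # derive a per-worker seed from the base seed DataLoader already set
    worker_seed = torch.initial_seed() % 2**32
    random.seed(worker_seed)  # seed numpy here too if it is used

g = torch.Generator()
g.manual_seed(0)

loader = DataLoader(TensorDataset(torch.arange(8.)), batch_size=2,
                    num_workers=2, worker_init_fn=seed_worker, generator=g)
```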

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24088407

Pulled By: VitalyFedyunin

fbshipit-source-id: 398b73ec5e8c83000ebc692001da847fc0aaa48f
2020-10-12 16:57:58 -07:00
ee3d3e6dba [pytorch][PR][Gradient Compression] Reduce the peak memory of fp16 compression provided by ddp comm hook (#46078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46078

The peak memory usage of the DDP comm hook has increased due to an extra copy of the gradient tensors. To reduce memory usage, decompress the fp16 tensor in place of the tensor stored in the gradient bucket.

#Closes: https://github.com/pytorch/pytorch/issues/45968
ghstack-source-id: 113996453

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d  -- test_accumulate_gradients_no_sync_allreduce_hook

Also verified the decrease in memory consumption with some toy modeling examples.

Reviewed By: pritamdamania87

Differential Revision: D24178118

fbshipit-source-id: 453d0b52930809bd836172936b77abd69610237a
2020-10-12 16:15:38 -07:00
87a4baf616 [pt][quant] Support either min or max in qclamp (#45937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45937

torch.clamp can now be used on quantized tensors with only the min argument or only the max argument.

Fixes https://github.com/pytorch/pytorch/issues/45928
ghstack-source-id: 114085914
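
A quick sketch of the newly allowed one-sided calls (values are illustrative):

```
import torch

x = torch.randn(4)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
torch.clamp(qx, min=0.2)  # previously both min and max were required
torch.clamp(qx, max=0.5)
```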

Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_qclamp'  --print-passing-details
```
Started reporting to test run: https://our.intern.facebook.com/intern/testinfra/testrun/4222124686876909
    ✓ ListingSuccess: caffe2/test:quantization - main (7.602)
    ✓ Pass: caffe2/test:quantization - test_qclamp (quantization.test_quantized_op.TestQuantizedOps) (7.233)
Summary
  Pass: 1
  ListingSuccess: 1
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4222124686876909
```

Reviewed By: jerryzh168

Differential Revision: D24153431

fbshipit-source-id: 9735635a48bcdd88d1dd6dc2f18b59311d45ad90
2020-10-12 16:07:31 -07:00
bed3b40523 Implement ravel (#46098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46098

Doc:
![image](https://user-images.githubusercontent.com/68879799/95611323-ae5cf380-0a2f-11eb-9b8e-56bf79ce68af.png)
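
A short usage sketch of the new op:

```
import torch

t = torch.arange(6).reshape(2, 3)
torch.ravel(t)  # tensor([0, 1, 2, 3, 4, 5]); a view when the input is contiguous
```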

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D24253213

Pulled By: ejguan

fbshipit-source-id: 42a866c902272cbe3743a9d0cb3afb9165d51c0b
2020-10-12 16:00:44 -07:00
b98e35948f fix test_serialization not working with Windows. (#46120)
Summary:
fixes https://github.com/pytorch/pytorch/issues/45917.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46120

Reviewed By: janeyx99

Differential Revision: D24253317

Pulled By: walterddr

fbshipit-source-id: 6caa0970b3e3eb972d314639be773a104a4e89a5
2020-10-12 15:18:46 -07:00
f3db68776c [NNC] Fix two more bugs in Cuda Half support (#46129)
Summary:
Fixes two bugs reported by https://github.com/pytorch/pytorch/issues/45953 in the NNC Cuda codegen which could break when using Half floats:

1. The Registerizer generates new scalars with the type of the load being replaced, and doesn't have Cuda-specific logic to avoid using the half type. I've added a quick mutator to coerce these to float, similar to the existing load-casting rules.

2. We weren't handling explicit casts to Half inserted by the user (in the report, the user being the JIT). This is addressed by replacing them with casts to Float, since that's the type we do Half math in.

Fixes https://github.com/pytorch/pytorch/issues/45953.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46129

Reviewed By: glaringlee

Differential Revision: D24253639

Pulled By: nickgg

fbshipit-source-id: 3fef826eab00355c81edcfabb1030332cae595ac
2020-10-12 13:31:07 -07:00
c02efdefa8 adding complex support for distributed functions, fix #45760 (#45879)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45879

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D24127949

Pulled By: bdhirsh

fbshipit-source-id: 8061b14fa1c0adbe22b9397c2d7f92618556d223
2020-10-12 12:44:47 -07:00
8de9aa196a clean up dataclasses installation to only <3.7 (#46182)
Summary:
clean up docker image installation of dataclasses.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46182

Reviewed By: glaringlee

Differential Revision: D24257553

Pulled By: walterddr

fbshipit-source-id: 065a607f52c7e1dc6d0765d87e4468d1752c063b
2020-10-12 12:18:29 -07:00
ba78eb80ff including tensorexpr tests in CI for all configs (#46188)
Summary:
Removed test_tensorexpr from the JIT-EXECUTOR exclude list.

CI will now run those tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46188

Reviewed By: glaringlee

Differential Revision: D24255433

Pulled By: janeyx99

fbshipit-source-id: f18e5b41d49b439407c1c24ef6190ef68bc809bf
2020-10-12 12:03:06 -07:00
85c3ba5588 [caffe2] add PlanExecutorTest ErrorPlanWithCancellableStuckNet (#46110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46110

## Motivation
* `Cancel` is now added to `OperatorBase` and `NetBase` (https://github.com/pytorch/pytorch/pull/44145).
* We need a test to cover and exhibit that we can cancel a stuck net and propagate errors with the plan executor.

## Summary
* Added PlanExecutorTest `ErrorPlanWithCancellableStuckNet` for the plan executor.
* Set cancelCount to zero at the beginning of the tests to avoid global state being carried over in some test environments.

Test Plan:
## Unit Test Added

```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 1000
```

Reviewed By: d4l3k

Differential Revision: D24226577

fbshipit-source-id: c834383bfe6ab50747975c229eb42a363eed3458
2020-10-12 12:00:15 -07:00
8d5256e6dd Made exception message for torch.LongTensor() legacy constructor more readable (#46147)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46085

Made exception message for torch.LongTensor() legacy constructor more
readable

![exception_screenshot](https://user-images.githubusercontent.com/13827698/95664789-e3387b80-0aff-11eb-8e8e-bd2ee449cd7e.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46147

Reviewed By: glaringlee

Differential Revision: D24252617

Pulled By: mrshenli

fbshipit-source-id: 6c03b66fef50cf18f9d37c7047d3b98c847ae287
2020-10-12 11:26:38 -07:00
2070834b9e Improve error checking of Storage._writeFile. (#46036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46036

Previously, this function didn't do error-bounds checking on the GetItem (GET_ITEM) calls, which led to issues like https://github.com/pytorch/pytorch/issues/46020.

A better solution would be to use pybind, but given that writing the file will dominate the cost of bounds checking, this is strictly better.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24228370

Pulled By: gchanan

fbshipit-source-id: f5d0a3d21ff12b4380beefe1e9954fa81ea2f567
2020-10-12 11:10:04 -07:00
9202c44379 Fix error in Binomial to retain lazy logit initialization (#46055)
Summary:
Some internal tests were sporadically failing for https://github.com/pytorch/pytorch/pull/45648. The cause is a bug in `Binomial.__init__` that references the lazy `logits` attribute and sets it when not needed. This also cleans up the `is_scalar` logic, which isn't needed given that `broadcast_all` will convert a `Number` to a `tensor`.

The reason for the flakiness is the mutation of the params dict by the first test, which is fixed by doing a shallow copy. It would be better to convert this into a pytest parameterized test once https://github.com/pytorch/pytorch/pull/45648 is merged.

cc. fritzo, ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46055

Reviewed By: agolynski

Differential Revision: D24221151

Pulled By: neerajprad

fbshipit-source-id: 15aae90a692ee6aed729c9f1d2d1b1388170a3c0
2020-10-12 10:56:06 -07:00
146721f1df Fix typing errors in the torch.distributions module (#45689)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42979.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45689

Reviewed By: agolynski

Differential Revision: D24229870

Pulled By: xuzhao9

fbshipit-source-id: 5fc87cc428170139962ab65b71cacba494d46130
2020-10-12 10:29:45 -07:00
6a001decf2 [quant][test] Add mul_scalar test (#46106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46106

make sure quantized::mul_scalar matches dequantize - mul - quantize

Test Plan: Imported from OSS

Reviewed By: dskhudia

Differential Revision: D24230790

fbshipit-source-id: 1adcc82b9c41f1b53c9a761477f7c5c08aba1001
2020-10-12 10:24:08 -07:00
6ba6ecb048 Only use hacky_wrapper_for_legacy_signatures if an op needs it (#45742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45742

Add a new flag to native_functions.yaml: `use_c10_dispatcher: hacky_wrapper_for_legacy_signatures`
and the codegen only wraps kernels in the aforementioned wrapper if that flag is set.
Apart from that, `use_c10_dispatcher: hacky_wrapper_for_legacy_signatures` is equivalent to `full`,
i.e. it has full boxing and unboxing support.

This greatly reduces the number of ops we apply the hacky_wrapper to, i.e. all ops marked as `use_c10_dispatcher: full` don't have it anymore.
ghstack-source-id: 113982139

Test Plan:
waitforsandcastle

vs fbcode:
https://www.internalfb.com/intern/fblearner/details/214511705/

vs base diff:
https://www.internalfb.com/intern/fblearner/details/214693207/

Reviewed By: ezyang

Differential Revision: D23328718

fbshipit-source-id: be120579477b3a05f26ca5f75025bfac37617620
2020-10-12 09:39:18 -07:00
e1f74b1813 Fix mkldnn build on legacy x64 arch (#46082)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45838

`ARCH_OPT_FLAGS` was the old name of `MKLDNN_ARCH_OPT_FLAGS`; it was renamed in [this commit](2a011ff02e (diff-a0abcbf647ed740b80615fb5b1614a44L97)) but not updated in pytorch.

Because its default value is set to SSE4.1, some kernels fail on legacy architectures that do not support SSE4.1. This patch makes the flag effective.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46082

Reviewed By: glaringlee

Differential Revision: D24252149

Pulled By: agolynski

fbshipit-source-id: 7079deed373d664763c5888feb28795e5235caa8
2020-10-12 08:45:06 -07:00
a814231616 [fix] torch.kthvalue : handle non-contiguous CUDA tensor (#45802)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45721

TODO
* [x] Test
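
A sketch of the previously affected pattern (assumes a CUDA device is available):

```
import torch

x = torch.randn(5, 6, device="cuda").t()         # a non-contiguous view
values, indices = torch.kthvalue(x, k=2, dim=1)  # previously mishandled such inputs
```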

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45802

Reviewed By: ngimel

Differential Revision: D24236706

Pulled By: mruberry

fbshipit-source-id: 5a51049233efa710f9500a6f7d099c90d43062c9
2020-10-11 20:13:08 -07:00
3883cdb87e TensorInferenceFunction checks
Summary: Added an OpSchema::NeedsAllInputShapes wrapper around the TensorInferenceFunction to fix an exception raised when referencing the dim array while the input shape was unknown. There may be other operators that could use a similar change; these are just the ones that were causing InferShapesAndTypes to throw an exception in my examples.

Test Plan: Tested with notebook n352716

Differential Revision: D23745442

fbshipit-source-id: d63eddea47d7ba595e73c4693d34c790f3a329cc
2020-10-11 16:08:58 -07:00
1a99689d71 [caffe2] Fix preprocessor checks for FMA
Summary: I think this preprocessor check is incorrect.  The fused multiply-add (FMA) instructions are not part of AVX2.

Test Plan: CI

Reviewed By: jspark1105

Differential Revision: D24237836

fbshipit-source-id: 44f9b9179918332eb85ac087827726300f56224e
2020-10-11 11:48:32 -07:00
bbb3f09377 Automated submodule update: FBGEMM (#46151)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: da05c8db75

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46151

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D24239896

fbshipit-source-id: 78ff9c100e39ef9a429eafd11a4c158dabd5cb15
2020-10-10 21:26:37 -07:00
a0a8bc8870 Fix mistakes and increase clarity of norm documentation (#42696)
Summary:
* Removes the incorrect statement that "the vector norm will be applied to the last dimension".
* Describes each different combination of `p`, `ord`, and input size more clearly.
* Moves norm tests from `test/test_torch.py` to `test/test_linalg.py`.
* Adds a test ensuring that `p='fro'` and `p=2` give the same results for mutually valid inputs (see the sketch below).
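
A sketch of the property the new test checks (flattened vector norms, not the spectral norm):

```
import torch

m = torch.randn(3, 4)
# without dim, a numeric p flattens the input, so p=2 agrees with 'fro'
assert torch.allclose(torch.norm(m, p='fro'), torch.norm(m, p=2))
```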

Fixes https://github.com/pytorch/pytorch/issues/41388

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42696

Reviewed By: bwasti

Differential Revision: D23876862

Pulled By: mruberry

fbshipit-source-id: 36f33ccb6706d5fe13f6acf3de8ae14d7fbdff85
2020-10-10 14:12:43 -07:00
496d72d700 [TensorExpr] Disable and/or fix some failing tests. (#46146)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46146

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D24238545

Pulled By: ZolotukhinM

fbshipit-source-id: 0d8242da9d1c6960f7b5e9065c3e8defd3d32494
2020-10-10 13:54:25 -07:00
4c87d337af [Caffe2] use the real new fbgemm sparse adagrad interface (#46132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46132

As title

Test Plan: .

Reviewed By: dskhudia

Differential Revision: D24197694

fbshipit-source-id: 2bfe8f52409fa500d2ea359dec7f521cffb20efb
2020-10-10 08:57:54 -07:00
9f743015bf a few more comments on dispatch key computation methods (#46128)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46128

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D24233868

Pulled By: bhosmer

fbshipit-source-id: efb80fb25d4e3ece3ef9190ee1ed834dff505d7c
2020-10-10 01:17:40 -07:00
b7261de0df [pytorch][te] Add compilation time benchmark (#46124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46124

We want to make sure we can actually fuse kernels within a fairly
tight time budget.  So here's a quick benchmark of codegen for a simple
pointwise activation function (swish).  I kept all the intermediate tensors
separate to force TE to actually do inlining.

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench
```

I've only run in debug mode so the results aren't super meaningful, but even in
that mode it's 18 ms for compilation, 15 ms of which are in LLVM.

Update, opt build mode:
```
----------------------------------------------------------------------------
Benchmark                                     Time           CPU Iterations
----------------------------------------------------------------------------
BM_CompileSwish                         5123276 ns    5119846 ns        148
BM_CompileSwishLLVMOnly                 4754361 ns    4753701 ns        160
```

Reviewed By: asuhan

Differential Revision: D24232801

fbshipit-source-id: d58a8b7f79bcd9244c49366af7a693e09f24bf76
2020-10-09 23:11:37 -07:00
43fe45ab0f [JIT] Add dynamic shape benchmark for NV Fuser (#46107)
Summary:
This PR modifies `benchmarks/tensorexpr`. It follows up [#44101](https://github.com/pytorch/pytorch/pull/44101) and further supports characterizing fusers with dynamic-shape benchmarks. The dynamic-shape condition models the use case where the input tensor shape changes on each call to the graph.

Changes include:

Added an auxiliary class `DynamicShape` that provides a simple API for enabling dynamic shapes in existing test cases; an example can be found in `DynamicSimpleElementBench`.

Created new bench_cls: `DynamicSimpleElementBench`, `DynamicReduce2DInnerBench`, `DynamicReduce2DOuterBench`, and `DynamicLSTM`. They are all dynamic-shaped versions of existing benchmarks and examples of enabling dynamic shapes with `DynamicShape`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46107

Reviewed By: glaringlee

Differential Revision: D24229400

Pulled By: bertmaher

fbshipit-source-id: 889fece5ea87d0f6f6374d31dbe11b1cd1380683
2020-10-09 22:09:21 -07:00
689499ffa8 remove duplicate autograd srcs (#46059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46059

These files are not generated sources, and they also already exist in another variable in the same file (`libtorch_extra_sources`).

Test Plan: CI green

Reviewed By: malfet

Differential Revision: D24203450

fbshipit-source-id: 0c9e12cd1a292c5484961876d4fa7f2341a3165b
2020-10-09 20:17:21 -07:00
5e4b3dd25a Automated submodule update: FBGEMM (#46125)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 75ea7ce6f8

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46125

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D24233275

fbshipit-source-id: 526c44aba92e622c6b46c17b467e146303a77b57
2020-10-09 19:59:52 -07:00
34951e9adc [shape inference] adding a new flag to the struct
Summary: Adding a new flag, shape_is_set, to the structs for shape inference on in-place ops to prevent duplicate inference.

Test Plan:
buck test mode/opt-clang caffe2/caffe2/opt:bound_shape_inference_test

buck test mode/opt-clang caffe2/caffe2/fb/opt:shape_info_utils_test

Reviewed By: ChunliF

Differential Revision: D24134767

fbshipit-source-id: 5142e749fd6d1b1092a45425ff7b417a8086f215
2020-10-09 19:29:08 -07:00
138c22f8e3 qnnpack quantized activations: fix memory format issues (#46077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46077

Some of the QNNPACK quantized kernels were not handling NHWC correctly:
the data written respected the input format, but the memory-format flag
was always set to contiguous.  This PR
1. adds NHWC testing for qnnpack activations
2. fixes those activations that did not set the memory format on the output

Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_qhardsigmoid
python test/test_quantization.py TestQuantizedOps.test_leaky_relu
python test/test_quantization.py TestQuantizedOps.test_hardswish
python test/test_quantization.py TestQNNPackOps.test_qnnpack_tanh
python test/test_quantization.py TestQNNPackOps.test_qnnpack_sigmoid
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D24213257

fbshipit-source-id: 764fb588a8d8a0a6e6e4d86285904cdbab26d487
2020-10-09 19:18:15 -07:00
172036a565 [NCCL] Add Error log when ProcessGroupNCCL takes down process upon (#44988)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44988

The new NCCL async error handling feature throws an exception from the
workCleanup thread if one of the NCCL operations encounters an error or times
out. This PR adds an error log to make it clearer to the user why the
training process crashed.
ghstack-source-id: 114002493

Test Plan:
Verified that we see this error message when running with the desync
test.

Reviewed By: pritamdamania87

Differential Revision: D23794801

fbshipit-source-id: 16a44ce51f01531062167fb762a8553221363698
2020-10-09 16:58:50 -07:00
7094c09ff7 quantizaton: add API usage logging (#46095)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46095

Adds logging on usage of public quantization APIs. This only works in FB codebase
and is a no-op in OSS.

Test Plan: The test plan is fb-only

Reviewed By: raghuramank100

Differential Revision: D24220817

fbshipit-source-id: a2cc957b5a077a70c318242f4a245426e48f75e5
2020-10-09 16:51:27 -07:00
c73af6040e [FX] Make graph_copy examine existing values in val_map (#46104)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46104

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D24224505

Pulled By: jamesr66a

fbshipit-source-id: ffdf8ea8cb92439f3aacf08b0c0db63ce3a15b8f
2020-10-09 16:37:55 -07:00
d811d4d7ba Support DefaultBackend keyword in native_functions.yaml. (#45719)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45719

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24165888

Pulled By: ailzhang

fbshipit-source-id: 9b3c5e71f5b6a985e1a43157813e7d77dbe13b07
2020-10-09 16:28:26 -07:00
e33d455ef7 [Distributed] Set smaller Store timeouts to make c10d tests run faster (#46067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46067

In our store tests, we expect an exception when we call
get on a recently deleted key. Unfortunately, the store waits for the timeout
period for the key to be set before throwing, which causes the tests to sit
idle for 5+ minutes. This PR decreases the timeouts before this set call so
these tests run faster.
ghstack-source-id: 113917315

Test Plan: Ran both the Python and C++ tests.

Reviewed By: pritamdamania87

Differential Revision: D24208617

fbshipit-source-id: c536e59ee305e0c01c44198a3b1a2247b8672af2
2020-10-09 15:45:42 -07:00
2fa91fa305 [NNC] Fix crash when simplifying certain subtractions (#46108)
Summary:
Fixes a crash bug in the IRSimplifier when the LHS is a Term (e.g. 2x) and the RHS is a Polynomial (e.g. 2x+1).

This case crashes 100% of the time so I guess it's not very common in models we've been benchmarking.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46108

Reviewed By: agolynski

Differential Revision: D24226593

Pulled By: nickgg

fbshipit-source-id: ef454c855ff472febaeba16ec34891df932723c0
2020-10-09 15:15:55 -07:00
281463ba0b [NCCL] Enable send/recv tests (#45994)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45994

Send/Recv tests were disabled because of the https://github.com/pytorch/pytorch/issues/42517. With that issue fixed, this diff enables those tests.
ghstack-source-id: 113970569

Test Plan: waitforsandcastle

Reviewed By: jiayisuse

Differential Revision: D24172484

fbshipit-source-id: 7492ee2e9bf88840c0d0086003ce8e99995aeb91
2020-10-09 15:00:39 -07:00
3ffd2af8cd Add exception classification to torch.multiprocessing.spawn (#45174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45174

Introduce distinct exception types that map to the different failure modes
of torch.multiprocessing.spawn:
ProcessRaisedException - raised when a process started by spawn raises an exception
ProcessExitedException - raised when a process started by spawn exits
This classification allows frameworks that use mp.spawn to categorize failures,
which can be helpful for tracking metrics and enhancing logs.
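
A usage sketch of the classification (assumes the new exception types are exported from torch.multiprocessing):

```
import torch.multiprocessing as mp

def worker(rank):
    if rank == 1:
        raise ValueError("boom")

if __name__ == "__main__":
    try:
        mp.spawn(worker, nprocs=2)
    except mp.ProcessRaisedException as exc:
        # a spawned process raised; classify, track metrics, enrich logs
        print("child failure:", exc)
```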

Test Plan: Imported from OSS

Reviewed By: taohe

Differential Revision: D23889400

Pulled By: tierex

fbshipit-source-id: 8849624c616230a6a81158c52ce0c18beb437330
2020-10-09 12:59:41 -07:00
da033e0b2d [Caffe2] use new fbgemm sparse adagrad interface with temp name (#46089)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46089

Follow-up of D24195799

Test Plan: .

Reviewed By: dskhudia

Differential Revision: D24196753

fbshipit-source-id: 216512822cfb752984bb97bd229af9746e866eaa
2020-10-09 12:51:43 -07:00
0ddcc0ce35 Add alias dispatch key DefaultBackend. (#45718)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45718

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24165892

Pulled By: ailzhang

fbshipit-source-id: ed28bf62b7c6320d966fd10b7a44b14efffe2f62
2020-10-09 12:02:44 -07:00
f8b3af21f2 Allow Tensor-likes in torch.autograd.gradcheck (#45732)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42942

Re-do of https://github.com/pytorch/pytorch/issues/43877.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45732

Reviewed By: mruberry

Differential Revision: D24195820

Pulled By: albanD

fbshipit-source-id: 8f43353077f341e34371affd76be553c0ef7d98a
2020-10-09 11:51:27 -07:00
59414b359d Document fix for logspace and linspace (#46056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46056

The result:
* logspace
![image](https://user-images.githubusercontent.com/68879799/95513793-e6f5c200-0988-11eb-8279-b093612743ca.png)
* linspace
![image](https://user-images.githubusercontent.com/68879799/95513824-f543de00-0988-11eb-9910-72d28d7b6277.png)

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D24204441

Pulled By: ejguan

fbshipit-source-id: fe1179fdbebb326d33e9c474b1efc8282a391901
2020-10-09 10:20:57 -07:00
c83314e982 [ci-all tests] Improve logging in ProcessGroupNCCL for debugging purposes. (#46010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46010

When training jobs running with NCCL fail, it is sometimes hard to
debug the reason for the failure, and our logging doesn't always provide
enough information to narrow down the issue.

To improve the debugging experience, I've enhanced our logging to add a lot
more information about what the ProcessGroup is doing under the hood.

#Closes: https://github.com/pytorch/pytorch/issues/45310

Sample output:
```
> I1002 15:18:48.539551 1822062 ProcessGroupNCCL.cpp:528] [Rank 2] NCCL watchdog thread started!
> I1002 15:18:48.539533 1821946 ProcessGroupNCCL.cpp:492] [Rank 2] ProcessGroupNCCL initialized with following options:
> NCCL_ASYNC_ERROR_HANDLING: 0
> NCCL_BLOCKING_WAIT: 1
> TIMEOUT(ms): 1000
> USE_HIGH_PRIORITY_STREAM: 0
> I1002 15:18:51.080338 1822035 ProcessGroupNCCL.cpp:530] [Rank 1] NCCL watchdog thread terminated normally
> I1002 15:18:52.161218 1821930 ProcessGroupNCCL.cpp:385] [Rank 0] Wrote aborted communicator id to store: NCCLABORTEDCOMM:a0e17500002836080c8384c50000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> I1002 15:18:52.161238 1821930 ProcessGroupNCCL.cpp:388] [Rank 0] Caught collective operation timeout for work: WorkNCCL(OpType=ALLREDUCE, TensorShape=[10], Timeout(ms)=1000)
> I1002 15:18:52.162120 1821957 ProcessGroupNCCL.cpp:530] [Rank 0] NCCL watchdog thread terminated normally
> I1002 15:18:58.539937 1822062 ProcessGroupNCCL.cpp:649] [Rank 2] Found key in store: NCCLABORTEDCOMM:a0e17500002836080c8384c50000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, from rank: 0, aborting appropriate communicators
> I1002 15:19:34.740937 1822062 ProcessGroupNCCL.cpp:662] [Rank 2] Aborted communicators for key in store: NCCLABORTEDCOMM:a0e17500002836080c8384c50000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> I1002 15:19:34.741678 1822062 ProcessGroupNCCL.cpp:530] [Rank 2] NCCL watchdog thread terminated normally
```
ghstack-source-id: 113961408

Test Plan: waitforbuildbot

Reviewed By: osalpekar

Differential Revision: D24183463

fbshipit-source-id: cb09c1fb3739972294e7edde4aae331477621c67
2020-10-09 09:46:58 -07:00
362d9a932e Remove object-based collective APIs from public docs (#46075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46075

Removes these from public docs for now as we are still
iterating/formalizing these APIs. Will add them back once they are part of a
PyTorch release.
ghstack-source-id: 113928700

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D24211510

fbshipit-source-id: 3e36ff6990cf8e6ef72b6e524322ae06f9097aa2
2020-10-09 09:24:51 -07:00
62554a3bd2 Prioritize raising error message about unused parameters when rebuild_buckets fails (#45933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45933

Occasionally users run DDP with models that have unused parameters; in this
case we would like to surface an error message telling them to run with
find_unused_parameters=True. However, a recent change to the rebuild_buckets logic (https://github.com/pytorch/pytorch/pull/44798) made
it so that we raise a size-mismatch error when this happens, but the
information about unused parameters is likely to be more useful, and unused
parameters are likely the most common cause of this failure. Prefer raising
that error over the subsequent size-mismatch errors.
ghstack-source-id: 113914759

Test Plan: Added unittest

Reviewed By: mrshenli

Differential Revision: D24151256

fbshipit-source-id: 5d349a988b4aac7d3e0ef7b3cd84dfdcbe9db675
2020-10-09 09:16:45 -07:00
9fb8e33a5b [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D24215555

fbshipit-source-id: 21d10bd60ab302c7cf7e245979b2d2ef0a142a1c
2020-10-09 08:37:54 -07:00
9443033e71 Automated submodule update: FBGEMM (#46079)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 974d2b41e7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46079

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D24213375

fbshipit-source-id: b80786490079f9f56a90e10fbb476d0963cf2abc
2020-10-09 07:40:18 -07:00
c734961e26 [cpp-extensions] Ensure default extra_compile_args (#45956)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45835

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45956

Reviewed By: ngimel

Differential Revision: D24162289

Pulled By: albanD

fbshipit-source-id: 9ba2ad51e818864f6743270212ed94d86457f4e6
2020-10-09 07:33:28 -07:00
a5c0dbc519 Add support for Softmax. (#45286)
Summary:
This PR adds support for Softmax in NNC.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45286

Reviewed By: mrshenli

Differential Revision: D24042901

Pulled By: navahgar

fbshipit-source-id: 120bafe17586d3ecf0918f9aee852a7c3a8f4990
2020-10-08 23:57:02 -07:00
87226f72d2 [caffe2] temp remove ErrorPlanWithCancellableStuckNet (#46080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46080

Temporary removal of ErrorPlanWithCancellableStuckNet; will follow up with more detail.

Test Plan:
```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
```
remove a test

Reviewed By: fegin

Differential Revision: D24213971

fbshipit-source-id: e6e600bad00b45c726311193b4b3238f1700526e
2020-10-08 23:35:45 -07:00
0983ddbfd2 add sharding option to test framework (#45988)
Summary:
Adding a sharding node to our python CONFIG_TREE

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45988

Reviewed By: mruberry

Differential Revision: D24200636

Pulled By: janeyx99

fbshipit-source-id: 08c8c4cf98bbd4980fe6082ae6caa64fbc2ca792
2020-10-08 21:22:51 -07:00
f363a2e106 Mark top 3 slowest tests as slow (#46068)
Summary:
`TCPStoreTest.test_numkeys_delkeys` takes 5+ min (mostly in idle wait for socket timeout)
`TestDataLoader.test_proper_exit` and `TestDataLoaderPersistentWorkers.test_proper_exit` take 2.5 min each
`TestXNNPACKConv1dTransformPass.test_conv1d_with_relu_fc` takes 2 min to finish

Add an option to `print_test_stats.py` to skip reporting test classes that run for less than a second, and speed up `TestTorchDeviceTypeCUDA.test_matmul_45724_cuda`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46068

Reviewed By: mruberry

Differential Revision: D24208660

Pulled By: malfet

fbshipit-source-id: 780e0d8be4f0cf69ea28de79e423291a1f3349b7
2020-10-08 21:10:03 -07:00
487624e369 [caffe2] plan executor error propagation test with blocking cancellable op (#45319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45319

## Motivation
* `Cancel` is now added to `OperatorBase` and `NetBase` (https://github.com/pytorch/pytorch/pull/44145)
* We need a test to cover and exhibit that we can cancel a stuck net and propagate errors with the plan executor.

## Summary
* Added `ErrorPlanWithCancellableStuckNet` for the plan executor.
* We set up a plan with two nets: a stuck net with a blocking operator that never returns, and an error
  net with an op that throws, and tested that the plan throws and cancels.

Test Plan:
## Unit Test added
```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100
```
```
Summary
  Pass: 400
  ListingSuccess: 2
```

Reviewed By: d4l3k

Differential Revision: D23920548

fbshipit-source-id: feff41f73698bd6ea9b744f920e0fece4ee44438
2020-10-08 19:54:49 -07:00
8cd3857bc7 [NCCL] Add torch::cuda::nccl::send/recv (#45926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45926

torch/csrc/cuda/nccl.cpp is compiled as part of the torch_cuda library, so calling this function from ProcessGroupNCCL.cpp avoids linking a second instance of libnccl.a into torch_python.
Fixes a similar issue to https://github.com/pytorch/pytorch/issues/42517

ghstack-source-id: 113910530

Test Plan: waitforsandcastle

Reviewed By: jiayisuse

Differential Revision: D24147802

fbshipit-source-id: d8901fdb31bdc22ddca2364f8050844639a1beb3
2020-10-08 19:20:40 -07:00
b7f7378b2d [NCCL] support send/recv to/from self when communicator is created on demand (#45873)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45873

This diff adds support for sending/receiving to/from self. It also fixes a bug that occurred when p2p operations were not used by all processes.
ghstack-source-id: 113910526

Test Plan: waitforsandcastle

Reviewed By: jiayisuse

Differential Revision: D24124413

fbshipit-source-id: edccb830757ac64f569e7908fec8cb2b43cd098d
2020-10-08 19:19:15 -07:00
96d48178c8 Make pipeWrite and pipeRead noexcept (#45783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45783

After the previous device-maps commits, `pipeWrite` might throw. In
this case, if we increment active calls before `pipeWrite` on the
caller, that active call won't be decremented properly when `pipeWrite`
throws. As a result, `shutdown` can silently time out. I noticed this
because some tests took more than 60s to finish.

This commit extracts the tensor device-checking logic out of pipeWrite
and makes sure the error is thrown before the active-call count is
incremented.

Differential Revision: D24094803

Test Plan: Imported from OSS

Reviewed By: mruberry

Pulled By: mrshenli

fbshipit-source-id: d30316bb23d2afd3ba4f5540c3bd94a2ac10969b
2020-10-08 18:53:51 -07:00
c86ee082a2 torch.fft: Add helper functions section to docs (#46032)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/44877#issuecomment-705411068

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46032

Reviewed By: ngimel

Differential Revision: D24191580

Pulled By: mruberry

fbshipit-source-id: 58a32de886b40f85653ddc3b65bf8d551395f023
2020-10-08 17:57:12 -07:00
2b204e6db3 [quant][fx][graphmode] Run symbolic_trace in quantization (#45919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45919

As discussed with the JIT team, we'll run symbolic_trace inside the quantization functions.
prepare_fx now takes the original PyTorch model (torch.nn.Module) instead of a `GraphModule` as input.
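
A sketch against the FX graph-mode API as of this change (import paths and signatures have moved in later releases):

```
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx

model = torch.nn.Sequential(torch.nn.Linear(4, 4)).eval()
# pass the eager model directly; symbolic_trace now runs inside prepare_fx
prepared = prepare_fx(model, {"": get_default_qconfig("fbgemm")})
```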

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D24145857

fbshipit-source-id: 2b7a4ca525a7a8c23a26af54ef594c6a951e4024
2020-10-08 17:26:03 -07:00
c6672a608b caffe2 missing cctype header (#46052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46052

`<cctype>` is what provides `isuppper`, etc.
https://en.cppreference.com/w/cpp/header/cctype

clang on windows complaining about the missing header.

Test Plan: CI green

Reviewed By: yinghai

Differential Revision: D24201925

fbshipit-source-id: 7b242200f09c30bf78dde226e14ee4be71758b87
2020-10-08 16:48:49 -07:00
31888b2e77 [quant][pyper] Rename the sparse argument for embedding_bag ops (#46003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46003

sparse is confusing because it is used in training for sparse gradients.

Test Plan: Imported from OSS

Reviewed By: radkris-git, qizzzh

Differential Revision: D24178248

fbshipit-source-id: 0a2b595f3873d33b2ce25839b6eee31d2bfd3b0d
2020-10-08 16:15:28 -07:00
8c80ee8ba5 [quant] Set sparse to False for embedding_bag ops in graph mode (#45997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45997

The current sparse field used in the float module is for sparse gradients, which is not applicable
to inference. The sparse field in the quantized ops denotes pruned weights.

Test Plan:
python test/test_quantization.py TestQuantizeDynamicJitOps.test_embedding_bag

Imported from OSS

Reviewed By: qizzzh

Differential Revision: D24176543

fbshipit-source-id: a05b4ff949e0375462ae411947f68076e1b460d2
2020-10-08 16:13:12 -07:00
0cf0b5f2e8 Minor refactor to normalize assignments (#45671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45671

This is a follow up on D23977080 (2596113a79) and https://github.com/pytorch/pytorch/pull/45474.

Test Plan: See D23977080 (2596113a79).

Reviewed By: z-a-f

Differential Revision: D24043125

fbshipit-source-id: 0c05930668533bfd7145fa605f3785484391130b
2020-10-08 16:06:48 -07:00
64b0686986 Expose ChannelShuffle (#46000)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45999
Also includes a small fix for the caffe2 counterpart.
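
A short usage sketch of the newly exposed module:

```
import torch

shuffle = torch.nn.ChannelShuffle(2)
x = torch.arange(8.).reshape(1, 4, 1, 2)  # (N, C, H, W) with C=4
shuffle(x)  # channel order becomes [0, 2, 1, 3]
```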

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46000

Reviewed By: mruberry

Differential Revision: D24185855

Pulled By: ngimel

fbshipit-source-id: c5d599bb8100b86b81c6901f1b8b8baefc12cb16
2020-10-08 16:00:01 -07:00
89256611b5 Doc note update for complex autograd (#45270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45270

<img width="1679" alt="Screen Shot 2020-10-07 at 1 45 59 PM" src="https://user-images.githubusercontent.com/20081078/95368324-fa7b2d00-08a3-11eb-9066-2e659a4085a2.png">
<img width="1673" alt="Screen Shot 2020-10-07 at 1 46 10 PM" src="https://user-images.githubusercontent.com/20081078/95368332-fbac5a00-08a3-11eb-9be5-77ce6deb8967.png">
<img width="1667" alt="Screen Shot 2020-10-07 at 1 46 30 PM" src="https://user-images.githubusercontent.com/20081078/95368337-fe0eb400-08a3-11eb-80a2-5ad23feeeb83.png">
<img width="1679" alt="Screen Shot 2020-10-07 at 1 46 48 PM" src="https://user-images.githubusercontent.com/20081078/95368345-00710e00-08a4-11eb-96d9-e2d544554a4b.png">
<img width="1680" alt="Screen Shot 2020-10-07 at 1 47 03 PM" src="https://user-images.githubusercontent.com/20081078/95368350-023ad180-08a4-11eb-89b3-f079480741f4.png">
<img width="1680" alt="Screen Shot 2020-10-07 at 1 47 12 PM" src="https://user-images.githubusercontent.com/20081078/95368364-0535c200-08a4-11eb-82fc-9435a046e4ca.png">

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D24203257

Pulled By: anjali411

fbshipit-source-id: cd637dade5fb40cecf5d9f4bd03d508d36e26fcd
2020-10-08 15:04:52 -07:00
e3112e3ed6 aten::set_grad_enabled should not push as it does not return a value (#45559)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45558

This assertion failure is caused by the incorrect implementation of ``aten::set_grad_enabled`` in [torch/csrc/jit/runtime/register_special_ops.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/runtime/register_special_ops.cpp#L436). The current implementation is:

```cpp
Operator(
    "aten::set_grad_enabled(bool val) -> ()",
    [](Stack* stack) {
      torch::GradMode::set_enabled(pop(stack).toBool());
      push(stack, IValue());
    },
    aliasAnalysisConservative()),
```

which pushes a ``None`` onto the evaluation stack after calling ``set_enabled``. But according to the signature, this behavior is incorrect, as the signature says the function won't return a value. I suspect the original author was confused by the behavior of Python, which pushes a ``None`` onto the evaluation stack when a function definition does not end with a return statement carrying an explicit result value.

If ``aten::set_grad_enabled`` pushes a ``None`` onto the evaluation stack, then each time it's called, the evaluation stack will accumulate an extra ``None``. In our case, ``with torch.no_grad():`` will cause ``aten::set_grad_enabled`` to be called twice, so when the ``forward`` method finishes, the evaluation stack will be ``[None, None, Tensor]``. But the return statement of ``GraphFunction::operator()`` in [torch/csrc/jit/api/function_impl.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/api/function_impl.cpp#L51) is ``return stack.front();``, which will try to extract a tensor out of a ``None`` and thus causes the assertion failure.

The solution is simple: just remove the push from the implementation of ``aten::set_grad_enabled``.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45559

Reviewed By: albanD

Differential Revision: D24142153

Pulled By: SplitInfinity

fbshipit-source-id: 75aad0e38bd912a437f7e1a1ee89ab4445e35b5d
2020-10-08 14:42:11 -07:00
ddcacc736d Do not rebase select nighly builds on top of master (#46038)
Summary:
Prevents the following nightly failures from happening:
https://app.circleci.com/pipelines/github/pytorch/pytorch/224752/workflows/3a01ccc2-0215-4e95-9222-bbb4f9309201/jobs/8084912

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46038

Reviewed By: seemethere

Differential Revision: D24195706

Pulled By: malfet

fbshipit-source-id: d53da554bc43841ab6573188f9465c691c601eb3
2020-10-08 14:36:44 -07:00
59e4803b94 Recommit: caffe2/plan_executor: wait for 1 minute after exception and then abort (#45981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45981

This is a recommit of previously reverted D20850851 (3fbddb92b1).

TL;DR - combining condition_variables and atomics is a bad idea

https://stackoverflow.com/questions/49622713/c17-atomics-and-condition-variable-deadlock

This also adds some ifdefs to disable the death test for mobile, xplat and tsan builds since forking doesn't play nicely with them.

Test Plan:
buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --stress-runs 1000 test_atomic_iter_with_concurrent_steps --timeout 120
  buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --stress-runs 100
  buck test mode/opt caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100

no timeouts https://www.internalfb.com/intern/testinfra/testconsole/testrun/7036874440059883/

will ensure no timeouts in OSS

Reviewed By: walterddr, dahsh

Differential Revision: D24165505

fbshipit-source-id: 17cd23bfbcd9c2826a4067a387023d5186353196
2020-10-08 14:17:30 -07:00
402abdfdf4 [NNC] cacheAccesses transform (cache_reads + cache_writes) (#45869)
Summary:
Adds a new transform to the NNC compiler, which adds support for buffer access caching. All accesses within a provided scope are redirected to a cache which is initialized or written back as necessary at the boundaries of that scope. For TVM fans, this is essentially a combination of cache_reads and cache_writes. E.g. it can do this kind of thing:

Before:
```
for (int i = 0; i < 64; i++) {
  for (int j = 0; j < 64; j++) {
    A[i, j] = i * j;
  }
}
for (int i_1 = 0; i_1 < 20; i_1++) {
  for (int j_1 = 0; j_1 < 10; j_1++) {
    B[i_1, j_1] = (A(i_1 + 30, j_1 + 40)) + (A(i_1 + 31, j_1 + 41));
  }
```

After `cacheAccesses(A->buf(), "A_local", j_loop);`

```
for (int i = 0; i < 64; i++) {
  for (int j = 0; j < 64; j++) {
    A[i, j] = i * j;
  }
}
for (int i_1 = 0; i_1 < 20; i_1++) {
  for (int i_2 = 0; i_2 < 2; i_2++) {
    for (int j_1 = 0; j_1 < 11; j_1++) {
      A_local[i_2, j_1] = A[(i_2 + i_1) + 30, j_1 + 40];
    }
  }
  for (int j_2 = 0; j_2 < 10; j_2++) {
    B[i_1, j_2] = (A_local[1, j_2 + 1]) + (A_local[0, j_2]);
  }
}
```

Or this reduction:
```
for (int l1 = 0; l1 < 4; l1++) {
  sum[l1] = 0.f;
  for (int n1_1 = 0; n1_1 < 3; n1_1++) {
    for (int m1_1 = 0; m1_1 < 2; m1_1++) {
      sum[l1] = (sum[l1]) + (scale[(6 * l1 + 2 * n1_1) + m1_1]);
    }
  }
}
```

After `l.cacheAccesses(d->buf(), "d_local", n_loop);`:

```
for (int l1 = 0; l1 < 4; l1++) {
  Allocate(d_local, float, {1});
  sum[l1] = 0.f;
  d_local[0] = 0.f;
  for (int n1_1 = 0; n1_1 < 3; n1_1++) {
    for (int m1_1 = 0; m1_1 < 2; m1_1++) {
      d_local[0] = (d_local[0]) + (scale[(6 * l1 + 2 * n1_1) + m1_1]);
    }
  }
  sum[l1] = (sum[l1]) + (d_local[0]);
  Free(d_local);
}
```

I had originally planned to write `cacheReads` and `cacheWrites` wrappers so we could use them just like their TVM cousins, but they just ended up being big masses of checking that reads or writes weren't present. They didn't feel too useful, so I removed them, but let me know.

This is based on bounds inference and inherits a few bugs present in that functionality, which I will address in a followup.

While working on this I realized that it overlaps heavily with `computeAt`: which is really just `cacheReads` + `computeInline`. I'm considering refactoring computeAt to be a wrapper around those two transforms. ZolotukhinM opinions on this?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45869

Reviewed By: mruberry

Differential Revision: D24195276

Pulled By: nickgg

fbshipit-source-id: 36a58ae265f346903187ebc4923637b628048155
2020-10-08 14:13:28 -07:00
8e8fb8542e upgrade clang-tidy to 11 (#46043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46043

As title, this is necessary for some internal linter thing

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24197316

Pulled By: suo

fbshipit-source-id: 07e69fd6ce1937a0caa5838d6995eeed1be5162d
2020-10-08 13:52:58 -07:00
f010df35e5 Added CUDA support for complex input for QR decomposition (#45032)
Summary:
QR decomposition now works for complex inputs on GPU.

Ref. https://github.com/pytorch/pytorch/issues/33152
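
A sketch of the newly supported call (assumes a CUDA device is available):

```
import torch

a = torch.randn(3, 3, dtype=torch.complex64, device="cuda")
q, r = torch.qr(a)
assert torch.allclose(q @ r, a, atol=1e-5)
```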

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45032

Reviewed By: ailzhang

Differential Revision: D24199105

Pulled By: anjali411

fbshipit-source-id: 249552b31fd713446e609b66e508ac54b817b98e
2020-10-08 13:24:21 -07:00
5f7545adf6 Update randomtemp to v0.3 (#46025)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45982.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46025

Reviewed By: walterddr, mruberry

Differential Revision: D24197124

Pulled By: malfet

fbshipit-source-id: fcb96655375ed7b6c784a5170c6a27e7e13465f1
2020-10-08 12:12:02 -07:00
1197a38a63 [JIT] Bind log1p and lgamma (#45791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45791

Most of the lowering for log1p and lgamma already existed, add JIT integration.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D24169536

Pulled By: eellison

fbshipit-source-id: a009c77a3471f3b5d378bad5de6d8e0880e9da3c
2020-10-08 12:06:34 -07:00
338283057b [JIT] [3/3] Make sure fusion occurs in test_tensorexpr (#45790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45790

Making sure that more tests invoke a run with a Fusion Group.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D24169534

Pulled By: eellison

fbshipit-source-id: a2666df53fbb12c64571e960f59dbe94df2437e4
2020-10-08 12:06:25 -07:00
564296f051 [2/3] [JIT] Make sure fusion occurs in test_tensorexpr (#45789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45789

Making sure that more tests invoke a run with a Fusion Group.

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D24169535

Pulled By: eellison

fbshipit-source-id: 54d7af434772ba52144b12d15d32ae30460c0c3c
2020-10-08 12:06:16 -07:00
1b97ffa07a [1/3] [JIT] Make sure fusion occurs in test_tensorexpr file (#45788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45788

We were only running the traced graph once, at which point it would not yet have been fused. We should run for num_profiled_runs + 1, and also assert that all nodes in the graph were fused.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D24169537

Pulled By: eellison

fbshipit-source-id: 8499bb1a5bd9d2221b1f1c54d6352558cf07ba9a
2020-10-08 12:02:57 -07:00
636eb18029 Fixed median nan propagation and implemented nanmedian (#45847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45847

Original PR here https://github.com/pytorch/pytorch/pull/45084. Created this one because I was having problems with ghstack.
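
A quick sketch of the new behavior:

```
import torch

t = torch.tensor([1., float('nan'), 2.])
torch.median(t)     # tensor(nan): NaN now propagates
torch.nanmedian(t)  # tensor(1.): NaN ignored (lower of the two middle values)
```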

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24136629

Pulled By: heitorschueroff

fbshipit-source-id: dd7c7540a33f6a19e1ad70ba2479d5de44abbdf9
2020-10-08 11:20:21 -07:00
298e0e0d57 Refactor gather_ranges_to_dense from Python to C++ (#46021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46021

Refactor gather_ranges_to_dense from Python to C++

https://www.internalfb.com/intern/tasks/?t=71935517

Test Plan:
General build/test:
```
buck build -c python.helpers=true fbcode/caffe2
buck test -c python.helpers=true fbcode/caffe2
```

Specific Test:
```buck test mode/dev-nosan //caffe2/torch/fb/sparsenn:test -- 'test_gather_ranges_to_dense \(caffe2\.torch\.fb\.sparsenn\.tests\.sparsenn_operators_test\.SparseNNOperatorsTest\)'
```

Reviewed By: houseroad

Differential Revision: D23858186

fbshipit-source-id: 8bce7c279275c8ff7316901b455e1d1dd7e36b13
2020-10-08 11:03:06 -07:00
d360402f34 Use out variants of functions used by linalg.norm, where possible (#45641)
Summary:
Closes https://github.com/pytorch/pytorch/issues/45669

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45641

Reviewed By: ngimel

Differential Revision: D24186731

Pulled By: mruberry

fbshipit-source-id: 7e3d12ef34704bf461b8de19830e7b2f73f3739b
2020-10-08 10:55:35 -07:00
d3d8da7a8e Enable CUDA Fuser for ROCm (#45965)
Summary:
This enables the CUDA fuser on ROCm and enables its tests.

Part of this patch is based on the work of Rohith Nallamaddi, thank you.
Any errors are my own, of course.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45965

Reviewed By: seemethere

Differential Revision: D24170457

Pulled By: walterddr

fbshipit-source-id: 3dd25b3501a41d2f00acba3ce8642ce51c49c9a6
2020-10-08 10:41:56 -07:00
40828b68e1 Revert D24099167: [HTE @ clang-tidy] Enable clang-tidy configs inheritance for caffe2 project
Test Plan: revert-hammer

Differential Revision:
D24099167 (d93cae00f2)

Original commit changeset: 2e092fe678ad

fbshipit-source-id: bbc73556a1b4d341c2db445fe4ebfb6ee6ba269f
2020-10-08 10:30:50 -07:00
283ae1998c Pin libuv to 1.39 in Windows CI in order to keep version alignment in read me document (#46015)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46015

Reviewed By: mruberry

Differential Revision: D24193319

Pulled By: mrshenli

fbshipit-source-id: b300116e7ed189a888cb980b63c67d1d402b01b9
2020-10-08 10:06:05 -07:00
ea4fbb2e5e [StaticRuntime] Replace hashtable based workspace with vector<IValue> (#45892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45892

Previously we were using a hashtable (`std::unordered_map` in OSS, `folly::F14FastMap` in fb) for the workspace, a container for all the IValues in the graph. Hashtable-based lookups can be expensive. This diff replaces the hashtable with `std::vector`, and extra bookkeeping is introduced to keep track of the indices of graph inputs/outputs in `StaticRuntime` and op inputs/outputs in `ProcessedNode`.

Reviewed By: dzhulgakov

Differential Revision: D24098763

fbshipit-source-id: 337f835ee144985029b5fa2ab98f9bcc5e3606b6
2020-10-08 09:50:30 -07:00
735d5b8907 Add complex32 dtype support to CPU/GPU implementation of (#45339)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45339

Test Plan:
Imported from OSS

GPU implementation already works as-is:
$ python -c "import torch; a = torch.tensor([1j], dtype=torch.complex32, device=torch.device('cuda')); b = a.clone(); print(b); print(a)"
tensor([0.+1.j], device='cuda:0', dtype=torch.complex32)
tensor([0.+1.j], device='cuda:0', dtype=torch.complex32)

Test for CPU implementation:
$ python -c "import torch; a = torch.tensor([1j], dtype=torch.complex32); b = a.clone(); print(b); print(a)"
tensor([0.+1.j], dtype=torch.complex32)
tensor([0.+1.j], dtype=torch.complex32)

Reviewed By: malfet

Differential Revision: D23932649

Pulled By: soulitzer

fbshipit-source-id: 394b6e1f3d462ee8a010f56f4bb8404af92a066b
2020-10-08 09:29:25 -07:00
7d4f5060ad Fix doc about operator benchmark (#45853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45853

The method name in the README is not consistent with the actual implementation.

Reviewed By: qizzzh

Differential Revision: D24114849

fbshipit-source-id: d979e324c768708e99b8cc5b87e261f17c22a883
2020-10-08 09:13:53 -07:00
acca11b898 [torchscript] Verbose logging of code location causing the error (#45908)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45908

As the title says, the existing logging does not explain the cause of the error.

Test Plan: unit tests pass.

Reviewed By: SplitInfinity

Differential Revision: D23609965

fbshipit-source-id: 818965176f7193c62035e3d2f0547bb525fea0fb
2020-10-08 06:15:49 -07:00
52f2db752d unify reproducibility notes (#45748)
Summary:
Many of our functions carry the same warnings about result reproducibility. Make them use a common template.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45748

Reviewed By: colesbury

Differential Revision: D24089114

Pulled By: ngimel

fbshipit-source-id: e6aa4ce6082f6e0f4ce2713c2bf1864ee1c3712a
2020-10-08 02:14:57 -07:00
d93cae00f2 [HTE @ clang-tidy] Enable clang-tidy configs inheritance for caffe2 project
Summary:
The primary HTE configuration (for the `HTE@clang-tidy` project) is stored in the parent config `~/fbsource/fbcode.clang-tidy`. This diff enables inheritance of that configuration.

Note: `facebook-hte-` checks will not be used until the switch to HTE2clang-tidy is made.
Note: `clang-diagnostic-*` will start working. As a result, clang warning messages can be duplicated: once from HTE and once from clang-diagnostic.

Test Plan: N/A

Reviewed By: wfarner

Differential Revision: D24099167

fbshipit-source-id: 2e092fe678ad3e53a4cef301ce1cb737cf8401e7
2020-10-08 01:35:55 -07:00
9dc9a55bc4 Fix TypeError when torch.jit.load is passed a pathlib.Path (#45825)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45824
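
A minimal sketch of the now-working call (the file name is a placeholder):

```
import pathlib
import torch

# after this fix, a pathlib.Path is accepted the same way a str path is
model = torch.jit.load(pathlib.Path("model.pt"))
```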

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45825

Reviewed By: VitalyFedyunin

Differential Revision: D24129441

Pulled By: gmagogsfm

fbshipit-source-id: 52a76e39c163206cee2d19967e333e948adefe99
2020-10-08 01:29:29 -07:00
6e4de44501 [TensorExpr] LoopNest: add a constructor that takes Stmt instead of list of Tensors. (#45949)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45949

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D24156001

Pulled By: ZolotukhinM

fbshipit-source-id: 6f4f050b04e802e274c42ed64be74c21ba79c29f
2020-10-08 00:58:13 -07:00
1036b77416 [TensorExpr] LoopNest: replace output_tensors_ with output_bufs_. (#45948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45948

No functionality changes expected; this is just preparation for further changes in the LoopNest interface.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D24156000

Pulled By: ZolotukhinM

fbshipit-source-id: f95ab07aac0aba128bc4ed5376a3251ac9c31c06
2020-10-08 00:58:10 -07:00
29da553dd9 [TensorExpr] Loopnest: unify intermediate_tensors_ and temp_bufs_. (#45947)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45947

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D24155999

Pulled By: ZolotukhinM

fbshipit-source-id: d82acf6aba570f6a675eea683c306088e2a41f91
2020-10-08 00:58:08 -07:00
598caddd93 [TensorExpr] Add shorthand versions for splitWith{Mask,Tail} functions. (#45946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45946

Also, make these functions static - they are not using anything from
`LoopNest` and can be applied to any `Stmt`.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D24156002

Pulled By: ZolotukhinM

fbshipit-source-id: 1c7d205f85a2a1684e07eb836af662f10d0a50fc
2020-10-08 00:58:06 -07:00
b65ffa365c [TensorExpr] Nuke Function class and directly use Tensor instead. (#45936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45936

`Tensor` has been a view into a `Function` that was supposed to be used
for a more general case when we have multiple computations over the same
domain (aka multiple output functions). We never got to a point
where we needed this, and now have other ideas in mind on how to support
this case if need be. For now, let's just nuke `Function` to reduce the
overall system complexity.

The change should not affect any existing behavior.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D24153214

Pulled By: ZolotukhinM

fbshipit-source-id: 26d5f11db5d661ff5e1135f4a49eff1c6d4c1bd5
2020-10-08 00:55:31 -07:00
c9caa828f5 Throw special exception when backend compilation is met with fatal error (#45952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45952

Pull Request resolved: https://github.com/pytorch/glow/pull/4967

When Glow compilation meets with a nonrecoverable fatal error (the hardware is busted), we would like to throw a special exception, distinct from the normal caffe2::EnforceNotMet, so that we can signal the upper-layer application to handle it differently.

Test Plan: Manually inject an error, add LOG(FATAL) in the special exception path, and wait for the application to fatal.

Reviewed By: ipiszy

Differential Revision: D24156792

fbshipit-source-id: 4ae21bb0d36c89eac331fc52dd4682826b3ea180
2020-10-08 00:46:01 -07:00
a92b49f7c8 [Onnxifi] Don't throw exception when we cannot write out debug files (#45979)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45979

For some reason, sometimes we cannot write out the debug files. This shouldn't block the whole service, so we opt to log an error instead of throwing an exception.

Test Plan: Run the net_runner test at `/` and observe the error being printed while the test still passes.

Reviewed By: ipiszy

Differential Revision: D24165081

fbshipit-source-id: a4e1d0479d54d741e615e3a00b3003f512394fd4
2020-10-08 00:18:24 -07:00
99d3f37bd4 Run gradgradcheck on torch.fft transforms (#46004)
Summary:
Ref https://github.com/pytorch/pytorch/issues/42175

As already noted in the `torch.fft` `gradcheck` tests, `gradcheck` isn't fully working for complex types yet and the function inputs need to be real. A similar workaround works for `gradgradcheck`: viewing the complex outputs as real before returning them makes `gradgradcheck` pass.
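
A minimal sketch of that workaround, assuming a real-valued double-precision input:

```
import torch
from torch.autograd import gradgradcheck

x = torch.randn(8, dtype=torch.double, requires_grad=True)  # real input

def fft_as_real(t):
    # view the complex output as real so gradgradcheck can handle it
    return torch.view_as_real(torch.fft.fft(t))

assert gradgradcheck(fft_as_real, (x,))
```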

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46004

Reviewed By: ngimel

Differential Revision: D24187000

Pulled By: mruberry

fbshipit-source-id: 33c2986b07bac282dff1bd4f2109beb70e47bf79
2020-10-08 00:02:05 -07:00
c19b9cd18d Add torch::cuda::ncll::all2all (#45900)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45900

Use `torch::cuda::nccl::all2all` from `ProcessGroupNCCL.cpp`

Fixes https://github.com/pytorch/pytorch/issues/42517

Here is a NCCL dependency graph:
```
libnccl.a --> libtorch_cuda.so ---> libtorch_python.so
    |                                   ^
    |                                   |
    --------> libc10d.a -----------------
```
When a static library is linked into a dynamic library or an executable, the linker removes all unused/duplicate symbols from that library, unless the `-whole-archive` option is used. Before https://github.com/pytorch/pytorch/pull/42514, all NCCL calls made from `ProcessGroupNCCL.cpp` were also made from `torch/csrc/cuda/nccl.cpp`, which is compiled as part of `libtorch_cuda.so`.
But adding `ncclSend`/`ncclRecv` to `ProcessGroupNCCL.cpp` forced the linker to embed those into `libtorch_python.so`, which also resulted in linking other dependent symbols into the library.

This PR adds the `nccl[Send|Recv]` calls to `torch_cuda.so` by implementing `all2all` in `torch_cuda`, thus avoiding double-linking the static library.

A more involved, but less error-prone, solution would be to use the wrappers exported in the `torch::cuda::nccl` namespace instead of making direct NCCL API calls.

Test Plan: Imported from OSS

Reviewed By: mingzhe09088

Differential Revision: D24138011

Pulled By: malfet

fbshipit-source-id: 33305197fc7d8707b7fd3a66b543f7733b9241a1
2020-10-07 23:56:31 -07:00
ef4817fe5a Add tensor_split function, based on numpy.array_split (#45168)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/9382
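
A quick sketch of the numpy.array_split-style behavior (illustrative):

```
import torch

t = torch.arange(7)
# uneven splits are allowed: earlier chunks get the extra elements
print(torch.tensor_split(t, 3))
# (tensor([0, 1, 2]), tensor([3, 4]), tensor([5, 6]))
```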

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45168

Reviewed By: ngimel

Differential Revision: D24166164

Pulled By: mruberry

fbshipit-source-id: 795459821e52885bc99623a01a2abec060995ce6
2020-10-07 23:14:48 -07:00
b2bff9e431 Workaround for cublas bug for 45724 (#46001)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45724

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46001

Reviewed By: mruberry

Differential Revision: D24184058

Pulled By: ngimel

fbshipit-source-id: 7d2bab3206ddbc10a7cae3efd9b5e253f38400a9
2020-10-07 22:38:19 -07:00
8d14b50e94 codegen: Improve array default handing (#45163)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45163

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D24132279

Pulled By: mruberry

fbshipit-source-id: 77069e7526b35cf8d13ba448e313c90f20cc67cf
2020-10-07 22:27:28 -07:00
00b8ebe60c [FX] Preserve type annotations on generated code in Graph (#45880)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45880

Test Plan: Imported from OSS

Reviewed By: dzhulgakov

Differential Revision: D24127303

Pulled By: jamesr66a

fbshipit-source-id: 3a042bcfb0bf9f58ac318cc814dfc3cca683c7f8
2020-10-07 21:34:47 -07:00
81d40aaf96 Add [zc]heevd to the list of MKL symbols exported from torch_cpu (#46002)
Summary:
cpu implementation of `torch.symeig` uses `[zc]heev`, but MAGMA only have `d`-suffixed flavors of those functions

Fixes https://github.com/pytorch/pytorch/issues/45922

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46002

Reviewed By: walterddr

Differential Revision: D24177730

Pulled By: malfet

fbshipit-source-id: 0e9aeb60a83f8a4b8ac2a86288721bd362b6040b
2020-10-07 20:50:10 -07:00
c59c4b0d77 Fix cholesky TF32 tests (#45492)
Summary:
This test is changed one day before the landing of the tf32 tests PR, therefore the fix for this is not included in that PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45492

Reviewed By: ezyang

Differential Revision: D24101876

Pulled By: ngimel

fbshipit-source-id: cb3615b2fb8acf17abe54cd18b1faec26582d6b6
2020-10-07 20:42:06 -07:00
903acc6b83 CUDA BFloat16 support of clamp, remainder, lshift, rshift (#45247)
Summary:
Add CUDA BFloat16 support of clamp, remainder, lshift, rshift

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45247

Reviewed By: dzhulgakov

Differential Revision: D24174258

Pulled By: ngimel

fbshipit-source-id: bfcd2d1b3746bb0527d590533f3c38b9c4d0a638
2020-10-07 20:37:06 -07:00
154347d82f Fix distributed documentation for asynchronous collective Work objects (#45709)
Summary:
Closes https://github.com/pytorch/pytorch/issues/42247. Clarifies some documentation related to `Work` object semantics (the outputs of async collective functions). Clarifies the difference between CPU operations and CUDA operations (on the Gloo or NCCL backend), and provides an example where understanding a CUDA operation's wait() semantics is necessary for correct code.
![sync](https://user-images.githubusercontent.com/8039770/94875710-6f64e780-040a-11eb-8fb5-e94fd53534e5.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45709

Reviewed By: ngimel

Differential Revision: D24171256

Pulled By: rohan-varma

fbshipit-source-id: 6365a569ef477b59eb2ac0a8a9a1c1f34eb60e22
2020-10-07 19:59:51 -07:00
19da1d22fe [NNC] Registerizer V2, supporting partial and conditional replacement (#45574)
Summary:
This is a rewrite of the Registerizer, supporting scalar replacement in *vastly* more situations. As a refresher, the registerizer does this:

Before:
```
A[0] = 0;
for (int x = 0; x < 10; x++) {
  A[0] = (A[0]) + x;
}
```
After:
```
int A_ = 0;
for (int x = 0; x < 10; x++) {
  A_ = x + A_;
}
A[0] = A_;
```

Which can greatly reduce the number of accesses to main memory in a kernel. There are cases where doing this gets complicated, and the existing implementation bails out whenever it encounters multiple partial overlaps of the same buffer, or conditional accesses under any circumstances. This makes it much less useful in the presence of complex (i.e., real-world rather than example) kernels. This new version should work optimally in almost all cases (I have a few minor follow-ups).

I tested this version extensively, and found quite a few bugs in the original implementation that I'd prefer not to backport fixes for - so I'm in favor of landing this even if we don't immediately see a perf win. I believe the killer app for this kind of optimization is fused reductions, and we haven't enabled many examples of that yet.

It is safe to move two accesses of the same Tensor element to a local scalar Var if between all usages of the element there are no other Loads or Stores that may refer to it. In the comments I refer to this as overlapping the access, or "cutting" the existing AccessInfo. In the case where a candidate for registerization is cut, it may be possible to finalize the access early by writing it back to the Tensor and then create a new scalar variable after the overlapping access is complete. We will attempt to do this when it saves memory accesses.

There are a few cases that make this more challenging:

 - For: Loops change the number of real usages of a buffer by the loop extent, but only if we can pull the definition and finalization of the scalar variable out of the loop block. For loops often create accesses which are conditional on a loop var and will overlap large ranges of elements.

E.g. Before:
```
A[0] = 2;
for (int x1 = 0; x1 < 10; x1++) {
  A[0] = (A[0]) + x1;
}
for (int x2 = 1; x2 < 10; x2++) {
  A[x2] = A[x2 - 1];
}
for (int x3 = 0; x3 < 10; x3++) {
  A[0] = (A[0]) + x3;
}
```
After:
```
int A_1 = 2;
for (int x1 = 0; x1 < 10; x1++) {
  A_1 = A_1 + x1;
}
A[0] = A_1;
for (int x2 = 1; x2 < 10; x2++) {
  A[x2] = A[x2 - 1];
}
int A_2 = A[0];
for (int x3 = 0; x3 < 10; x3++) {
  A_2 = A_2 + x3;
}
A[0] = A_2;
```
- Cond: Conditions complicate lifting scalars out of internal scopes. Generally we cannot lift an access outside of a conditional scope unless there is already a reference to that same access at the higher scope, since we don't know if the condition was guarding an array access not safe at the higher scope. In the comments I refer to this as the condition "hiding" the access, and the outer access "unhiding" it.

E.g. this example:
```
if (x<5 ? 1 : 0) {
  A[x] = (A[x]) + 1;
}
A[x] = (A[x]) + 1;
if (x>5 ? 1 : 0) {
  A[x] = (A[x]) + 1;
}
```
The A[x] access can be registerized due to the unconditional access between the two conditions:
```
int A_1 = A[x];
if (x<5 ? 1 : 0) {
  A_1 = A_1 + 1;
}
A_1 = A_1 + 1;
if (x>5 ? 1 : 0) {
  A_1 = A_1 + 1;
}
A[x] = A_1;
```
But this example has no accesses that can be registerized:
```
if (x<5 ? 1 : 0) {
  A[x] = (A[x]) + 1;
}
if (x>5 ? 1 : 0) {
  A[x] = (A[x]) + 1;
}
```

- IfThenElse: Same situation as Cond, except since IfThenElse is an Expr rather than a Stmt we cannot insert the scalar definition or finalizer within the conditional scope. Accesses inside an IfThenElse can be safely combined with external accesses but cannot exist completely within.

E.g. in this example the `B[x]` access cannot be registerized, as there is no safe place to define it.
```
A[x] = IfThenElse(x<3 ? 1 : 0, (B[x]) + (B[x]), B[x]);
```

But the equivalent kernel using Cond can be registerized:
```
if (x<3 ? 1 : 0) {
  float B_1 = B[x];
  A[x] = B_1 + B_1;
} else {
  A[x] = B[x];
}
```
- Let: Accesses dependent on local variables via Let Stmts, or loop vars, cannot be raised outside of the scope of the dependent var.

E.g. no accesses in this example can be registerized:
```
for (int x = 0; x < 10; x++) {
  int y = 30;
  A[y] = x + (A[y]);
}
```

But they can in this example:
```
int y = 30;
for (int x = 0; x < 10; x++) {
  A[y] = x + (A[y]);
}
```

**Testing**

The majority of this PR is tests, over 3k lines of them, because there are many different rules to consider and they can interact together more or less arbitrarily. I'd greatly appreciate any ideas for situations we could encounter that are not covered by the tests.

**Performance**

Still working on it, will update. In many FastRRNS sub kernels this diff reduces the number of total calls to Store or Load by 4x, but since those kernels use Concat very heavily (meaning a lot of branches) the actual number encountered by any particular thread on GPU is reduced only slightly. Overall perf improved by a very small amount.

Reductions are where this optimization should really shine, and in particular the more complex the kernel gets (with extra fusions, etc.) the better this version of the registerizer should do compared to the existing version.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45574

Reviewed By: albanD

Differential Revision: D24151517

Pulled By: nickgg

fbshipit-source-id: 9f0b2d98cc213eeea3fda16fee3d144d49fd79ae
2020-10-07 18:17:27 -07:00
a36f11a3a5 [FakeLowP] T76913842 Make AddFakeFp16 take int inputs (#45992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45992

Created a template version of AddFakeFp16 to take both float and int inputs.

Test Plan: notebook with local bento kernel: N369049

Reviewed By: amylittleyang

Differential Revision: D24169720

fbshipit-source-id: 679de391224f65f6c5b3ca890eb0d157f09712f6
2020-10-07 17:43:00 -07:00
c86655a815 [JIT] Fix Dict bug in constant hashing (#45929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45929

We were checking `and` when we should have been checking `or`.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D24148804

Pulled By: eellison

fbshipit-source-id: 9c394ea10ac91a588169d934b1e3208512c71b9d
2020-10-07 17:40:17 -07:00
72e4f51bc0 [JIT] fix dict update (#45857)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45857

Fix for https://github.com/pytorch/pytorch/issues/45627

Op was calling `insert` instead of `insert_or_assign`, so it wouldn't overwrite an existing key.
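
A minimal sketch of the behavior this fixes (illustrative, not a test from the PR):

```
from typing import Dict

import torch

@torch.jit.script
def overwrite(d: Dict[str, int]) -> Dict[str, int]:
    d.update({"a": 2})  # before this fix, the existing key kept its old value
    return d

print(overwrite({"a": 1}))  # expected: {'a': 2}
```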

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D24148805

Pulled By: eellison

fbshipit-source-id: bf39c71d5d928890b82cff1a9a0985dc47c1ffac
2020-10-07 17:36:02 -07:00
de0d0bd5ee Revert D24093032: Improve logging in ProcessGroupNCCL for debugging purposes.
Test Plan: revert-hammer

Differential Revision:
D24093032 (c8d76ff7dc)

Original commit changeset: 240b03562f8c

fbshipit-source-id: dab7d54a5ba517bb308a1825b0d63ed146e5269d
2020-10-07 16:41:35 -07:00
505be08c75 [dist_optim] serialize compilation when creating dist_optim (#45871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45871

Attempt to fix https://github.com/pytorch/pytorch/issues/45845

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D24125209

Pulled By: wanchaol

fbshipit-source-id: e3697dd6ef107d8153d2a82d78a17c66d109b4fa
2020-10-07 15:10:41 -07:00
ce82b522c8 Define objects using classes instead of namedtuples in torch.utils.data._utils.worker (#45870)
Summary:
This PR fixes a bug when torch is used with pyspark, by converting namedtuples in `torch.utils.data._utils.worker` into classes.

Before this PR, creating an IterableDataset and then running `list(torch.utils.data.DataLoader(MyIterableDataset(...), num_workers=2))` will not terminate if pyspark is also being used. This is because pyspark hijacks namedtuples to make them pickleable ([see here](https://github.com/apache/spark/blob/master/python/pyspark/serializers.py#L370)). So `_IterableDatasetStopIteration` would be modified, and then the check at `torch/utils/data/dataloader.py:1072` (as of commit 5472426b9f) is never true.
Converting the namedtuples to classes avoids this hijack and allows the iteration to correctly stop when signaled.
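
For illustration, a minimal sketch of the difference (the signal class shown is simplified, not the exact code in `worker.py`):

```
from collections import namedtuple

# pyspark rewrites namedtuple classes to make them picklable, so a worker
# signal defined as a namedtuple may be replaced by a different class at
# runtime, and isinstance() checks against the original class stop matching.
_StopSignal = namedtuple('_StopSignal', ['worker_id'])

# a plain class is left untouched by that hack, so isinstance() keeps working
class _IterableDatasetStopIteration:
    def __init__(self, worker_id: int):
        self.worker_id = worker_id
```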

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45870

Reviewed By: ngimel

Differential Revision: D24162748

Pulled By: albanD

fbshipit-source-id: 52f009784500fa594b2bbd15a8b2e486e00c37fb
2020-10-07 15:03:38 -07:00
0927e02a6a [caffe2] Do not run RemoveOpsByType on recurrent networks (#45986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45986

Recurrent networks have subnets that are not well supported by `RemoveOpsByType`. Here we exclude recurrent networks by adding the same check as in memonger.

Test Plan:
```
buck test //caffe2/caffe2/fb/predictor:black_box_predictor_test
```

AdIndexer canary for sanity check:
https://www.internalfb.com/intern/ads/canary/430059485214766620

Differential Revision: D24167284

fbshipit-source-id: fa90d1c1f34af334a599d879af09d4c0bf7c27bd
2020-10-07 14:07:52 -07:00
c8d76ff7dc Improve logging in ProcessGroupNCCL for debugging purposes. (#45780)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45780

When training jobs running with NCCL fail sometimes it is hard to
debug the reason of the failure and our logging doesn't provide enough
information at times to narrow down the issue.

To improve the debugging experience, I've enhanced our logging to add a lot
more information about what the ProcessGroup is doing under the hood.

Closes: https://github.com/pytorch/pytorch/issues/45310

Sample output:
```
> I1002 15:18:48.539551 1822062 ProcessGroupNCCL.cpp:528] [Rank 2] NCCL watchdog thread started!
> I1002 15:18:48.539533 1821946 ProcessGroupNCCL.cpp:492] [Rank 2] ProcessGroupNCCL initialized with following options:
> NCCL_ASYNC_ERROR_HANDLING: 0
> NCCL_BLOCKING_WAIT: 1
> TIMEOUT(ms): 1000
> USE_HIGH_PRIORITY_STREAM: 0
> I1002 15:18:51.080338 1822035 ProcessGroupNCCL.cpp:530] [Rank 1] NCCL watchdog thread terminated normally
> I1002 15:18:52.161218 1821930 ProcessGroupNCCL.cpp:385] [Rank 0] Wrote aborted communicator id to store: NCCLABORTEDCOMM:a0e17500002836080c8384c50000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> I1002 15:18:52.161238 1821930 ProcessGroupNCCL.cpp:388] [Rank 0] Caught collective operation timeout for work: WorkNCCL(OpType=ALLREDUCE, TensorShape=[10], Timeout(ms)=1000)
> I1002 15:18:52.162120 1821957 ProcessGroupNCCL.cpp:530] [Rank 0] NCCL watchdog thread terminated normally
> I1002 15:18:58.539937 1822062 ProcessGroupNCCL.cpp:649] [Rank 2] Found key in store: NCCLABORTEDCOMM:a0e17500002836080c8384c50000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, from rank: 0, aborting appropriate communicators
> I1002 15:19:34.740937 1822062 ProcessGroupNCCL.cpp:662] [Rank 2] Aborted communicators for key in store: NCCLABORTEDCOMM:a0e17500002836080c8384c50000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> I1002 15:19:34.741678 1822062 ProcessGroupNCCL.cpp:530] [Rank 2] NCCL watchdog thread terminated normally
```
ghstack-source-id: 113731163

Test Plan: waitforbuildbot

Reviewed By: osalpekar

Differential Revision: D24093032

fbshipit-source-id: 240b03562f8ccccc3d872538f5e331df598ceca7
2020-10-07 12:18:41 -07:00
8fb32b9f55 Parametrize # of longest tests in print_test_stats (#45941)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45941

This adds CLI options to the `test/print_test_stats.py` script for specifying how many of the longest tests should be printed. It also makes the following incidental changes:
- The script now has a `--help` option to describe its usage.
- The number of longest tests being shown is now displayed as a number, rather than in words.
- The median time is now displayed with the label `median_time` instead of `mean_time`, is calculated using `statistics.median` instead of raw indexing and bit shifting, and is displayed even when there are only two tests in a class.

Test Plan: Imported from OSS

Reviewed By: walterddr, seemethere

Differential Revision: D24154491

Pulled By: samestep

fbshipit-source-id: 9fa402bf0fa56badd505f87f289ac9cca1862d6b
2020-10-07 11:49:36 -07:00
9679e1affc annotate torch.autograd.* modules (#45004)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44638

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45004

Reviewed By: VitalyFedyunin

Differential Revision: D24113562

Pulled By: ezyang

fbshipit-source-id: a85018b7e08b2fe6cf2bc14a217eb418cb2b9de4
2020-10-07 10:53:41 -07:00
83d2c9a232 [quant] Add quantized Sigmoid module (#45883)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45883

Test Plan:
python test/test_quantization.py TestStaticQuantizedModule.test_sigmoid

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D24129116

fbshipit-source-id: aa960549509c60374012f35b1f5be39e90418099
2020-10-07 10:33:18 -07:00
30bf799f9c torch.matrix_exp doc fix (#45909)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45909

Reviewed By: dzhulgakov

Differential Revision: D24147314

Pulled By: albanD

fbshipit-source-id: fc21094f4dbdd04cc2063a9639b9d1f5728cb53f
2020-10-07 10:23:37 -07:00
b186831c08 Automatic update of fbcode/foxi to 6a4e19a2aaf7ae4b9fa9597526e65b395d5e79ad (#45951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45951

Pull Request resolved: https://github.com/pytorch/glow/pull/4966

Previous import was 4aba696ec8f31794fd42880346dc586486205e0a

Included changes:
- **[6a4e19a](https://github.com/houseroad/foxi/commit/6a4e19a)**: Add fatal error value (#20) <Yinghai Lu>

Test Plan: build

Reviewed By: houseroad

Differential Revision: D24156364

fbshipit-source-id: f833ada8d6586865e1831e2c8c632e3844c7b6a1
2020-10-07 09:55:52 -07:00
5a2773702f add test sharding to CUDA on linux (#45972)
Summary:
Splits up all the CUDA Linux tests into 2 shards to decrease total test runtime.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45972

Reviewed By: malfet

Differential Revision: D24163521

Pulled By: janeyx99

fbshipit-source-id: da6e88eb4305192fb287c4458c31199bf62354c0
2020-10-07 09:31:44 -07:00
5ce31b6f3f [ONNX] Improve error handling for adaptive_pool (#45874)
Summary:
Duplicate of https://github.com/pytorch/pytorch/issues/43032
This update would also improve error handling for interpolate with 'area' mode.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45874

Reviewed By: albanD

Differential Revision: D24141266

Pulled By: bzinodev

fbshipit-source-id: 7559f1d6af4f1ef3507c15a1aee76fe01fa433cd
2020-10-07 09:20:35 -07:00
1bb2d41b68 Revert D20850851: caffe2/plan_executor: wait for 1 minute after exception and then abort
Test Plan: revert-hammer

Differential Revision:
D20850851 (3fbddb92b1)

Original commit changeset: 330503775d80

fbshipit-source-id: 612c6c3c4d5586bc8ad00a112cd00fc74fb44243
2020-10-07 09:04:24 -07:00
5640b79bf8 Allow consumer ops to sync on GraphRoot's gradient (#45787)
Summary:
Currently, a GraphRoot instance doesn't have an associated stream.  Streaming backward synchronization logic assumes the instance ran on the default stream, and tells consumer ops to sync with the default stream.  If the gradient the GraphRoot instance passes to consumer backward ops was populated on a non-default stream, we have a race condition.

The race condition can exist even if the user doesn't give a manually populated gradient:
```python
with torch.cuda.stream(side_stream):
    # loss.backward() implicitly synthesizes a one-element 1.0 tensor on side_stream
    # GraphRoot passes it to consumers, but consumers first sync on default stream, not side_stream.
    loss.backward()

    # Internally to backward(), streaming-backward logic takes over, stuff executes on the same stream it ran on in forward,
    # and the side_stream context is irrelevant.  GraphRoot's interaction with its first consumer(s) is the spot where
    # the side_stream context causes a problem.
```

This PR fixes the race condition by associating a GraphRoot instance, at construction time, with the current stream(s) on the device(s) of the grads it will pass to consumers. (I think this relies on GraphRoot executing in the main thread, before the backward thread(s) fork, because the grads were populated on the main thread.)

The test demonstrates the race condition. It fails reliably without the PR's GraphRoot diffs and passes with the GraphRoot diffs.

With the GraphRoot diffs, manually populating an incoming-gradient arg for `backward` (or `torch.autograd.grad`) and the actual call to `autograd.backward` will have the same stream-semantics relationship as any other pair of ops:
```python
# implicit population is safe
with torch.cuda.stream(side_stream):
    loss.backward()

# explicit population in side stream then backward in side stream is safe
with torch.cuda.stream(side_stream):
    kickoff_grad = torch.ones_like(loss)
    loss.backward(gradient=kickoff_grad)

# explicit population in one stream then backward kickoff in another stream
# is NOT safe, even with this PR's diffs, but that unsafety is consistent with
# stream-semantics relationship of any pair of ops
kickoff_grad = torch.ones_like(loss)
with torch.cuda.stream(side_stream):
    loss.backward(gradient=kickoff_grad)

# Safe, as you'd expect for any pair of ops
kickoff_grad = torch.ones_like(loss)
side_stream.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side_stream):
    loss.backward(gradient=kickoff_grad)
```
This PR also adds the last three examples above to cuda docs and references them from autograd docstrings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45787

Reviewed By: nairbv

Differential Revision: D24138376

Pulled By: albanD

fbshipit-source-id: bc4cd9390f9f0358633db530b1b09f9c1080d2a3
2020-10-07 08:53:53 -07:00
bb99bea774 Compress NVCC flags for Windows (#45842)
Summary:
Fixes #{issue number}
This makes the command line shorter.
Also updates `randomtemp` in which the previous version has a limitation that the length of the argument cannot exceed 260.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45842

Reviewed By: albanD

Differential Revision: D24137088

Pulled By: ezyang

fbshipit-source-id: f0b4240735306e302eb3887f54a2b7af83c9f5dc
2020-10-07 08:39:15 -07:00
be45c3401a [JIT] Make objects throw Python AttributeError on nonexistent attr access (#45911)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45911
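
A minimal sketch of the new behavior (illustrative):

```
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x

m = torch.jit.script(M())
try:
    m.does_not_exist
except AttributeError:
    print("nonexistent attributes now raise AttributeError, as in eager mode")
```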

Test Plan: Imported from OSS

Reviewed By: robieta

Differential Revision: D24140971

Pulled By: jamesr66a

fbshipit-source-id: 046a2cffff898efad5bcc36a41bf992f36f555f9
2020-10-07 01:57:29 -07:00
8cdb638c62 [FX] Track use nodes in Node (#45775)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45775

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D24091082

Pulled By: jamesr66a

fbshipit-source-id: b09bb6ae78436a7722fb135b8ec71464ef9587cd
2020-10-07 00:15:04 -07:00
205ab49612 [packaging] simpler dependency plotting (#45686)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45686

This uses an online graphviz viewer. The code is simpler, and
since it embeds all the data in the url you can just click the url
from your terminal.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D24059157

Pulled By: zdevito

fbshipit-source-id: 94d755cc2986c4226180b09ba36f8d040dda47cc
2020-10-06 23:40:00 -07:00
317b6516bc [quant] Add quantized::sigmoid that take output_scale/output_zero_point as input (#45882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45882

Same changes as the stack for leaky_relu: https://github.com/pytorch/pytorch/pull/45702

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D24129113

fbshipit-source-id: a26da33f877d3bdeea1976b69b2bd9369c2bf196
2020-10-06 23:30:18 -07:00
ed1552a48f Add note about in-place weight modification for nn.Embedding (#45595)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26596

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45595

Reviewed By: albanD

Differential Revision: D24143456

Pulled By: mruberry

fbshipit-source-id: a884a32809105ce16959b40ec745ec873b3c8375
2020-10-06 23:11:39 -07:00
8b39498a23 codegen: Allow string arguments to have defaults (#45665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45665

Fixes #43944

Note that the codegen doesn't use a proper parser so, in the same way as with lists, the string `, ` cannot appear in defaults or it will be interpreted as a splitting point between arguments.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24141835

Pulled By: ezyang

fbshipit-source-id: 578127861fd2504917f4486c44100491a2c40343
2020-10-06 21:53:56 -07:00
1b31ed3ad6 [quant] Refactor qembeddingbag to remove duplicate code (#45881)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45881

Test Plan:
python test/test_quantization.py TestQuantizedEmbeddingBagOps

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D24127892

fbshipit-source-id: 344ee71d335b8c1d668c647db88775632e099dbd
2020-10-06 21:11:55 -07:00
43dc7ef933 [quant] Support for 4-bit quantized EmbeddingBag module (#45865)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45865

Test Plan:
python test/test_quantization.py TestPostTrainingStatic.test_quantized_embedding_bag
python test/test_quantization.py TestStaticQuantizedModule.test_embedding_bag_api

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D24120995

fbshipit-source-id: c55fc6b2cfd683d14d2a05be7c04f787fdf8cc79
2020-10-06 21:11:52 -07:00
11c32611d7 [quant] Support 4-bit embedding_bag operators using the dtype quint4x2 (#45752)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45752

Use the torch.quint4x2 dtype, introduced in the previous PR, to create 4-bit packed tensors.
These packed tensors can be directly consumed by the operator.
Serialization of the packed tensors is supported using torchbind custom class.
Module support will follow in a later PR.

Test Plan:
python test/test_quantization.py TestEmbeddingBagOps

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D24120996

fbshipit-source-id: 2639353b3343ebc69e058b5ba237d3fc56728e1c
2020-10-06 21:11:49 -07:00
5c283fa292 [quant] Add 4-bit embedding_bag prepack/unpack support using quint4x2 (#45751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45751

Use the torch.quint4x2 dtype to create 4-bit packed tensors

Test Plan:
python test/test_quantization.py TestEmbeddingBagOps

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D24120997

fbshipit-source-id: 6aba2985715a346f6894cf43d5794e104a9ab061
2020-10-06 21:06:46 -07:00
e8d8de32b4 [StaticRuntime] Implement StaticRuntime::benchmark (#45639)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45639

`StaticRuntime::run_individual` mimics the caffe2 operator benchmark `SimpleNet::TEST_Benchmark`, so we can get accurate information on the operator breakdown. We found that the PyTorch AutogradProfiler adds a lot of overhead to small models, such as the adindexer precomputation_merge net: 100% for batch_size 1, 33% for batch_size 20. This implementation adds very little overhead, as shown in the test plan.

Test Plan: Test results are fb internal only.

Reviewed By: yinghai, dzhulgakov

Differential Revision: D24012088

fbshipit-source-id: f32eb420aace93e2de421a15e4209fce6a3d90f0
2020-10-06 20:54:43 -07:00
275bb5e801 Fix flakiness in caffe2/test:serialization - test_serialization_new_format_old_format_compat (#45915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45915

Use temp file instead

Test Plan: buck test mode/opt-asan //caffe2/test:serialization -- 'test_serialization_new_format_old_format_compat \(test_serialization\.TestBothSerialization\)' --run-disabled --jobs 18 --stress-runs 10 --record-results

Reviewed By: malfet

Differential Revision: D24142278

fbshipit-source-id: 9c88330fc5664d464daa9124e67644f497353f3b
2020-10-06 18:11:58 -07:00
4fdba30500 [JIT] Add API for ignoring arbitrary module attributes (#45262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45262

**Summary**
This commit adds an API for ignoring arbitrary module attributes during
scripting. A class attribute named `ignored_attributes` containing names
of attributes to ignore can be added to the class of the instance being
scripted. Attributes ignored in this fashion cannot be used in
`forward`, methods used by `forward` or by `exported` methods. They
are, however, copied to the `RecursiveScriptModule` wrapper and can be
used by `ignored` methods and regular Python code.
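
A minimal sketch based on the description above (illustrative; the attribute name follows this commit's wording and may differ in the final API):

```
import torch

class MyModule(torch.nn.Module):
    # names listed here are skipped by the scripting compiler
    ignored_attributes = ["debug_blob"]

    def __init__(self):
        super().__init__()
        self.debug_blob = object()  # not scriptable, but ignored

    def forward(self, x):
        return x + 1

scripted = torch.jit.script(MyModule())
```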

**Test Plan**
This commit adds unit tests to `TestScriptPy3` to test this new API.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23971882

Pulled By: SplitInfinity

fbshipit-source-id: 8c81fb415fde7b78aa2f87e5d83a477e876a7cc3
2020-10-06 18:02:06 -07:00
49af421143 Embed callgrind headers (#45914)
Summary:
Because access to https://sourceware.org/git/valgrind.git can be really slow, especially in some regions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45914

Reviewed By: seemethere

Differential Revision: D24144420

Pulled By: malfet

fbshipit-source-id: a454c8c3182c570ec344bf6468bb5e55d8b8da79
2020-10-06 17:51:10 -07:00
f5e70a7504 fix test flakiness caused by sys.getrefcount(None) (#45876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45876

sys.getrefcount() can return different values before/after the scope() call, making exact-refcount assertions flaky.
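
For context, a small illustration of why asserting exact refcounts of None is fragile:

```
import sys

# None's refcount is process-global; any code holding references to None
# (including test infrastructure) shifts the count between two calls.
before = sys.getrefcount(None)
_ = [None] * 10          # the list now holds 10 extra references to None
after = sys.getrefcount(None)
print(before, after)     # the two values need not match
```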

Test Plan: buck test mode/opt-asan //caffe2/test:others -- 'test_none_names_refcount \(test_namedtensor\.TestNamedTensor\)' --run-disabled

Reviewed By: malfet

Differential Revision: D24123724

fbshipit-source-id: 4af0b150222cfb92dd0776a42fcab44d896a772a
2020-10-06 17:32:07 -07:00
624084e6d6 [te][llvm] Enable fused multiply-add (fma) in code generation (#45906)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45906

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D24142404

Pulled By: bertmaher

fbshipit-source-id: a8db2e66c1e65bbb255886e165a1773723cbcd20
2020-10-06 16:57:34 -07:00
f2e569461b [te] Tiled (m=32 x n=32) gemm benchmark (#45905)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45905

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D24142402

Pulled By: bertmaher

fbshipit-source-id: b39e18b6985ee1c1f654fba4498ed91ff14d8d5f
2020-10-06 16:57:31 -07:00
50f89578dd [te] Add a benchmark harness (#45875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45875

Adds a googlebenchmark harness for perf testing programs generated by
tensorexpr, sans any pytorch wrappings (for python-level benchmarks of
tensorexpr, see benchmarks/tensorexpr).

Currently there's a harness for gemm that sets up the problem using torch (and
also measures the perf of a torch::mm to give a baseline).

Right now there's just an unoptimized implementation that is expected to be not
very fast.  More optimized versions are coming.

Sample output from my dev box:
```
Run on (48 X 2501 MHz CPU s)
CPU Caches:
  L1 Data 32K (x24)
  L1 Instruction 32K (x24)
  L2 Unified 256K (x24)
  L3 Unified 30720K (x2)
--------------------------------------------------------------------------------------------
Benchmark                                     Time           CPU Iterations UserCounters...
--------------------------------------------------------------------------------------------
Gemm/Torch/128/128/128                    73405 ns      73403 ns       8614 GFLOPS=57.1411G/s
Gemm/TensorExprNoopt/128/128/128        3073003 ns    3072808 ns        229 GFLOPS=1.36497G/s
```

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D24142403

Pulled By: bertmaher

fbshipit-source-id: 3354aaa56868a43a553acd1ad9a192f28d8e3597
2020-10-06 16:57:27 -07:00
5ff31620b7 [te] Add a 2D convolution example test (#45514)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45514

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D24142405

Pulled By: bertmaher

fbshipit-source-id: 8f064d0638b48f55a732c08938b9fcf1ba3f0415
2020-10-06 16:54:46 -07:00
14997f2125 [quant][graphmode][fx] Add warning for unsupported case (#45714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45714

Hit the problem when writing a test like the following:
```
class M(...):
      def forward(self, x):
          x = x.some_op()
          return x
```
we need to know the scope of `x` to figure out the qconfig for `x`.

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D24069959

fbshipit-source-id: 95ac8963c802ebce5d0e54d55f5ebb42085ca8a6
2020-10-06 15:33:34 -07:00
5072728d88 Fix stride printing/parsing formatting (#45156)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45156

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24078695

Pulled By: ansley

fbshipit-source-id: dab993277d43b31105c38d12098c37653747b42a
2020-10-06 15:06:46 -07:00
255b0e839f C++ APIs CUDA Stream Note (Set/Get part) (#45754)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45754

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24085103

Pulled By: glaringlee

fbshipit-source-id: c9641c2baadcf93b84733c037ce91b670dde5f96
2020-10-06 14:57:16 -07:00
a3662fa78c Minor gradcheck update to reduce computations (#45757)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45757

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D24137143

Pulled By: anjali411

fbshipit-source-id: e0174ec03d93b1fedf27baa72c3542dac0b70058
2020-10-06 13:59:01 -07:00
e154b36685 Standardized clamp kernels to Numpy-like implementation (#43288)
Summary:
**BC-breaking note**

For ease of exposition let a_min be the value of the "min" argument to clamp, and a_max be the value of the "max" argument to clamp.

This PR changes the behavior of torch.clamp to always compute min(max(a, a_min), a_max). torch.clamp currently computes this in its vectorized CPU specializations:

78b95b6204/aten/src/ATen/cpu/vec256/vec256_double.h (L304)

but in other places it clamps differently:

78b95b6204/aten/src/ATen/cpu/vec256/vec256_base.h (L624)

78b95b6204/aten/src/ATen/native/cuda/UnaryOpsKernel.cu (L160)

These implementations are the same when a_min < a_max, but divergent when a_min > a_max. This divergence is easily triggered:

```
t = torch.arange(200).to(torch.float)
torch.clamp(t, 4, 2)[0]
: tensor(2.)

torch.clamp(t.cuda(), 4, 2)[0]
: tensor(4., device='cuda:0')

torch.clamp(torch.tensor(0), 4, 2)
: tensor(4)
```

This PR makes the behavior consistent with NumPy's clip. C++'s std::clamp's behavior is undefined when a_min > a_max, but Clang's std::clamp will return 10 in this case (although the program, per the above comment, is in error). Python has no standard clamp implementation.

**PR Summary**

Fixes the discrepancy between the AVX, CUDA, and base vector implementations of clamp, such that all implementations consistently use the min(max_vec, max(min_vec, x)) formula, making clamp equivalent to numpy.clip in all implementations.

The same fix as in https://github.com/pytorch/pytorch/issues/32587 but isolated to the kernel change only, so that the internal team can benchmark.
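
With the unified formula, the divergent example above now agrees everywhere (a sketch of the expected post-fix behavior; the second call assumes a CUDA device is available):

```
import torch

t = torch.arange(200, dtype=torch.float)
# min(max(a, 4), 2) == 2 for every element, matching numpy.clip(t, 4, 2)
print(torch.clamp(t, 4, 2)[0])           # tensor(2.)
print(torch.clamp(t.cuda(), 4, 2)[0])    # tensor(2., device='cuda:0')
```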

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43288

Reviewed By: colesbury

Differential Revision: D24079453

Pulled By: mruberry

fbshipit-source-id: 67f30d2f2c86bbd3e87080b32f00e8fb131a53f7
2020-10-06 13:42:08 -07:00
a69a78daa2 Use smaller N to speed up TestForeach (#45785)
Summary:
Between September 25 and September 27, approximately half an hour was added to the running time of `pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test`. Judging from the CircleCI data, it looks like the majority of the new time was added by the following PRs:

- https://github.com/pytorch/pytorch/issues/44550
- https://github.com/pytorch/pytorch/issues/45298

I'm not sure what to do about https://github.com/pytorch/pytorch/issues/44550, but it looks like https://github.com/pytorch/pytorch/issues/45298 increased the `N` for `TestForeach` from just 20 to include both 30 and 300. This PR would remove the 300, decreasing the test time by a couple orders of magnitude (at least when running it on my devserver), from over ten minutes to just a few seconds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45785

Reviewed By: malfet

Differential Revision: D24094782

Pulled By: samestep

fbshipit-source-id: 2476cee9d513b2b07bc384de751e08d0e5d8b5e7
2020-10-06 13:29:04 -07:00
c1af91a13a [caffe2] SliceOp axes indexing fixes. (#45432)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45431

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45432

Reviewed By: albanD

Differential Revision: D24132547

Pulled By: dzhulgakov

fbshipit-source-id: d67f7a92d806fb8ac8fc8f522b251d3a8fb83037
2020-10-06 13:21:08 -07:00
3fbddb92b1 caffe2/plan_executor: wait for 1 minute after exception and then abort (#45297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45297

If we have two concurrent substeps and one of them throws an exception and the other is blocking, we'll currently hang. This waits up to 1 minute for it to complete before terminating the process.

Test Plan: buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100

Reviewed By: dahsh

Differential Revision: D20850851

fbshipit-source-id: 330503775d8062a34645ba55fe38e6770de5e3c7
2020-10-06 12:59:09 -07:00
64681d6bec Add all remaining method declarations from torch.distributed Python API to C++ (#45768)
Summary:
Also ran formatter on previous sections

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45768

Reviewed By: wanchaol

Differential Revision: D24129467

Pulled By: gmagogsfm

fbshipit-source-id: aa8a5c45c3609d5b96e5f585b699d9e3e71394c8
2020-10-06 12:36:36 -07:00
0da6730f02 [quant][graphmode][fx][eagermode] Add leaky relu support in quantization workflows (#45712)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45712

Eager mode will still be able to use the functional leaky_relu, but it will be less accurate than the LeakyReLU module.
FX graph mode will support both the functional and module forms of leaky_relu.

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D24069961

fbshipit-source-id: 8d91c3c50c0bcd068ba3072378ebb4da9549be3b
2020-10-06 12:16:04 -07:00
fb50fcaa82 [C2] Add string equality operator (#45886)
Summary:
This diff adds a string equality checking operator.

Another attempt at reverted D24042344 (cf48872d28)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45886

Test Plan: unit tests, github builds

Reviewed By: dzhulgakov

Differential Revision: D24129953

fbshipit-source-id: caa53c7eac5c67c414c37e9d93416104f72556b9
2020-10-06 12:08:26 -07:00
fcc7f272de maximum number of threads per block for sm_86 is 1536 (#45889)
Summary:
According to https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45889

Reviewed By: albanD

Differential Revision: D24131188

Pulled By: ngimel

fbshipit-source-id: 31d3038f7b1bc403751448c62b19609573c67a49
2020-10-06 12:01:33 -07:00
ba642d36ff ReplicationPad accepts 0-dim batch size. (#39137)
Summary:
This PR patches the ReplicationPad modules in `torch.nn` to be compatible with 0-dim batch sizes.

EDIT: this is part of the work on gh-12013 (make all nn layers accept empty batch size)
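
A minimal sketch of what now works:

```
import torch

pad = torch.nn.ReplicationPad2d(1)
x = torch.empty(0, 3, 8, 8)   # batch size 0
print(pad(x).shape)           # torch.Size([0, 3, 10, 10])
```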

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39137

Reviewed By: albanD

Differential Revision: D24131386

Pulled By: ngimel

fbshipit-source-id: 3d93057cbe14d72571943c8979d5937e4bbf743a
2020-10-06 11:54:32 -07:00
8b7ee33ee6 [quant] Add quantized LeakyReLU module (#45711)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45711

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D24069960

fbshipit-source-id: ccdd294308e07fd215556a63fa47191c09a1519f
2020-10-06 11:34:48 -07:00
930bddd403 Cleanup nccl.cpp (#45899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45899

Use function overloading to avoid repeated casts. I.e., instead of using `NCCL_CHECK(from_nccl_result(...))`, add a variant of the function that takes `ncclResult_t` as an input argument.
Also add a non-pointer variant of `to_nccl_comm` to avoid the `*to_nccl_comm(&comm)` pattern.

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D24138012

Pulled By: malfet

fbshipit-source-id: 7f62a03e108cbe455910e86e894afdd1c27e8ff1
2020-10-06 11:26:14 -07:00
d1fc1555d4 [quant] Add quantized::leaky_relu that takes scale/zero_point as input (#45702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45702

https://github.com/pytorch/pytorch/issues/45593

Previously, quantized leaky_relu did not require observation and just inherited
the quantization parameters from its input, but that does not work very well in QAT.
This PR adds a quantized::leaky_relu that observes the output, and it will
become the default leaky_relu that our quantization tools produce (eager/graph mode).

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D24067681

fbshipit-source-id: d216738344363794b82bd3d75c8587a4b9415bca
2020-10-06 10:56:45 -07:00
001a7998b4 Disabling XNNPACK integration test in tsan mode (#45850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45850

In TSAN mode most xnnpack integration tests seem to be failing. The reason for the
failure is not entirely clear, nor is it clear whether the failures are spurious.

Test Plan: python test/test_xnnpack_integration.py

Reviewed By: xcheng16

Differential Revision: D24113885

fbshipit-source-id: dc3de3ad3d4bf0210ad67211383dbe0e842b09dd
2020-10-06 10:49:58 -07:00
3510f19c5f added some more details + debugging steps to CONTRIBUTING.md (#45903)
Summary:
When attempting to install PyTorch locally on my MacBook, I had some difficulty running the setup steps and understanding what I was really doing. I've added some clarifications and summarized some debugging steps for `python setup.py develop` to lower the barrier to entry for new contributors.

I'm seeking a lot of review here since I am not sure if what I wrote is entirely the most useful or accurate. Thank you!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45903

Reviewed By: albanD

Differential Revision: D24140343

Pulled By: janeyx99

fbshipit-source-id: a5e40d1bc804945ae7db2b95ab18cf7fe169e68a
2020-10-06 10:40:17 -07:00
abedd9a274 Reduce size of test_unsqueeze to resolve consistent timeout issue (#45877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45877

apex_test_L0_optimizers

Test Plan: `buck test mode/dev-tsan //caffe2/test:tensorexpr -- 'test_unsqueeze \(test_tensorexpr\.TestTensorExprFuser\)' --run-disabled`

Reviewed By: malfet

Differential Revision: D24126211

fbshipit-source-id: e38ba0168b6dd44459c070c01e3e39c93d5fae42
2020-10-06 10:33:20 -07:00
9728584cca Replaced whitelist with allowlist (#45796)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41752

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45796

Reviewed By: dzhulgakov

Differential Revision: D24125214

Pulled By: VitalyFedyunin

fbshipit-source-id: 5b06c1fdaa90a60e8a6efc2e61f37fd647cf0ae7
2020-10-06 09:18:51 -07:00
a09e1098e7 Profiling allocator for mobile. (#43951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43951

AllocationPlan: Stores the sequence of allocations, their sizes
                and lifetime of the allocations. Along with this
                it also stores the total size of a single memory
                blob, total_size, required to satisfy all the allocations.
                It also stores the offsets in the blob, of size
                total_size, corresponding to each allocation.
                Thus allocation plan contains:
                - allocation sizes
                - allocation lifetimes
                - allocation offsets
                - total size
AllocationPlanner: Takes a pointer to the allocation plan and fills
                   it up with the plan, i.e. sizes, lifetimes, offsets,
                   total size.
                   This is done via WithProfileAllocationsGuard, which
                   takes in an AllocationPlan* and constructs an
                   AllocationPlanner*, setting the thread-local
                   allocation_planner to it.
                   MobileCPUAllocator profiles allocations via
                   allocation_planner.
                   In WithValidateAllocationsGuard, allocations profiled
                   in the allocation plan are validated.
CPUProfilingAllocator:
The application owns the CPUProfilingAllocator.
Using WithProfilingAllocatorGuard, it passes in both the CPUProfilingAllocator
and the AllocationPlan created earlier. The CPUProfilingAllocator will then
manage allocations and frees according to the plan. Allocations that
are not managed by the CPUProfilingAllocator are routed through
c10::alloc_cpu and c10::free_cpu.

Test Plan:
cpu_profiling_allocator_test on mobile.

Imported from OSS

Reviewed By: dreiss

Differential Revision: D23451019

fbshipit-source-id: 98bf1dbcfa8fcfb83d505ac01095e84a3f5b778d
2020-10-06 09:09:54 -07:00
b1373a74e0 Don't export enums for CUDA sources on Windows (#45829)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45829

Reviewed By: VitalyFedyunin

Differential Revision: D24113130

Pulled By: ezyang

fbshipit-source-id: 8356c837ed3a790efecf8dfcc8fb6ee6f45bd6e2
2020-10-06 08:04:36 -07:00
be137e45cd reorganizing tests so that test1 and test2 are balanced in timing (#45778)
Summary:
Used the --shard option to split up the Python tests run from `test/run_test.py` in the CI testing script.

Also revised a help message to be more accurate for --shard.

Test results:
BEFORE:
| EVENT | TIMING  |
|---|---|
| **TEST1** | |
| | |
| test_python_nn | 35m19s |
| test_cpp_extensions | 30s |
| **total** | **35m49s** |
| **TEST2** | |
| | |
| install_torchvision | 35s |
| test_python_all_except_nn_and_cpp_extensions | 255m37s |
| test_aten | SKIPPED |
| test_libtorch | 9m8s |
| test_custom_script_ops | SKIPPED |
| test_custom_backend | SKIPPED |
| test_torch_function_benchmark | 10s |
| **total** | **4hr24m** |

AFTER THIS SHARD:
| EVENT | TIMING  |
|---|---|
| **TEST1** | |
| | |
| test_autograd | 26m30s |
| test_foreach | 69m |
| test_nn | 35m38s |
| **total** | **3h1m** |
| **TEST2** | |
| | |
| test-quantization | 41m28s |
| test_spectral_ops | 17m37s |
| test_torch | 8m56s |
| test_jit_legacy | 16m21s |
| **total** | **2h18m** |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45778

Reviewed By: albanD

Differential Revision: D24137156

Pulled By: janeyx99

fbshipit-source-id: 5873fec47aedb9f699ebbda653a4d32a9950fc13
2020-10-06 07:57:08 -07:00
67889db8aa Replaced BLACKLIST with BLOCKLIST (#45781)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41714

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45781

Reviewed By: nairbv

Differential Revision: D24136821

Pulled By: albanD

fbshipit-source-id: 0c0223bda0c5b4da75167a27d7859562db396304
2020-10-06 07:49:00 -07:00
8bc0c755be adding option to move excluding to run_test.py instead of test.sh (#45868)
Summary:
Cleaning up test.sh a tiny bit

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45868

Reviewed By: albanD

Differential Revision: D24122726

Pulled By: janeyx99

fbshipit-source-id: e8254accad15ad887a000ec1401c401389393c92
2020-10-06 07:13:27 -07:00
8a1e100466 Stricter backward compatibility check (#45773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45773

Changes the function schema's backward compatibility check to be stricter to comply with C++ API backwards compatibility capabilities.
ghstack-source-id: 113537304

Test Plan:
Updated and added tests to test_function_schema.py

Browsed through several commits to native_functions.yaml and derivatives.yaml and I don't see instances where new arguments were not already being appended.

Reviewed By: dzhulgakov

Differential Revision: D24089751

fbshipit-source-id: a21f407cdc750906d3326e3ea27928b8aa732804
2020-10-06 01:28:48 -07:00
2fbe5971b3 [pytorch/cuda] apply 16-bit mask to the index for device guard registry (#45485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45485

Essentially this is the problem reported by ezyang: https://fb.workplace.com/groups/llvm.gcc/permalink/4053565044692080. There are two proposed fixes:
* https://github.com/pytorch/pytorch/pull/44883: this doesn't work because it fails some static assert at runtime
```
caffe2/c10/core/TensorOptions.h:553:1: error: static_assert failed due to requirement 'sizeof(c10::TensorOptions) <= sizeof(long) * 2' "TensorOptions must fit in 128-bits"
static_assert( sizeof(TensorOptions) <= sizeof(int64_t) * 2,
^
```
* https://github.com/pytorch/pytorch/pull/44885: to be tested

This diff is a temp hack to work around the problem. W/o this patch:

```
  volatile size_t device_type = static_cast<size_t>(type);
  auto p = device_guard_impl_registry[device_type].load();
  C10_LOG_FIRST_N(WARNING, 10) << "XDW-fail: " << cntr << ", Device type: " << type << ", type cast: " << device_type  << ", guard: " << p;

// output
XDW-fail: 1129, Device type: cuda, type cast: 65537, guard: 0

```

Another workaround is D23788441, which changes -O3 to -O2. So this seems to be a miscompilation for nvcc or the host compiler.

Reviewed By: ezyang

Differential Revision: D23972356

fbshipit-source-id: ab91fbbfccb6389052de216f95cf9a8265445aea
2020-10-05 22:37:47 -07:00
d44eaf63d1 torch.fft helper functions (#44877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44877

Part of gh-42175. This implements the `torch.fft` helper functions: `fftfreq`, `rfftfreq`, `fftshift` and `ifftshift`.

* #43009 Cleanup tracer handling of optional arguments
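
A minimal usage sketch of these helpers (assuming a PyTorch build that ships the `torch.fft` namespace):

```python
import torch
import torch.fft  # explicit import needed on some older builds

freqs = torch.fft.fftfreq(8)          # sample frequencies in standard FFT order
half = torch.fft.rfftfreq(8)          # non-negative frequencies, for rfft outputs
centered = torch.fft.fftshift(freqs)  # move the zero-frequency component to the center
assert torch.equal(torch.fft.ifftshift(centered), freqs)  # ifftshift inverts fftshift
```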

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D24043473

Pulled By: mruberry

fbshipit-source-id: 35de7b70b27658a426773f62d23722045ea53268
2020-10-05 22:04:52 -07:00
e4efc420ae Correct Categorical docstring (#45804)
Summary:
Clarified that the `Categorical` distribution will actually accept input of any arbitrary tensor shape, not just 1D and 2D tensors.
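
For example (a sketch; `Categorical` treats the trailing dimension as the category dimension and any leading dimensions as batch dimensions):

```python
import torch
from torch.distributions import Categorical

probs = torch.rand(2, 3, 5)   # unnormalized is fine; Categorical normalizes
d = Categorical(probs=probs)
print(d.sample().shape)       # torch.Size([2, 3]) -- one draw per batch entry
```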

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45804

Reviewed By: dzhulgakov

Differential Revision: D24125415

Pulled By: VitalyFedyunin

fbshipit-source-id: 5fa1f07911bd85e172199b28d79763428db3a0f4
2020-10-05 21:49:10 -07:00
7eb0a71484 update persons of interest (#45803)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45803

Reviewed By: dzhulgakov

Differential Revision: D24125375

Pulled By: VitalyFedyunin

fbshipit-source-id: a892603c6449a2c15e926d2b161468690d4ec2f4
2020-10-05 21:28:00 -07:00
bf85642c4c Remove lock from GraphTask::set_exception_without_signal. (#45867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45867

In most cases the lock ordering was hold a lock in local autograd and
then hold a lock in DistAutogradContext.

In case of `set_exception_without_signal` the lock order was in reverse and as
a result we saw potential deadlock issues in our TSAN tests. To fix this, I
removed the lock and instead just used std::atomic exchange.

In addition to this, I fixed TestE2E to ensure that we use the appropriate
timeout.

TestE2EProcessGroup was flaky for these two reasons and now is fixed.
ghstack-source-id: 113592709

Test Plan: waitforbuildbot.

Reviewed By: albanD

Differential Revision: D24120962

fbshipit-source-id: 12447b84ceae772b91e9a183c90d1e6340f44e66
2020-10-05 20:02:29 -07:00
10d86d1196 [NCCL] create NCCL communicator for send/recv on demand (#44922)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44922

For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations.
ghstack-source-id: 113592757

Test Plan: to add

Reviewed By: pritamdamania87

Differential Revision: D23773726

fbshipit-source-id: 0d47c29d670ddc07f7181e8485af0e02e2c9cfaf
2020-10-05 18:33:03 -07:00
59083d6176 [NCCL] Support NCCL Send/Recv (#44921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44921

This diff adds support for Process Group point-to-point operations on NCCL backend based on ncclSend/ncclRecv. See https://github.com/pytorch/pytorch/issues/43995 for more context.
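
A hedged sketch of what this enables (assumes a two-rank job with one GPU per rank and `init_process_group("nccl", ...)` already called; `p2p_demo` is a hypothetical helper, not code from this PR):

```python
import torch
import torch.distributed as dist

def p2p_demo(rank: int):
    # assumes dist.init_process_group("nccl", rank=rank, world_size=2) ran already
    t = torch.full((4,), float(rank), device=f"cuda:{rank}")
    if rank == 0:
        dist.send(t, dst=1)   # backed by ncclSend after this change
    else:
        dist.recv(t, src=0)   # backed by ncclRecv
        print(t)              # tensor([0., 0., 0., 0.], device='cuda:1')
```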
ghstack-source-id: 113592785

Test Plan: unittest

Reviewed By: jiayisuse

Differential Revision: D23709848

fbshipit-source-id: cdf38050379ecbb10450f3394631317b41163258
2020-10-05 18:27:57 -07:00
b04ae953b4 [FX][WIP] Mutable Graph APIs (#45227)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45227

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D23880730

Pulled By: jamesr66a

fbshipit-source-id: eb4e8c14d7f6b1deb1ddd6cf38a360413a1705ed
2020-10-05 17:07:08 -07:00
1558a3657b Add LazyNVRTC (#45674)
Summary:
Instead of dynamically loading `caffe2_nvrtc`, lazyNVRTC provides the same functionality by binding all the hooks to a lazy-bind implementation, very similar to shared-library jump tables:
On the first call, each function from the list tries to get a global handle to the respective shared library and replaces itself with the dynamically resolved symbol, using the following template:
```
  auto fn = reinterpret_cast<decltype(&NAME)>(getCUDALibrary().sym(C10_SYMBOLIZE(NAME)));
  if (!fn)
    throw std::runtime_error("Can't get " C10_SYMBOLIZE(NAME));
  lazyNVRTC.NAME = fn;
  return fn(...);
```
Fixes https://github.com/pytorch/pytorch/issues/31985

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45674

Reviewed By: ezyang

Differential Revision: D24073946

Pulled By: malfet

fbshipit-source-id: 1479a75e5200e14df003144625a859d312885874
2020-10-05 16:27:40 -07:00
54aaffb7c7 Avoid NaN values in torch.cdist backward for p<1 (#45720)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/36493

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45720

Reviewed By: VitalyFedyunin

Differential Revision: D24112541

Pulled By: albanD

fbshipit-source-id: 8598a9e7cc0f6f9ea46c007f2e3365970aea0116
2020-10-05 16:19:29 -07:00
4ab73c1f74 [docs] Fix EmbeddingBag docs (#45763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45763

**Summary**
This commit updates the documentation for `EmbeddingBag` to say that for
bags of constant length with no per-sample weights, the class is
equivalent to `Embedding` followed by `torch.sum(dim=1)`. The current
docs say `dim=0` and this is readily falsifiable.

**Test Plan**
1) Tried `Embedding` + `sum` with `dim`=0,1 in interpreter and compared
to `EmbeddingBag`
```
>>> import torch
>>> weights = torch.nn.Parameter(torch.randn(10, 3))
>>> e = torch.nn.Embedding(10, 3)
>>> eb = torch.nn.EmbeddingBag(10, 3, mode="sum")
>>> e.weight = weights
>>> eb.weight = weights
# Use 2D inputs because we are trying to test the case in which bags have constant length
>>> inputs = torch.LongTensor([[4,1,2,7],[5,6,0,3]])
>>> eb(inputs)
tensor([[-2.5497, -0.1556, -0.5166],
        [ 2.2528, -0.3627,  2.5822]], grad_fn=<EmbeddingBagBackward>)
>>> torch.sum(e(inputs), dim=0)
tensor([[ 1.6181, -0.8739,  0.8168],
        [ 0.0295,  2.3274,  1.2558],
        [-0.7958, -0.4228,  0.5961],
        [-1.1487, -1.5490, -0.6031]], grad_fn=<SumBackward1>)
>>> torch.sum(e(inputs), dim=1)
tensor([[-2.5497, -0.1556, -0.5166],
        [ 2.2528, -0.3627,  2.5822]], grad_fn=<SumBackward1>)
```
So clearly `torch.sum` with `dim=0` is not correct here.

2) Built docs and viewed in browser.

*Before*
<img width="882" alt="Captura de Pantalla 2020-10-02 a la(s) 12 26 20 p  m" src="https://user-images.githubusercontent.com/4392003/94963035-557be100-04ac-11eb-986c-088965ac3050.png">

*After*
<img width="901" alt="Captura de Pantalla 2020-10-05 a la(s) 11 26 51 a  m" src="https://user-images.githubusercontent.com/4392003/95117732-ea294d80-06fd-11eb-9d6b-9b4e6c805cd0.png">

**Fixes**
This commit closes #43197.

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24118206

Pulled By: SplitInfinity

fbshipit-source-id: cd0d6b5db33e415d8e04ba04f2c7074dcecf3eee
2020-10-05 15:56:35 -07:00
78f055272c [docs] Add 3D reduction example to tensordot docs (#45697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45697

**Summary**
This commit adds an example of a reduction over three dimensions with
`torch.tensordot`. It is unclear from existing docs whether `dims`
should be a list of pairs or a pair of lists.
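
For reference, a contraction over three dimensions along these lines (a sketch; the exact example added to the docs may differ):

```python
import torch

a = torch.randn(2, 3, 4, 5)
b = torch.randn(3, 4, 5, 6)
# dims is a pair of lists, not a list of pairs: contract dims 1,2,3 of `a`
# against dims 0,1,2 of `b`.
c = torch.tensordot(a, b, dims=([1, 2, 3], [0, 1, 2]))
print(c.shape)  # torch.Size([2, 6])
```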

**Test Plan**
Built the docs locally.

*Before*
<img width="864" alt="Captura de Pantalla 2020-10-01 a la(s) 1 35 46 p  m" src="https://user-images.githubusercontent.com/4392003/94866838-f0b17f80-03f4-11eb-8692-8f50fe3b9863.png">

*After*
<img width="831" alt="Captura de Pantalla 2020-10-05 a la(s) 12 06 28 p  m" src="https://user-images.githubusercontent.com/4392003/95121092-670af600-0703-11eb-959f-73c7797a76ee.png">

**Fixes**
This commit closes #22748.

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24118186

Pulled By: SplitInfinity

fbshipit-source-id: c19b0b7e001f8cd099dc4c2e0e8ec39310510b46
2020-10-05 15:36:59 -07:00
26a9012f84 [fx] import used modules for code gen (#45471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45471

Instead of assuming that 'torch' is the only module used by generated code,
use the qualified names of builtin functions to generate import statements
for all builtins. This allows code to be generated correctly for user-captured functions as well.
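
An illustrative sketch of the idea (not the actual FX codegen; `imports_for` is a hypothetical helper): derive import statements from the qualified names of the functions a graph calls.

```python
def imports_for(fns):
    # collect the top-level module of each callable's qualified name
    modules = sorted({fn.__module__.split(".")[0] for fn in fns if fn.__module__})
    return "\n".join(f"import {m}" for m in modules)

import json
import math

print(imports_for([math.sqrt, json.dumps]))
# import json
# import math
```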

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23978696

Pulled By: zdevito

fbshipit-source-id: ecbff150e3de38532531cdadbfe4965468f29a38
2020-10-05 15:21:44 -07:00
5177f8de2b Revert D23398534: [pytorch][PR] [ONNX] Improve error handling for adaptive_pool
Test Plan: revert-hammer

Differential Revision:
D23398534 (45ddeb5ce6)

Original commit changeset: f2d60d40340f

fbshipit-source-id: acc9d6c3d031662c37447fcee027b0c97b8492a7
2020-10-05 15:16:59 -07:00
f18cc9c57d Change type inferred from empty annotation (#45360)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45360

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24078645

Pulled By: ansley

fbshipit-source-id: 5d37d07df75bd7a2111d44638befe53c1021ee82
2020-10-05 15:16:56 -07:00
a9a9d0b181 Rocm skip test cases (#45782)
Summary:
Skip the following test cases for ROCm (when PYTORCH_TEST_WITH_ROCM=1):
- test_reference_numerics_tan_cuda_float64 (__main__.TestUnaryUfuncsCUDA)
- test_addmv_cuda_float16 (__main__.TestTorchDeviceTypeCUDA)
- test_logspace_cuda_float64 (__main__.TestTensorCreationCUDA)
- test_gloo_backend_2gpu_module (__main__.DistributedDataParallelTest)
jeffdaily
pruthvistony

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45782

Reviewed By: VitalyFedyunin

Differential Revision: D24115581

Pulled By: xw285cornell

fbshipit-source-id: 4043a9fa19e242301b5007813c15b6b3873889c5
2020-10-05 15:12:25 -07:00
519c086418 Revert D24042344: [C2] Add string equality operator
Test Plan: revert-hammer

Differential Revision:
D24042344 (cf48872d28)

Original commit changeset: c8997c6130e3

fbshipit-source-id: 3d8aec1104a2a59c67ab4b7e77caeaf9fc94ae1d
2020-10-05 15:09:03 -07:00
9a668f94bb [jit] allow slicing multiple dimensions with indicies (#45239)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45239

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23886919

Pulled By: Lilyjjo

fbshipit-source-id: d45c2a550fa8df9960cf2ab5da9d1ae0058a967a
2020-10-05 15:03:54 -07:00
f11f9a8c1f [pytorch][improvement] Improve torch logging to identify problematic key (#45766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45766

As per the subject, making the KeyError message more verbose.

Test Plan:
Verified that breakage can be successfully investigated with the verbose error message.
Unit tests.

Reviewed By: esqu1

Differential Revision: D24080362

fbshipit-source-id: f4e22a78809e5cff65a69780d5cbbc1e8b11b2e5
2020-10-05 14:54:52 -07:00
9f4abcad9d Automated submodule update: FBGEMM (#45713)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: fe9164007c

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45713

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: VitalyFedyunin

Differential Revision: D24069807

fbshipit-source-id: 4670725be42368bdf6e29a3746c89514c5f4ee1b
2020-10-05 14:47:54 -07:00
a83696ad53 quant docs: add API summary section (#45848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45848

This is a resubmit of the following stack:
* start: https://github.com/pytorch/pytorch/pull/45093
* end: https://github.com/pytorch/pytorch/pull/45306

The original stack was reverted due to build failure,
resubmitting.

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D24117781

Pulled By: vkuzo

fbshipit-source-id: fb767fff2b044cfbba695ca3949221904fc8931f
2020-10-05 14:42:40 -07:00
c80ec91b00 [iOS] Bump up the cocoapods version (#45862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45862

Bump up the cocoapods version
ghstack-source-id: 113585513

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: xta0

Differential Revision: D24119158

fbshipit-source-id: e689b69628dcf802084e67c5ea627220cafcc575
2020-10-05 14:37:26 -07:00
21fa877026 [quant][test] Remove numeric equivalence test for debug and non-debug option (#45852)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45852

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24115329

fbshipit-source-id: ad32e68cbd54431fd440c8437a4361905a5dbdad
2020-10-05 14:11:07 -07:00
14e6e50700 Refactor computeLRWorkDim (#45812)
Summary:
Move duplicated code for computing the LRWork array dimension from the CPU/CUDA implementations of apply_svd into LinearAlgebraUtils

Reduce common multiplication factor from 7 to 5, which according to the documentation should be sufficient for LAPACK-3.6+
From 122506cd8b/SRC/cgesdd.f (L186)
```
RWORK is REAL array, dimension (MAX(1,LRWORK))
Let mx = max(M,N) and mn = min(M,N).
If JOBZ = 'N',    LRWORK >= 5*mn (LAPACK <= 3.6 needs 7*mn);
else if mx >> mn, LRWORK >= 5*mn*mn + 5*mn;
else              LRWORK >= max( 5*mn*mn + 5*mn,
                                 2*mx*mn + 2*mn*mn + mn ).
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45812

Reviewed By: walterddr

Differential Revision: D24100836

Pulled By: malfet

fbshipit-source-id: 0ca86aed25077c91cf60086ed301298381d5f628
2020-10-05 13:56:02 -07:00
ffbffc0436 fixed formatting in function rstrings in torch.autograd.functional (#45849)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44426

The changes look like:
![Screen Shot 2020-10-05 at 12 34 32 PM](https://user-images.githubusercontent.com/31798555/95107954-9839f500-0708-11eb-88b0-444486f53061.png)
(compare with https://pytorch.org/docs/stable/autograd.html#torch.autograd.functional.jacobian)

and also
![Screen Shot 2020-10-05 at 12 35 15 PM](https://user-images.githubusercontent.com/31798555/95107966-9bcd7c00-0708-11eb-979a-b3578b8203da.png)
(compare with https://pytorch.org/docs/stable/autograd.html#torch.autograd.functional.hessian)

and lastly
![Screen Shot 2020-10-05 at 12 38 19 PM](https://user-images.githubusercontent.com/31798555/95107971-9e2fd600-0708-11eb-9919-5b809f5f0f20.png)
(compare with https://pytorch.org/docs/stable/autograd.html#torch.autograd.functional.hvp)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45849

Reviewed By: albanD

Differential Revision: D24114223

Pulled By: janeyx99

fbshipit-source-id: bfea5f0d594933db4b2c400291d330f747f518e8
2020-10-05 13:39:01 -07:00
615013edcb setup: Dataclasses only when < 3.7 (#45844)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45844

Someone pointed out that dataclasses were actually added to the python
stdlib in 3.7 and not 3.8, so bumping down the dependency on dataclasses
from 3.8 -> 3.7 makes sense here

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr, malfet

Differential Revision: D24113367

Pulled By: seemethere

fbshipit-source-id: 03d2d93f7d966d48a30a8e2545fd07dfe63b4fb3
2020-10-05 13:29:21 -07:00
b5a2f04089 Disallow creation of ProcessGroupNCCL without GPUs. (#45642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45642

Prior to https://github.com/pytorch/pytorch/pull/45181, initializing a
NCCL process group would work even if no GPUs were present. However, now that
init_process_group calls `barrier()`, this fails.

In general the problem was that we could initialize ProcessGroupNCCL without
GPUs, and then if we called a method like `barrier()` the process would crash,
since we compute % numGPUs, resulting in division by zero.
ghstack-source-id: 113490343

Test Plan: waitforbuildbot

Reviewed By: osalpekar

Differential Revision: D24038839

fbshipit-source-id: a1f1db52cabcfb83e06c1a11ae9744afbf03f8dc
2020-10-05 12:05:48 -07:00
45ddeb5ce6 [ONNX] Improve error handling for adaptive_pool (#43032)
Summary:
This would also improve error handling for interpolate with 'area' mode.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43032

Reviewed By: malfet

Differential Revision: D23398534

Pulled By: bzinodev

fbshipit-source-id: f2d60d40340f46e7c0499ea73c1e39945713418d
2020-10-05 11:53:14 -07:00
adc21c6db2 Rename jobs and cli switches for testing GraphExecutor configurations to something a little bit more sensical. (#45715)
Summary:
Rename jobs for testing GraphExecutor configurations to something a little bit more sensical.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45715

Reviewed By: ezyang, anjali411

Differential Revision: D24114344

Pulled By: Krovatkin

fbshipit-source-id: 89e5f54aaebd88f8c5878e060e983c6f1f41b9bb
2020-10-05 11:43:28 -07:00
cf48872d28 [C2] Add string equality operator
Summary: This diff adds a string equality checking operator.

Test Plan: Unit tests

Differential Revision: D24042344

fbshipit-source-id: c8997c6130e3438f2ae95dae69f76978e2e95527
2020-10-05 10:47:53 -07:00
162717e527 grammatically update index.rst (#45801)
Summary:
This is a follow-up PR for https://github.com/pytorch/pytorch/issues/45652, which had a rebase problem.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45801

Reviewed By: VitalyFedyunin

Differential Revision: D24111776

Pulled By: glaringlee

fbshipit-source-id: 2c727a17426be91a4df78a195de79197e1c5d120
2020-10-05 09:55:56 -07:00
3ab88c3903 Enable TorchBind tests on ROCm (#45426)
Summary:
The torchbind tests didn't work because somehow we missed the rename of caffe2_gpu to torch_... (hip for us) in https://github.com/pytorch/pytorch/issues/20774 (merged 2019-06-13, oops) and still tried to link against it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45426

Reviewed By: VitalyFedyunin

Differential Revision: D24112439

Pulled By: walterddr

fbshipit-source-id: a66a574e63714728183399c543d2dafbd6c028f7
2020-10-05 09:38:12 -07:00
e829d4fba9 [op-bench] fix jit mode (#45774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45774

Fix RuntimeError: No such operator operator_benchmark::_consume

Test Plan: waitforsandcastle

Reviewed By: ngimel

Differential Revision: D24064982

fbshipit-source-id: 13160b6d18569e659ca1ab0ca1d444ed9947260c
2020-10-05 09:29:41 -07:00
f65ab89edd [numpy] Add torch.nan_to_num (#44592)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

TODO:
* [x] Add tests
* [x] Add docs
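
A short usage sketch (by default NaN maps to zero and ±inf map to the largest/most negative finite value of the dtype; all three replacements can be overridden):

```python
import torch

x = torch.tensor([float('nan'), float('inf'), -float('inf'), 3.14])
print(torch.nan_to_num(x))  # NaN -> 0, inf -> dtype max, -inf -> dtype min
print(torch.nan_to_num(x, nan=0.0, posinf=1.0, neginf=-1.0))
# tensor([ 0.0000,  1.0000, -1.0000,  3.1400])
```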

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44592

Reviewed By: colesbury

Differential Revision: D24079472

Pulled By: mruberry

fbshipit-source-id: 2b67d36cba46eaa7ca16cd72671b57750bd568bc
2020-10-05 01:38:56 -07:00
e1ff46b6e5 CUDA BFloat16 TopK (#44755)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44755

Reviewed By: mruberry

Differential Revision: D23741680

Pulled By: ngimel

fbshipit-source-id: 8fce92a26663336bcb831c72202fe2623a2ddaf0
2020-10-04 11:38:00 -07:00
2ab74a4839 [FX] Make Tracer.trace() just return a Graph (#45704)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45704

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D24067982

Pulled By: jamesr66a

fbshipit-source-id: c82aa6be504d45e110055a3c4db129d0b9ac3ef5
2020-10-03 21:13:48 -07:00
8a6b919163 [StaticRuntime] Fix broken tests (#45813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45813

Fix tests broken by D23996656 (2b48dd168d).

Test Plan:
```
buck test mode/opt //pytorch/tensorboardX:test_pytorchtb -- 'test_pytorch_graph \(pytorch\.tensorboardX\.tests\.test_pytorch_graph\.PytorchGraphTest\)'
buck test mode/opt //pytext/tests:
buck test mode/dev-nosan //mobile-vision/projects/detectron2go/tests:test_caffe2_compatibles
```

Reviewed By: yinghai

Differential Revision: D24100807

fbshipit-source-id: e2f92aadca4161f5cf9f552e922fb4d6500af3a4
2020-10-03 16:54:22 -07:00
24fa2daea6 Revert D24100389: Revert D24072697: [te] Get llvm codegen to compile with llvm9 and llvm-fb
Test Plan: revert-hammer

Differential Revision:
D24100389

Original commit changeset: b32c5163e4fb

fbshipit-source-id: 9ce7bfbcf411c0584e5d535ee107fb5a135ee6e6
2020-10-03 15:33:42 -07:00
ff568a0e6b Revert D24072697: [te] Get llvm codegen to compile with llvm9 and llvm-fb
Test Plan: revert-hammer

Differential Revision:
D24072697 (e3d2defdc8)

Original commit changeset: 7f56b9f3cbe5

fbshipit-source-id: b32c5163e4fb6df99447f95fdb82674e5ae62f22
2020-10-03 12:27:26 -07:00
3a27fc966a Test torch.svd using complex float and double numbers (take 2) (#45795)
Summary:
Adds support for magmaSvd for complex numbers

Fixes use-after-free error in `apply_symeig`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45795

Reviewed By: ezyang

Differential Revision: D24096955

Pulled By: malfet

fbshipit-source-id: 0d8d8492f89fe722bbd5aed3528f244245b496d0
2020-10-03 11:33:28 -07:00
d8a9c2c27e [iOS][CI] Fix the timeout for nightlies (#45798)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45798

Test Plan: Imported from OSS

Reviewed By: husthyc

Differential Revision: D24098451

Pulled By: xta0

fbshipit-source-id: 269517e0d54b0a07ea2ae5e2aee7f0ebc7985191
2020-10-02 23:13:30 -07:00
2b48dd168d [StaticRuntime] Integrate Static Runtime into PyTorchPredictor (#45640)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45640

Reviewed By: dzhulgakov

Differential Revision: D23996656

fbshipit-source-id: 63d88c89d1df61a04deadc472319607ed83867e5
2020-10-02 23:03:05 -07:00
546aab66c1 Revert D24027761: Update backward definition for more operators and reenable tests in test_ops.py
Test Plan: revert-hammer

Differential Revision:
D24027761 (7d809f5d8e)

Original commit changeset: c1f707c2a039

fbshipit-source-id: 30750d2f08886036fb8b2cd0ae51c7732d3b7b19
2020-10-02 18:52:57 -07:00
31621c828d Fix JIT tests when run locally in fbcode (#45776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45776

Splitting out backend and custom class registration into their own library is
not currently implemented in fbcode, so detect that we are running tests in
fbcode and disable those tests.

Test Plan: buck test mode/no-gpu mode/dev caffe2/test:jit

Reviewed By: smessmer

Differential Revision: D24085871

fbshipit-source-id: 1fcc0547880bc4be59428e2810b6a7f6e50ef798
2020-10-02 17:43:01 -07:00
53aea60bce [FX] Make output a non-special Node (#45599)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45599

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D24027586

Pulled By: jamesr66a

fbshipit-source-id: 747c25e3c7668ca45f03bed0be71fd3c9af67286
2020-10-02 17:08:17 -07:00
2fa062002e CUDA BFloat16 infrastructure (#44925)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44925

Reviewed By: agolynski

Differential Revision: D23783910

Pulled By: ngimel

fbshipit-source-id: dacac2ad87d58056bdc68bfe0b7ab1de5c2af0d8
2020-10-02 16:21:30 -07:00
8cb7280242 Revert "Remove device maps from TensorPipe for v1.7 release (#45353)" (#45762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45762

This reverts commit 5211fb97ac4c246151f1286c78d63e0e317a8a4a.

Test Plan: Imported from OSS

Reviewed By: colesbury

Differential Revision: D24088231

Pulled By: mrshenli

fbshipit-source-id: b6ee15ec5ae137ea127bdc2db8e1842764bc01d4
2020-10-02 15:14:05 -07:00
d150d3e276 Make sure each warnings.warn only executes once inside TorchScript. (#45382)
Summary:
* Add a pass at end of runCleanupPasses to annotate `aten::warn` so that each has its unique id
* Enhanced interpreter so that it tracks which `aten::warn` has been executed before and skip them
* Improved insertInstruction so that it correctly checks for overflow

Fixes https://github.com/pytorch/pytorch/issues/45108
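
A minimal sketch of the behavior this enables (assuming `warnings.warn` inside scripted code lowers to `aten::warn`, as described above):

```python
import warnings
import torch

@torch.jit.script
def f(x: torch.Tensor) -> torch.Tensor:
    for i in range(3):
        warnings.warn("emitted once per warn site, not once per iteration")
        x = x + 1
    return x

f(torch.zeros(1))  # the warning above prints a single time
```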

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45382

Reviewed By: mrshenli

Differential Revision: D24060677

Pulled By: gmagogsfm

fbshipit-source-id: 9221bc55b9ce36b374bdf614da3fe47496b481c1
2020-10-02 14:55:10 -07:00
73e9daa35f [caffe2] Optimize Dedup version of RowWiseSparseAdagrad fused op by WarpReduce (#45649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45649

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44275

* This diff applies the WarpReduce optimization to the dedup version of the RowWiseSparseAdagrad fused op, yielding a ~1.33x performance improvement.

* Port the approach from D23948802 to find num_dup
* Fix a likely fp16 bug in the dedup kernel

Reviewed By: jianyuh

Differential Revision: D23561994

fbshipit-source-id: 1a633fcdc924593063a67f9ce0d36eadb19a7efb
2020-10-02 14:28:24 -07:00
c31066ac9d Torch Integration Test Formatting Changes (#45740)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45740

Reviewed By: esqu1

Differential Revision: D23869021

fbshipit-source-id: 5910d44f9475bd7a53dc0478b69b39572dc8666f
2020-10-02 14:02:31 -07:00
7d809f5d8e Update backward definition for more operators and reenable tests in test_ops.py (#44444)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44444

This PR:
1. Fixes https://github.com/pytorch/pytorch/issues/41510. Updates backward formula for the following functions: `asin`, `acos`, `asinh`, `acosh`, `atan`, `atanh`, `div`, `log`, `log10`, `log2`, `log1p`, `pow`, `reciprocal`, `angle`.
2. Re-enables the tests in `test_ops.py`.
3. Adds dispatch for complex dtypes for `tanh_backward`.
4. Re-enables commented tests in `common_methods_invocation.py`.

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D24027761

Pulled By: anjali411

fbshipit-source-id: c1f707c2a039149a6e04bbde53ee120d9119d99a
2020-10-02 13:37:10 -07:00
e3d2defdc8 [te] Get llvm codegen to compile with llvm9 and llvm-fb (#45726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45726

FB has an old internal platform that uses some random llvm version
that looks sort of like llvm 7.  I've guarded that with the appropriate
LLVM_VERSION_PATCH.

I've also swapped out some of our uses of ThreadSafeModule/ThreadSafeContext
for the variants without ThreadSafe in the name.  As far as I can tell we
weren't using the bundled locks anyways, but I'm like 85% sure this is OK since
we compile under the Torch JIT lock anyways.

Test Plan: unit tests

Reviewed By: ZolotukhinM, asuhan

Differential Revision: D24072697

fbshipit-source-id: 7f56b9f3cbe5e6d54416acdf73876338df69ddb2
2020-10-02 13:33:13 -07:00
5a47a2126d Revert D24018160: [pytorch][PR] Test torch.svd using complex float and double numbers
Test Plan: revert-hammer

Differential Revision:
D24018160 (888f3c12e7)

Original commit changeset: 1b6103f5af94

fbshipit-source-id: 3040250db25995fc0d41fd0f497550dded43cad9
2020-10-02 13:33:11 -07:00
f8c1ca5dd8 Enable NamedTuple data type to work with DDP (#44220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44220

Closes https://github.com/pytorch/pytorch/issues/44009
Currently if a dataloader returns objects created with a
collections.namedtuple, this will incorrectly be cast to a tuple. As a result, if we have data of these types, there can be runtime errors during the forward pass if the module is expecting a named tuple.

Fix this in
`scatter_gather.py` to resolve the issue reported in
https://github.com/pytorch/pytorch/issues/44009
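
The gist of the fix, as a standalone sketch (hypothetical `map_structure` helper, not the actual `scatter_gather.py` code): rebuild namedtuples with their own type instead of letting them decay to plain tuples.

```python
from collections import namedtuple

def is_namedtuple(obj):
    return isinstance(obj, tuple) and hasattr(obj, "_fields")

def map_structure(fn, obj):
    if is_namedtuple(obj):
        return type(obj)(*(fn(x) for x in obj))  # keep the namedtuple subclass
    if isinstance(obj, (list, tuple)):
        return type(obj)(fn(x) for x in obj)     # plain containers stay plain
    return fn(obj)

Batch = namedtuple("Batch", ["data", "label"])
out = map_structure(lambda t: t, Batch(1, 2))
assert isinstance(out, Batch) and out._fields == ("data", "label")
```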
ghstack-source-id: 113423287

Test Plan: CI

Reviewed By: colesbury

Differential Revision: D23536752

fbshipit-source-id: 3838e60162f29ebe424e83e474c4350ae838180b
2020-10-02 13:33:08 -07:00
8619de84f2 Fix cuDNN error message when it's Conv2d (#45729)
Summary:
Originally introduced in https://github.com/pytorch/pytorch/issues/45023. When I was testing the original PR, the case was a Conv3d, so this problem was not discovered.

Arrays in `ConvolutionParams` have a fixed length of 3 or 5. This is because `max_dim` is set as a constexpr of 3, regardless of Conv2d or Conv3d. The current code makes some error messages read oddly; see the comments below.

9201c37d02/aten/src/ATen/native/cudnn/Conv.cpp (L212-L226)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45729

Reviewed By: mruberry

Differential Revision: D24081542

Pulled By: ngimel

fbshipit-source-id: 141f8946f4d0db63a723131775731272abeaa6ab
2020-10-02 13:33:06 -07:00
322855e380 type check for torch.quantization.observer (#45630)
Summary:
add type checker for observer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45630

Reviewed By: malfet

Differential Revision: D24058304

Pulled By: walterddr

fbshipit-source-id: ac1c0f5ff0d34b0445bd1364653fc5c9d7571b05
2020-10-02 13:25:41 -07:00
db8b076272 Change signature for torch.poisson (#45656)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45656

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24078609

Pulled By: ansleyadelaide

fbshipit-source-id: 97a95b08334ed0d710e032a267b940c2fc9f7f40
2020-10-02 13:14:12 -07:00
7726754e70 Add function signature for pixel_shuffle (#45661)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45661

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24078627

Pulled By: ansleyadelaide

fbshipit-source-id: 44917ff5932e4d0adcc18ce24ecfc0b5686818e3
2020-10-02 11:46:35 -07:00
6acd7b686c adding sharding option to run_test.py (#45583)
Summary:
Added a sharding option to run_test.py to enable users to run a subset of the many tests. The new `--shard` argument takes two integer values, `x` and `y`, where the larger value denotes the number of shards and the smaller denotes which shard to run.
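
A hedged sketch of invoking the new option (argument order as described above; the path assumes the repository root):

```python
# Run the first of two shards of the Python test suite.
import subprocess

subprocess.run(["python", "test/run_test.py", "--shard", "1", "2"], check=True)
```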

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45583

Reviewed By: malfet

Differential Revision: D24083469

Pulled By: janeyx99

fbshipit-source-id: 1777bd7822c95b3bf37079deff9381c6f8eaf4cc
2020-10-02 11:21:51 -07:00
3799ba83e5 [Docs] Adding Store API Docs (#45543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45543

This PR adds documentation for the c10d Store to the public docs. Previously these docs were missing although we exposed a lightly-used (but potentially useful) Python API for our distributed key-value store.
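
A brief sketch of the key-value Store API the new docs cover (a single-process demo with `world_size=1`; the exact constructor arguments shown are assumptions about the Python binding):

```python
import torch.distributed as dist
from datetime import timedelta

# host, port, world_size, is_master, timeout
store = dist.TCPStore("127.0.0.1", 29500, 1, True, timedelta(seconds=30))
store.set("first_key", "first_value")
print(store.get("first_key"))  # b'first_value'
```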
ghstack-source-id: 113409195

Test Plan: Will verify screenshots by building the docs.

Reviewed By: pritamdamania87

Differential Revision: D24005598

fbshipit-source-id: 45c3600e7c3f220710e99a0483a9ce921d75d044
2020-10-02 11:16:56 -07:00
a052597e6c Bump nightlies to 1.8.0 (#45696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45696

Similar to https://github.com/pytorch/pytorch/pull/40519

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D24064381

Pulled By: seemethere

fbshipit-source-id: 1484b9c4fc5fa8cfa7be591a0a5d4b6e05968589
2020-10-02 11:10:34 -07:00
6e43f0db8b Use correct signatures for METH_NOARGS. (#45528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45528

As described in https://github.com/pytorch/pytorch/issues/45419,
resolving a bunch of cpython signature issues.

#Closes: https://github.com/pytorch/pytorch/issues/45419
ghstack-source-id: 113385726

Test Plan: sentinel

Reviewed By: albanD

Differential Revision: D24000626

fbshipit-source-id: d334596f1f0256063691aa044c8fb2face260817
2020-10-02 10:43:58 -07:00
cdf93b03de Add string versions of argument funcs in jit Node (#45464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45464

Usage of Symbols to find arguments requires one to generate a nonsense symbol for inputs which don't already have one. The intent of Symbols appears to be something like an interned string, but the namespace component doesn't apply to an argument. In order to access the arguments by name without adding new symbols, versions of those functions taking std::string input were added. These can be proven valid based on the existing codepath. Additionally, a hasNamedInput convenience function was added to remove the need for a try/catch block in user code.

The primary motivation is to be able to easily handle the variable number of arguments in glow, so that the arange op may be implemented.

Reviewed By: eellison

Differential Revision: D23972315

fbshipit-source-id: 3e0b41910cf07e916186f1506281fb221725a91b
2020-10-02 10:26:29 -07:00
b234acd414 Exposes SparseToDenseMask Caffe2 Operator (#45670)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45670

Reviewed By: esqu1

Differential Revision: D23868280

fbshipit-source-id: d6afa129c073fe611cb43a170025bc3c880a4bec
2020-10-02 10:05:13 -07:00
ad31068fe9 Add a distributed package reviewer (#45744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45744

Tag me as reviewer

Test Plan: na

Reviewed By: jiayisuse

Differential Revision: D23881569

fbshipit-source-id: 8452fa60fe3d017ae1f0da26c0ce476f2b9c170c
2020-10-02 09:56:28 -07:00
24187a0b42 Enable type check for torch.quantization.fake_quantize (#45701)
Summary:
Addresses part of https://github.com/pytorch/pytorch/issues/42969.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45701

Reviewed By: walterddr

Differential Revision: D24066672

Pulled By: samestep

fbshipit-source-id: 53bb5e7b4703738d3de86fa89fb0980f1d6251f3
2020-10-02 09:27:34 -07:00
888f3c12e7 Test torch.svd using complex float and double numbers (#45572)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45572

Reviewed By: anjali411

Differential Revision: D24018160

Pulled By: malfet

fbshipit-source-id: 1b6103f5af94e9f74b73ed23aa02c0236b199b34
2020-10-02 08:29:14 -07:00
4d08930ccb remove beta defaulting in smooth_l1_loss_backward. added to the bc whitelist (#45588)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45588

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D24024312

Pulled By: bdhirsh

fbshipit-source-id: 7246e5da741fbc5641deecaf057ae9a6e44e8c34
2020-10-02 07:53:04 -07:00
869b2ca048 some documentation and style fixes to smooth_l1_loss (#45587)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45587

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D24024313

Pulled By: bdhirsh

fbshipit-source-id: c50efb2934d7b9d3b090e92678319cde42c0df45
2020-10-02 07:47:31 -07:00
c703602e17 make broadcasting explanation clearer in matmul doc: #22763 (#45699)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45699

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24065584

Pulled By: bdhirsh

fbshipit-source-id: 5e2cdd00ed18ad47d24d11751cfa5bee63853cc9
2020-10-02 06:51:42 -07:00
82cc86b64c VariableKernel calls into scattered C++ api (#44158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44158

Previously, the C++ API only supported calling ops with a gathered TensorOptions object. So even if the VariableKernel took scattered arguments,
it had to re-gather them to call into the C++ API. But a diff stacked below this one introduced a scattered API for the C++ frontend.

This reaps the benefits and makes sure that if the Variable kernel gets scattered arguments (i.e. it's a c10-full op), then it passes those on without regathering.
ghstack-source-id: 113355690

Test Plan:
vs master: https://www.internalfb.com/intern/fblearner/details/216342597/

vs prev diff: https://www.internalfb.com/intern/fblearner/details/216342688/

Reviewed By: ezyang

Differential Revision: D23512538

fbshipit-source-id: 8ee6c1cc99443a2141db85072fd6dbc52b4d77fd
2020-10-02 04:13:39 -07:00
6e2eee2b9d Add faithful C++ API (#44087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44087

Each op taking a TensorOptions argument now has an additional overload in the C++ frontend where it takes scattered ScalarType, Layout, Device, bool instead of one TensorOptions argument.

If it is a c10-full op, then the scattered version calls into the dispatcher and the gathered version is a proxy calling into the scattered version.
If it is a non-c10-full op, then the gathered version calls into the dispatcher and the scattered version is a proxy calling into the gathered version.

This should minimize the amount of gathering and scattering needed.

This PR is also a prerequisite to remove the re-gathering of arguments that is currently happening in VariableKernel. Currently, VariableKernels gather arguments into a TensorOptions object
to call into the C++ API. In a PR stacked on top of this, VariableKernel will just directly call into the scattered C++ API introduced here and avoid the gathering step.
ghstack-source-id: 113355689

Test Plan:
waitforsandcastle

vs master: https://www.internalfb.com/intern/fblearner/details/216169815/

vs previous diff: https://www.internalfb.com/intern/fblearner/details/216169957/

Reviewed By: ezyang

Differential Revision: D23492188

fbshipit-source-id: 3e84c467545ad9371e98e09075a311bd18411c5a
2020-10-02 04:08:53 -07:00
9201c37d02 Use addmm directly for 1x1 convolution (#45557)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45274
Based on https://github.com/pytorch/pytorch/issues/44041, sets intermediate for backward computation (otherwise, backward tests are failing).
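
The equivalence this exploits, as a small sketch (an illustration only, not the kernel code): a 1x1 convolution is a matrix multiply over channels.

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 5, 5)   # N, C_in, H, W
w = torch.randn(4, 3, 1, 1)   # C_out, C_in, 1, 1
ref = F.conv2d(x, w)
# Same computation viewed as a matmul over the channel dimension:
mm = (w.view(4, 3) @ x.flatten(2)).view(2, 4, 5, 5)
print(torch.allclose(ref, mm, atol=1e-5))  # True
```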

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45557

Reviewed By: izdeby

Differential Revision: D24030655

Pulled By: ngimel

fbshipit-source-id: 368fe9440668dffc004879f8b1d2dd3787d915c9
2020-10-02 00:26:53 -07:00
1a2d3b6a75 [quant] PerChannelFloatQParams support for quint4x2 dtype (#45594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45594

Adds support for Per-channel quantization using float qparams for 4-bit dtype
We use the new dispatch mechanism and use existing quantize/dequantize kernels to pack the
4-bit data depending on the bit_width.
Size of 4-bit quantized tensor is half that of 8-bit quantized tensor.

Test Plan:
python test/test_quantization.py TestQuantizedTensor.test_quantize_per_channel_sub_byte

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D24025595

fbshipit-source-id: dd9d0557de585dd4aaf5f138959c3523a29fb759
2020-10-01 23:59:53 -07:00
04526a49d3 [quant] creating quint4x2 dtype for quantized tensors (#44678)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44678

This is a prototype PR that introduces 4 bit qtensors. The new dtype added for this is c10::quint4x2
The underlying storage for this is still uint8_t, so we pack 2 4-bit values in a byte while quantizing it.

This change uses most of the existing scaffolding for qtensor storage. We allocate storage
based on the dtype before creating a new qtensor.

It also adds a dispatch mechanism for this dtype so we can use this to get the bitwidth, qmin and qmax info
while quantizing and packing the qtensor (when we add 2-bit qtensor)

Kernels that use this dtype should be aware of the packing format.

Test Plan:
Locally tested
```
x = torch.ones((100, 100), dtype=torch.float)
qx_8bit = torch.quantize_per_tensor(x, scale=1.0, zero_point=2, dtype=torch.quint8)
qx = torch.quantize_per_tensor(x, scale=1.0, zero_point=2, dtype=torch.quint4x2)

torch.save(x, "temp.p")
print('Size float (B):', os.path.getsize("temp.p"))
os.remove('temp.p')

torch.save(qx_8bit, "temp.p")
print('Size quantized 8bit(B):', os.path.getsize("temp.p"))
os.remove('temp.p')

torch.save(qx, "temp.p")
print('Size quantized 4bit(B):', os.path.getsize("temp.p"))
os.remove('temp.p')
```

Size float (B): 40760
Size quantized 8bit(B): 10808
Size quantized 4bit(B): 5816

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23993134

fbshipit-source-id: 073bf262f9680416150ba78ed2d932032275946d
2020-10-01 23:53:34 -07:00
a0d08b2199 Set the default bailout depth to 20 (#45710)
Summary:
This modifies the default bailout depth to 20, which gives us reasonable performance in the benchmarks we considered (fastrnns, maskrcnn, hub/benchmark, etc.).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45710

Reviewed By: robieta

Differential Revision: D24071861

Pulled By: Krovatkin

fbshipit-source-id: 472aacc136f37297b21f577750c1d60683a6c81e
2020-10-01 23:37:41 -07:00
402caaeba5 [docs] Update docs for NegativeBinomial (#45693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45693

**Summary**
This commit updates the docstring for
`torch.distributions.NegativeBinomial` to better match actual behaviour.
In particular, the parameter currently documented as probability of
success is actually probability of failure.

**Test Plan**
1) Ran the code from the issue to make sure this is still an issue (it
is)
2) `make html` and viewed the docs in a browser.

*Before*
<img width="879" alt="Captura de Pantalla 2020-10-01 a la(s) 1 35 28 p  m" src="https://user-images.githubusercontent.com/4392003/94864456-db3a5680-03f0-11eb-977e-3bab0fb9c206.png">

*After*
<img width="877" alt="Captura de Pantalla 2020-10-01 a la(s) 2 12 24 p  m" src="https://user-images.githubusercontent.com/4392003/94864478-e42b2800-03f0-11eb-965a-51493ca27c80.png">

**Fixes**
This commit closes #42449.

Test Plan: Imported from OSS

Reviewed By: robieta

Differential Revision: D24071048

Pulled By: SplitInfinity

fbshipit-source-id: d345b4de721475dbe26233e368af62eb57a47970
2020-10-01 23:20:34 -07:00
36de05dbf6 passing all arguments to sccache wrapper script should be quoted as "$@" (#45582)
Summary:
This fixes MIOpen runtime compilation since it passes quoted arguments to the clang compiler.  This change also makes the sccache wrapper scripts consistent with the nvcc wrapper.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45582

Reviewed By: seemethere, izdeby

Differential Revision: D24034477

Pulled By: malfet

fbshipit-source-id: 1964bac1e693b238e8efe9c046a39be64571e9df
2020-10-01 23:11:59 -07:00
f6dc256bc6 example of splitting up an FX graph into smaller subgraphs with own submodules (#45404)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45404

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23956147

Pulled By: Lilyjjo

fbshipit-source-id: a35e33a0b9f1ed5f3fb6e5cd146f66c29bf3d518
2020-10-01 20:40:27 -07:00
1552a926a3 migrate cuda implementation of take() from TH to ATen (#45430)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45430

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D24037297

Pulled By: bdhirsh

fbshipit-source-id: 7c5f2c08e895fb0c25eec1d68c7455e4f2b1c64e
2020-10-01 20:03:01 -07:00
a015ba8dd5 migrating the take() fn from TH to ATen (#45283)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45283

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D24037298

Pulled By: bdhirsh

fbshipit-source-id: 088ce39e55ee8b5a79fa501395fa9eec08d1d396
2020-10-01 19:58:09 -07:00
fc4209bd4f Fix the bucketization wrong doc for right argument (#45684)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45684

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D24057996

Pulled By: glaringlee

fbshipit-source-id: 3db1c24f3cae9747effa4b1f3c5c3baf6888c9a1
2020-10-01 18:16:49 -07:00
4c1e50eb5c remove skip annotations since we already disabled the tests wholesale (#45698)
Summary:
Remove skip annotations since we already disabled the tests wholesale

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45698

Reviewed By: mrshenli

Differential Revision: D24064547

Pulled By: Krovatkin

fbshipit-source-id: 0d154135de0c0550d6874bea3c2d42d5f4d71cb4
2020-10-01 17:47:48 -07:00
cbdba7cc1e win job for the legacy executor (#45612)
Summary:
Adds a CUDA job on Windows for the jit legacy executor

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45612

Reviewed By: mrshenli

Differential Revision: D24042196

Pulled By: Krovatkin

fbshipit-source-id: 35c79c53ed569d221e79376c108bc864900ef49e
2020-10-01 17:23:55 -07:00
0393a1e8b9 add an indexer to SymbolicShape (#45450)
Summary:
A convenience indexer into `SymbolicShape`s

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45450

Reviewed By: ZolotukhinM

Differential Revision: D23971758

Pulled By: Krovatkin

fbshipit-source-id: 1f18c5f89f579072f6bf467809ea9471bf42bc2d
2020-10-01 16:57:07 -07:00
0de5824f36 [iOS][CI] Upgrade xcode version to 12.0 (#45677)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45677

Test Plan: Imported from OSS

Reviewed By: husthyc

Differential Revision: D24065647

Pulled By: xta0

fbshipit-source-id: f2535b1d93e58cf79e7075bf56b0613a3ded16eb
2020-10-01 16:53:18 -07:00
e8e0fca99e [iOS][CI] Update the dev cert (#45651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45651

### Summary

1. Update the iOS developer certificates. The new expiration date is 10/01/2021.
2. Restore the iOS arm64 jobs and the nightly.

### Test Plan

The following CI jobs succeed

- ci/circleci: pytorch_ios_11_2_1_arm64_build
- ci/circleci: pytorch_ios_11_2_1_arm64_custom_build
- ci/circleci: pytorch_ios_11_2_1_x86_64_build

Test Plan: Imported from OSS

Reviewed By: husthyc

Differential Revision: D24065648

Pulled By: xta0

fbshipit-source-id: 758f41de8296fdfbd3cfad87e9445c2acafd5f94
2020-10-01 16:48:30 -07:00
de3a48013a Use CAFFE2_USE_MSVC_STATIC_RUNTIME to determine when to avoid waiting for global destructors on Windows (#43532)
Summary:
We are trying to build libtorch statically (BUILD_SHARED_LIBS=OFF) then link it into a DLL. Our setup hits the infinite loop mentioned [here](54c05fa34e/torch/csrc/autograd/engine.cpp (L228)) because we build with `BUILD_SHARED_LIBS=OFF` but still link it all into a DLL at the end of the day.

This PR fixes the issue by changing the condition to guard on which windows runtime the build links against using the `CAFFE2_USE_MSVC_STATIC_RUNTIME` flag. `CAFFE2_USE_MSVC_STATIC_RUNTIME` defaults to ON when `BUILD_SHARED_LIBS=OFF`, so backwards compatibility is maintained.

I'm not entirely confident I understand the subtleties of the windows runtime versus linking setup, but this setup works for us and should not affect the existing builds.

Fixes https://github.com/pytorch/pytorch/issues/44470

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43532

Reviewed By: mrshenli

Differential Revision: D24053767

Pulled By: albanD

fbshipit-source-id: 1127fefe5104d302a4fc083106d4e9f48e50add8
2020-10-01 16:41:14 -07:00
4f685ecc25 [reland][quant][graphmode][fx] Merge all quantization mode (#45292) (#45672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45672

This PR merges all quantization mode and will only expose the following top level functions:
```
prepare_fx
prepare_qat_fx
convert_fx
```
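
A hedged usage sketch of the merged entry points (the module path and qconfig-dict format are assumptions based on the graph-mode FX API of this era):

```python
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

model = torch.nn.Sequential(torch.nn.Linear(4, 4)).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}  # assumed dict format

prepared = prepare_fx(model, qconfig_dict)  # insert observers
prepared(torch.randn(2, 4))                 # calibration pass
quantized = convert_fx(prepared)            # produce the quantized module
```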

Test Plan:
Imported from OSS

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D24053439

fbshipit-source-id: 03d545e26a36bc22a73349061b751eeb35171e64
2020-10-01 15:47:11 -07:00
18253f4a48 Fix BUILD_CAFFE2 if FBGEMM and NNPACK are not built (#45610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45610

Also add to the usual documentation places that this option exists.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24058199

Pulled By: suo

fbshipit-source-id: 81574fbd042f47587e2c7820c726fac0f68af2a7
2020-10-01 14:58:55 -07:00
5959de3aeb setup: Only include dataclasses for py < 3.8 (#45611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45611

dataclasses was made a standard library item in 3.8

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D24031740

Pulled By: seemethere

fbshipit-source-id: 15bdf1fe0d8de9b8ba7912e4a651f06b18d516ee
2020-10-01 14:52:28 -07:00
93be03cec0 [quant] torch.mean add path for unsupported QNNPACK modes (#45533)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45533

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D24030446

Pulled By: z-a-f

fbshipit-source-id: a392402ef701c5e45e244ac440bc151ef942cccd
2020-10-01 14:44:26 -07:00
4564444c91 [RFC][caffe2] TaskGroup.__repr__ shouldn't have side effects
Summary: `__repr__` calling self.tasks() ends up marking the instance as "used", which doesn't seem appropriate. I was debugging a value being passed around and then ran into `Cannot add Task to an already used TaskGroup.` because the value had been logged once.

Test Plan:
Added a unit test -- didn't see a clean public method to test it, but I'm happy to add one if that makes sense.

Will wait for sandcastle to trigger everything else; I'm not at all familiar with this code so any other recommendations would be great!

Reviewed By: cryptopic

Differential Revision: D23541198

fbshipit-source-id: 5d1ec674a1ddaedf113140133b90e0da6afa7270
2020-10-01 14:21:03 -07:00
03e4e94d24 Find single partition (#45429)
Summary:
WIP: This PR is work in progress for partitioning the FX graph module. The class _Partitioner_ generates partitions for the graph module; the class _Partition_ is a partition node among the partitions.
_Partitioner()_: create a partitioner
_partition_graph(self, fx_module: GraphModule, devices: List[str]) -> None_:
take an FX graph module and a list of devices as input and create partition ids for each node inside the graph module

_dump_partition_DAG(self) -> None_:
print out the information about each partition, including its id, its backend type (what type of device this partition uses), all the nodes included in this partition, its parent partitions, children partitions, input nodes, and output nodes.

So far, only a single partition is considered, which means there is only one device with unlimited memory.
A unit test called _test_find_single_partition()_ is added to check that all nodes in the graph are assigned to the only partition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45429

Reviewed By: izdeby

Differential Revision: D24026268

Pulled By: scottxu0730

fbshipit-source-id: 119d506f33049a59b54ad993670f4ba5d8e15b0b
2020-10-01 13:07:34 -07:00
dcda11c4d3 Disable tcuda_fuser tests in Profiling Mode (#45638)
Summary:
Disable tcuda_fuser tests in Profiling Mode to address flaky tests until the fuser switches to the new approach.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45638

Reviewed By: mrshenli

Differential Revision: D24057230

Pulled By: Krovatkin

fbshipit-source-id: 8f7a47610d9b7da6ad3057208057a5a596e1bffa
2020-10-01 12:41:57 -07:00
381f6d32a7 [docs] Fix hyperlinks for nn.CrossEntropyLoss (#45660)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45460. This PR makes it so that LogSoftmax and NLLLoss are correctly linked from the nn.CrossEntropyLoss documentation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45660

Test Plan:
- built and viewed docs locally

![image](https://user-images.githubusercontent.com/5652049/94816513-ee85fb80-03c9-11eb-8289-56642c133e11.png)

Reviewed By: glaringlee

Differential Revision: D24049009

Pulled By: zou3519

fbshipit-source-id: 3bd0660acb8575d753cefd2d0f1e523ca58a25b6
2020-10-01 12:18:43 -07:00
1efdbfabcc [docs] Fix back quote rendering in loss modules docs (#45662)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42855. Previously, back quotes weren't rendering correctly in
equations. This is because we were quoting things like `'mean'`. In
order to backquote properly in latex in text-mode, the back-quote needs
to be written as a back-tick.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45662

Test Plan:
- built docs locally and viewed the changes.

For NLLLoss (which is not the original module mentioned in the issue, but it has the same problem), we can see how the back quotes now render properly:

![image](https://user-images.githubusercontent.com/5652049/94819862-c5676a00-03cd-11eb-9e92-01380ee52bd6.png)

Reviewed By: glaringlee

Differential Revision: D24049880

Pulled By: zou3519

fbshipit-source-id: 61a1257994144549eb8f29f19d639aea962dfec0
2020-10-01 11:52:27 -07:00
77cd8e006b Added support for complex torch.symeig (#45121)
Summary:
This PR adds support for complex-valued input for `torch.symeig`.

TODO:
- [ ] complex cuda tests raise `RuntimeError: _th_bmm_out not supported on CUDAType for ComplexFloat`
Update: Added xfailing tests for complex dtypes on CUDA. Once support for complex `bmm` is added these tests will work.

Fixes https://github.com/pytorch/pytorch/issues/45061.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45121

Reviewed By: mrshenli

Differential Revision: D24049649

Pulled By: anjali411

fbshipit-source-id: 2cd11f0e47d37c6ad96ec786762f2da57f25dac5
2020-10-01 08:57:13 -07:00
4583edb5d6 Add NativeFunction.signature and kind. (#45131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45131

These make it easier to group native functions together and determine
what kind of native function it is (inplace/out/functional).  Currently
they are not used but they may be useful for tools.autograd porters.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zhangguanheng66

Differential Revision: D23872526

Pulled By: ezyang

fbshipit-source-id: 1d6e429ab9a1f0fdb764be4228c5bca4dce8f24e
2020-10-01 08:46:40 -07:00
41bd5a5ee0 Switch all Sequences in tools.codegen.model to Tuple (#45127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45127

I thought I was being clever by using Sequence, which doesn't commit to
List or Tuple, but forces read-onlyness in the type system.  However,
there is a runtime implication to using List or Tuple: Lists can't be
hashed, but Tuples can be!  This is important because I shortly want
to group by FunctionSchema, and to do this I need FunctionSchema to
be hashable.  Switch everything to Tuple for true immutability.
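
A minimal illustration of the hashability difference (plain Python, not from the PR):

```
# Tuples of hashable elements are hashable, so they can key a dict --
# which is what grouping by FunctionSchema requires. Lists are not.
print(hash((1, 2, 3)))    # works
try:
    hash([1, 2, 3])
except TypeError as e:
    print(e)              # unhashable type: 'list'
```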

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D23872527

Pulled By: ezyang

fbshipit-source-id: 5c8fae1c50a5ae47b4167543646d94ddcafff8c3
2020-10-01 08:41:53 -07:00
a242ac8c27 Update torchvision version to current latest master (#45342)
Summary:
Updating torchvision version to the current latest master.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45342

Reviewed By: seemethere

Differential Revision: D23933572

Pulled By: izdeby

fbshipit-source-id: c374156eb608e882a1e2107143e39f03b7399081
2020-10-01 08:31:38 -07:00
72bc3d9de4 Use MTA for amp grad unscaling, enforce op math type in MTA functors, and allow op lambdas (#44778)
Summary:
Amp gradient unscaling is a great use case for multi tensor apply (in fact it's the first case I wrote it for).  This PR adds an MTA unscale+infcheck functor.  Really excited to have it for `torch.cuda.amp`. izdeby your interface was clean and straightforward to use, great work!

Labeled as bc-breaking because the native_functions.yaml exposure of unscale+infcheck changes from [`_amp_non_finite_check_and_unscale_` to `_amp_foreach_non_finite_check_and_unscale_`]( https://github.com/pytorch/pytorch/pull/44778/files#diff-f1e4b2c15de770d978d0eb77b53a4077L6289-L6293).

The PR also modifies Unary/Binary/Pointwise Functors to
- do ops' internal math in FP32 for FP16 or bfloat16 inputs, which improves precision ([and throughput, on some architectures!](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#arithmetic-instructions)) and has no downside for the ops we care about.
- accept an instantiated op functor rather than an op functor template (`template<class> class Op`).  This allows calling code to pass lambdas.

Open question:  As written now, the PR has MTA Functors take care of pre- and post-casting FP16/bfloat16 inputs to FP32 before running the ops.  However, alternatively, the pre- and post-math casting could be deferred/written into the ops themselves, which gives them a bit more control.  I can easily rewrite it that way if you prefer.
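
For context, a minimal sketch of the `torch.cuda.amp` loop that exercises the unscale+infcheck path (the kernel itself is invoked internally by `GradScaler`; nothing here is new API):

```
import torch

model = torch.nn.Linear(10, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(torch.randn(8, 10, device="cuda")).sum()
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # unscales grads and skips the step if infs/nans are found
    scaler.update()
```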

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44778

Reviewed By: gchanan

Differential Revision: D23944102

Pulled By: izdeby

fbshipit-source-id: 22b25ccad5f69b413c77afe8733fa9cacc8e766d
2020-10-01 07:51:16 -07:00
84cf3372d1 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D24044108

fbshipit-source-id: 6dfe2f1201304fa58e42472e3f53c72cbb63d7d2
2020-10-01 05:29:03 -07:00
592b398e82 [AutoAccept][Codemod][FBSourceGoogleJavaFormatLinter] Daily arc lint --take GOOGLEJAVAFORMAT
Reviewed By: zertosh

Differential Revision: D24044052

fbshipit-source-id: 50ac5b7480ed65af94617bf8b014252ea7b27c4f
2020-10-01 05:19:37 -07:00
c36b354072 Revert D23913105: [quant][graphmode][fx] Merge all quantization mode
Test Plan: revert-hammer

Differential Revision:
D23913105 (ffcb0989e7)

Original commit changeset: 4e335286d6de

fbshipit-source-id: 5765b4e8ec917423f1745f73a9f3f235fc53423d
2020-10-01 03:12:42 -07:00
78b95b6204 Revert "Revert D24024606: [FX] Shape propagation example" (#45637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45637

This reverts commit 869b05648def7a3b01685da94d4ee36f671d5dd6.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24037870

Pulled By: jamesr66a

fbshipit-source-id: 851beb42fe72383108ceeff1fe97f388d9ad059e
2020-10-01 01:07:56 -07:00
4339f5c076 [PyTorch][QPL] Add instance_key into MOBILE_MODULE_LOAD_STATS logging. (#45518)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45518

Similar to the previous diff, add instance_key into MOBILE_MODULE_LOAD_STATS logging.
ghstack-source-id: 113149713

Test Plan:
```
09-29 11:50:23.345  6477  9351 W MobileModuleQPLObserver.cpp: TESTINGTESTING onEnterLoadModel instance_key = 2015064908
09-29 11:50:23.409  6477  9351 W MobileModuleQPLObserver.cpp: TESTINGTESTING markerAnnotate instance_key = 2015064908, model_name = bi_pytext_v10
09-29 11:50:23.410  6477  9351 W MobileModuleQPLObserver.cpp: TESTINGTESTING markerAnnotate instance_key = 2015064908, model_type = FBNet
09-29 11:50:23.410  6477  9351 W MobileModuleQPLObserver.cpp: TESTINGTESTING markerAnnotate instance_key = 2015064908, op_list_string = ["aten::__getitem__.t", "aten::__is__", "aten::__isnot__", "aten::add.Tensor", "aten::append.t", "aten::cat", "aten::contiguous", "aten::conv1d", "aten::dim", "aten::embedding", "aten::eq.int", "aten::format", "aten::len.t", "aten::max.dim", "aten::mul.Tensor", "aten::permute", "aten::relu", "aten::softmax.int", "aten::tanh", "prepacked::linear_clamp_run", "prim::RaiseException", "prim::TupleIndex", "prim::TupleUnpack", "prim::Uninitialized", "prim::unchecked_cast"]
09-29 11:50:23.410  6477  9351 W MobileModuleQPLObserver.cpp: TESTINGTESTING onExitLoadModel instance_key = 2015064908
```

Reviewed By: iseeyuan

Differential Revision: D23996150

fbshipit-source-id: 7bf76af3b7e6b346afd20ab341204743c81cfe83
2020-09-30 23:31:35 -07:00
d306d0c2b1 remove redundant PE(profiling executor) jobs in CI (#45397)
Summary:
This PR removes redundant profiling jobs, since after the switch (https://github.com/pytorch/pytorch/pull/45396) PE will now be running by default.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45397

Reviewed By: zhangguanheng66

Differential Revision: D23966890

Pulled By: Krovatkin

fbshipit-source-id: ef184ca5fcf079580fa139b6653f8d9a6124050e
2020-09-30 22:18:02 -07:00
3da4cea658 [ONNX] Add dim_param support in export with onnx shape inference (#44920)
Summary:
* Support propagating `dim_param` in ONNX by encoding it as `ShapeSymbol` in the `SymbolicShape` of outputs. If export is called with `dynamic_axes` provided, shape inference will start with these axes set as dynamic (see the export sketch after this list).
* Add new test file `test_pytorch_onnx_shape_inference.py`, reusing all test cases from `test_pytorch_onnx_onnxruntime.py`, but focused on validating shapes for all nodes in the graph. Currently this is not enabled in the CI, since there are still quite a few existing issues and corner cases to fix. The test defaults to running only at opset 12.
* Bug fixes, such as div, _len, the peephole.cpp passes for PackPadded, and LogSoftmaxCrossEntropy.
* This PR depends on existing PRs such as #44332.
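
As a point of reference, `dynamic_axes` is passed to the standard `torch.onnx.export` entry point (a minimal sketch; names are illustrative):

```
import torch

model = torch.nn.Linear(4, 2)
dummy = torch.randn(1, 4)
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["output"],
    # axis 0 of input and output is exported as a named dim_param
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
```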

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44920

Reviewed By: eellison

Differential Revision: D23958398

Pulled By: bzinodev

fbshipit-source-id: 00479d9bd19c867d526769a15ba97ec16d56e51d
2020-09-30 21:56:24 -07:00
ffcb0989e7 [quant][graphmode][fx] Merge all quantization mode (#45292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45292

This PR merges all quantization mode and will only expose the following top level functions:
```
prepare_fx
prepare_qat_fx
convert_fx
```
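
A hedged sketch of how these entry points are used (the exact qconfig plumbing may differ from what ships in this PR):

```
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}

prepared = prepare_fx(model, qconfig_dict)  # insert observers
prepared(torch.randn(2, 4))                 # calibrate
quantized = convert_fx(prepared)            # produce the quantized module
```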

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23913105

fbshipit-source-id: 4e335286d6de225839daf51d1df54322d52d68e5
2020-09-30 21:20:34 -07:00
3f440d74fc [PyTorch][QPL] Add instance_key into MOBILE_MODULE_STATS logging. (#45517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45517

Add a unique instance_key instead of the default one into MOBILE_MODULE_STATS logging, to avoid overlaps between multiple events.
ghstack-source-id: 113149453

Test Plan:
Make sure that each event's start, annotate, and end have the same instance_key:
```
09-28 23:46:03.094 19349 21069 W MobileModuleQPLObserver.cpp: TESTINGTESTING onEnterRunMethod instance_key = 1123198800, method_name = forward
09-28 23:46:03.094 19349 21069 W MobileModuleQPLObserver.cpp: TESTINGTESTING onEnterRunMethod instance_key = 1123198800, model_name = bi_pytext_v10
09-28 23:46:03.094 19349 21069 W MobileModuleQPLObserver.cpp: TESTINGTESTING onEnterRunMethod instance_key = 1123198800, model_type = FBNet
09-28 23:46:03.094 19349 21069 W MobileModuleQPLObserver.cpp: TESTINGTESTING onEnterRunMethod instance_key = 1123198800, op_list_string = ["aten::__getitem__.t", "aten::__is__", "aten::__isnot__", "aten::add.Tensor", "aten::append.t", "aten::cat", "aten::contiguous", "aten::conv1d", "aten::dim", "aten::embedding", "aten::eq.int", "aten::format", "aten::len.t", "aten::max.dim", "aten::mul.Tensor", "aten::permute", "aten::relu", "aten::softmax.int", "aten::tanh", "prepacked::linear_clamp_run", "prim::RaiseException", "prim::TupleIndex", "prim::TupleUnpack", "prim::Uninitialized", "prim::unchecked_cast"]
09-28 23:46:03.181 19349 21069 W MobileModuleQPLObserver.cpp: TESTINGTESTING onExitRunMethod instance_key = 1123198800
09-28 23:46:04.183 19349 20896 W MobileModuleQPLObserver.cpp: TESTINGTESTING onEnterRunMethod instance_key = 1521608147, method_name = forward
09-28 23:46:04.184 19349 20896 W MobileModuleQPLObserver.cpp: TESTINGTESTING onEnterRunMethod instance_key = 1521608147, model_name = __torch__.Model
09-28 23:46:04.205 19349 20896 W MobileModuleQPLObserver.cpp: TESTINGTESTING onExitRunMethod instance_key = 1521608147
```

Reviewed By: iseeyuan

Differential Revision: D23985178

fbshipit-source-id: bcd5db8dc680e3cf8d12edf865377e80693cc23b
2020-09-30 20:13:33 -07:00
75fc263579 [TensorExpr] Add a tensor expressions tutorial. (#45527)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45527

Differential Revision: D23998787

Test Plan: Imported from OSS

Reviewed By: eellison

Pulled By: ZolotukhinM

fbshipit-source-id: 1f78ccfe8ef13bf493812cfec7f2fd4853e630ee
2020-09-30 19:35:58 -07:00
9d5607fcd9 [quant] Use PlaceholderObserver as default dynamic quant observer (#45343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45343

The current default dynamic quant observer is not correct: for dynamic quantization we don't
accumulate min/max and we don't need to calculate qparams.

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D23933995

fbshipit-source-id: 3ff497c9f5f74c687e8e343ab9948d05ccbba09b
2020-09-30 19:01:18 -07:00
2b13d9413e Re-land: Add callgrind collection to Timer #44717 (#45586)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45586

Test Plan: The unit test has been softened to be less platform sensitive.

Reviewed By: mruberry

Differential Revision: D24025415

Pulled By: robieta

fbshipit-source-id: ee986933b984e736cf1525e1297de6b21ac1f0cf
2020-09-30 17:43:06 -07:00
3a2d45304d [Experimental][Partial] New implementation for torch.distributed APIs in C++ (#45547)
Summary:
This is an attempt at refactoring the `torch.distributed` implementation. The goal is to push the Python layer's global state (like _default_pg) down to the C++ layer so that `torch.distributed` becomes more TorchScript friendly.

This PR adds the skeleton of C++ implementation, at the moment it is not included in any build (and won't be until method implementations are filled in). If you see any test failures related, feel free to revert.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45547

Reviewed By: izdeby

Differential Revision: D24024213

Pulled By: gmagogsfm

fbshipit-source-id: 2762767f63ebef43bf58e17f9447d53cf119f05f
2020-09-30 17:35:51 -07:00
0b3ad5404a [bot] Add quantization triage bot script (#45622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45622

Copied and modified from https://github.com/pytorch/pytorch/blob/master/.github/workflows/jit_triage.yml

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D24036142

fbshipit-source-id: 41287b6a0390cabe4474c99464d74da2c0934401
2020-09-30 17:19:41 -07:00
869b05648d Revert D24024606: [FX] Shape propagation example
Test Plan: revert-hammer

Differential Revision:
D24024606 (ac9a708ed0)

Original commit changeset: 5340eab20f80

fbshipit-source-id: f465eb5e8e994b3b0bedbc779901f76b9ab16f02
2020-09-30 17:03:14 -07:00
f2c2b75e80 flush the buffer when printing the IR (#45585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45585

I discovered this bug when I was trying to print the graph to a file. Turns out I had to close the file, but flushing should be a good safeguard in case other users forget.

Test Plan:
Tested with and without flushing.
with P144064292
without P144064767

Reviewed By: mortzur

Differential Revision: D24023819

fbshipit-source-id: 39574b3615feb28e5b5939664c04ddfb1257706a
2020-09-30 16:55:27 -07:00
6fde2df1b8 [JIT] Update JIT triage project board workflow (#45613)
Summary:
This commit updates `.github/workflows/jit_triage.yml` to use the new `oncall: jit` tag instead of the old `jit` tag.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45613

Reviewed By: izdeby

Differential Revision: D24032388

Pulled By: SplitInfinity

fbshipit-source-id: 6631a596b2f80bdb322caa74adaf0dc2cb146350
2020-09-30 16:36:23 -07:00
4be42034b6 Clear shape information before finalizing graph-mode quantization (#45282)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45282

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23909601

Pulled By: bzinodev

fbshipit-source-id: 3062cda46b15a79094a360216c35906afab7c723
2020-09-30 16:13:55 -07:00
85a70ce71f Add multiline string dedent support (#45580)
Summary:
Fixes #44842
Summary
========
This PR adds support for multiline string dedents.

Test
=====
pytest -k test_multiline_string_dedents test/test_jit.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45580

Reviewed By: wconstab

Differential Revision: D24025866

Pulled By: nikithamalgifb

fbshipit-source-id: 0f49739fb93f70f73a8f367caca2887f558a3937
2020-09-30 16:08:26 -07:00
56840f0a81 Prevent overflow in bucketize binary search
Summary: The current `median` calculation in the bucketize binary search is done in a way which is well-known to produce overflow issues ([link](https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues)). This diff fixes the calculation so that overflows do not occur.

Test Plan:
Standard commit tests.

Also can test with:
```
#include <cstdint>
#include <iostream>

// Overflow-prone midpoint: (a + b) can exceed INT32_MAX even when
// both a and b are individually in range.
int32_t mp1(int32_t a, int32_t b){
        return (a+b)/2;
}

// Overflow-safe midpoint: (b - a) stays in range for search bounds like these.
int32_t mp2(int32_t a, int32_t b){
        return a+(b-a)/2;
}

int main(){
        // Check that the two formulas agree on inputs that do not overflow;
        // they diverge only once (a + b) wraps around.
        int32_t low=-1;
        for(int32_t high=1;high<10000;high++){
                if(mp1(low,high)!=mp2(low,high)){
                        std::cout<<"Ahhhh!"<<std::endl;
                }
        }
}
```

Reviewed By: drdarshan

Differential Revision: D23993920

fbshipit-source-id: 6b4567515552092de5876de6cab77df27c9cf61d
2020-09-30 15:04:11 -07:00
2596113a79 Add fuse support for batchnorm with affine=False (#45474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45474

When batchnorm's affine is set to False, its weight and bias are set to None, which fusion did not support. Added a fix that treats the weight as 1 and the bias as 0 if they are not set.
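
A hedged sketch of the fusion this enables (module names are illustrative):

```
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 1)
        self.bn = nn.BatchNorm2d(3, affine=False)  # weight/bias are None

    def forward(self, x):
        return self.bn(self.conv(x))

m = M().eval()
# With the fix, the missing weight is treated as 1 and the bias as 0.
fused = torch.quantization.fuse_modules(m, [["conv", "bn"]])
```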

Test Plan: Add unit test for testing fusing conv, batchnorm where batchnorm is in affine=False mode.

Reviewed By: z-a-f

Differential Revision: D23977080

fbshipit-source-id: 2782be626dc67553f3d27d8f8b1ddc7dea022c2a
2020-09-30 14:15:05 -07:00
6b42ca2d69 [ONNX] Update embedding_bag export (#44693)
Summary:
Adds export support for embedding_bag with a dynamic list of offsets.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44693

Reviewed By: malfet

Differential Revision: D23831980

Pulled By: bzinodev

fbshipit-source-id: 3eaff1a0f20d1bcfb8039e518d78c491be381e1a
2020-09-30 13:36:40 -07:00
ac9a708ed0 [FX] Shape propagation example (#45589)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45589

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D24024606

Pulled By: jamesr66a

fbshipit-source-id: 5340eab20f805c232bfeb37e4e2156f39a161c19
2020-09-30 13:18:23 -07:00
ffd50b8220 SET USE_DISTRIBUTED OFF when libuv is not installed (#45554)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45554

Reviewed By: izdeby

Differential Revision: D24016825

Pulled By: mrshenli

fbshipit-source-id: 332d860429626a915c06f98cad31e6db1cbc4eb1
2020-09-30 12:46:36 -07:00
c9bb990707 [c++] Distance-agnostic triplet margin loss (#45377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45377

This PR adds a C++ implementation of the TripletMarginWithDistanceLoss, for which the Python implementation was introduced in PR #43680.  It's based on PR #44072, but I'm resubmitting this to unlink it from Phabricator.

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D24003973

fbshipit-source-id: 2d9ada7260a6f27425ff2fdbbf623dad0fb79405
2020-09-30 12:37:35 -07:00
181afd5220 Add an option to DDP to take a list of parameters to ignore upfront. (#44826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44826

As described in https://github.com/pytorch/pytorch/issues/43690, there
is a need for DDP to be able to ignore certain parameters in the module (not
install allreduce hooks) for certain use cases. `find_unused_parameters` is
sufficient from a correctness perspective, but we can get better performance
with this upfront list if users know which params are unused, since we won't
have to traverse the autograd graph every iteration.

To enable this, we add a field `parameters_to_ignore` to DDP init and don't
pass in that parameter to reducer if that parameter is in the given list.
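
For contrast, the existing correctness-only knob is the `find_unused_parameters` flag (a minimal sketch; the upfront ignore list avoids the per-iteration graph traversal this flag implies):

```
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# assumes torch.distributed.init_process_group(...) has already run
model = torch.nn.Linear(10, 10).cuda()
ddp = DDP(model, device_ids=[0], find_unused_parameters=True)
```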
ghstack-source-id: 113210109

Test Plan: Added unittest

Reviewed By: xw285cornell, mrshenli

Differential Revision: D23740639

fbshipit-source-id: a0411712a8b0b809b9c9e6da04bef2b955ba5314
2020-09-30 11:52:50 -07:00
c112e89cc6 [quant] Make choose_qparams_optimized return Tensors to preserve dtype (#45530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45530

Returning double values requires special handling as a return type for aten functions.
Instead, return tensors, where the type is preserved in the tensor dtype.

Test Plan:
python test/test_quantization.py TestQuantizedTensor.test_choose_qparams_optimized

Imported from OSS

Reviewed By: dskhudia

Differential Revision: D24001134

fbshipit-source-id: bec6b17242f4740ab5674be06e0fc30c35eb0379
2020-09-30 11:35:23 -07:00
ce9df084d5 [pytorch] Replace "blacklist" in test/test_mobile_optimizer.py (#45512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45512

This diff addresses https://github.com/pytorch/pytorch/issues/41443.
It is a clone of D23205313 which could not be imported from GitHub
for strange reasons.

Test Plan: Continuous integration.

Reviewed By: AshkanAliabadi

Differential Revision: D23967322

fbshipit-source-id: 744eb92de7cb5f0bc9540ed6a994f9e6dce8919a
2020-09-30 10:43:59 -07:00
a245dd4317 add dllexport before template specialization functions for windows build (#45477)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41896

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45477

Reviewed By: zhangguanheng66

Differential Revision: D24006579

Pulled By: walterddr

fbshipit-source-id: 01e8808f0fecf9a405174fab5f348c02fb063e37
2020-09-30 10:39:23 -07:00
5539066d12 [quant][graphmode][fx] Support quantization for custom module (#44074)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44074

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23580642

fbshipit-source-id: a80b0b3e5e1f4c4a9647da872239cc0a4d58dd3b
2020-09-30 10:24:54 -07:00
51d0ae9207 Revert D24010742: [pytorch][PR] Add callgrind collection to Timer
Test Plan: revert-hammer

Differential Revision:
D24010742 (9b27e0926b)

Original commit changeset: df6bc765f8ef

fbshipit-source-id: 4c1edd57ea932896f7052716427059c924222501
2020-09-30 10:15:46 -07:00
6c4aa2a79c Revert D24002415: Some fixes to smooth_l1_loss
Test Plan: revert-hammer

Differential Revision:
D24002415 (fdbed7118e)

Original commit changeset: 980c141019ec

fbshipit-source-id: 8981b5f6d982ed66c670122e437540444cb5f39c
2020-09-30 10:00:17 -07:00
4f3920951e type check for torch.quantization.quantize_jit (#45548)
Summary:
added type signal for more jit python functions

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45548

Reviewed By: malfet

Differential Revision: D24010922

Pulled By: walterddr

fbshipit-source-id: 2fdd75482481adf2eddc01b915d7d5720fbb2b82
2020-09-30 09:17:00 -07:00
939e0389de Update test_multi_tensor_optimizers test (#45510)
Summary:
Following up on previous [feedback](https://github.com/pytorch/pytorch/pull/45475/files#r496330797).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45510

Reviewed By: heitorschueroff

Differential Revision: D23992304

Pulled By: izdeby

fbshipit-source-id: 4784ed8d79e09da3aa61880add6443e3a8d322e4
2020-09-30 08:59:18 -07:00
415ed434aa Add whitelist for complex backward (#45461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45461

This PR disables autograd for all C -> C, R -> C functions which are not included in the whitelist `GRADIENT_IMPLEMENTED_FOR_COMPLEX`. In practice, there will be a RuntimeError during forward computation when the outputs are differentiable:
```
>>> x=torch.randn(4, 4, requires_grad=True, dtype=torch.cdouble)
>>> x.pow(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: pow does not support automatic differentiation for outputs with complex dtype.
```

The implicit assumption here is that all the C -> R functions have correct backward definitions. So before merging this PR, the following functions must be tested and verified to have correct backward definitions:
`torch.abs` (updated in #39955 ), `torch.angle`, `torch.norm`, `torch.irfft`, `torch.istft`.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D23998156

Pulled By: anjali411

fbshipit-source-id: 370eb07fe56ac84dd8e2233ef7bf3a3eb8aeb179
2020-09-30 08:45:55 -07:00
7e863475d7 Upgrade ReadMe document to guide user to install libuv(1.39) in conda env on Windows platform (#45553)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45553

Reviewed By: SciPioneer

Differential Revision: D24017246

Pulled By: mrshenli

fbshipit-source-id: ec69f864a7acfbdddd60c3d2b442294ec3e34558
2020-09-30 08:28:47 -07:00
96540e918c Add ShuffleDataset with buffer (#45290)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45290

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D24001084

Pulled By: erjia-guan

fbshipit-source-id: d8a7455cf3f18e1f8c1edc53c42c1a99c8573c51
2020-09-30 07:58:15 -07:00
fdbed7118e Some fixes to smooth_l1_loss (#45532)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45532

- updated documentation
- explicitly not supporting negative values for beta (previously the
result was incorrect)
- Removing default value for beta in the backwards function, since it's
only used internally by autograd (as per convention)

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D24002415

Pulled By: bdhirsh

fbshipit-source-id: 980c141019ec2d437b771ee11fc1cec4b1fcfb48
2020-09-30 07:28:44 -07:00
e02868e12d Unify Transformer coder Constructors (#45515)
Summary:
Fixes [#45502](https://github.com/pytorch/pytorch/issues/45502)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45515

Reviewed By: zhangguanheng66, ZolotukhinM

Differential Revision: D23994644

Pulled By: glaringlee

fbshipit-source-id: b8728e8dfd8857e27246ebb11b17c2d1b48796ca
2020-09-30 07:05:41 -07:00
7566823779 Enable PE + TE (#45546)
Summary:
This PR enables PE + TE for 1.7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45546

Reviewed By: ZolotukhinM

Differential Revision: D24006940

Pulled By: Krovatkin

fbshipit-source-id: a3326077d34a023941acdb06c4907c96e7ba0115
2020-09-30 06:49:59 -07:00
9b27e0926b Add callgrind collection to Timer (#44717)
Summary:
This PR allows Timer to collect deterministic instruction counts for (some) snippets. Because of the intrusive nature of Valgrind (effectively replacing the CPU with an emulated one) we have to perform our measurements in a separate process. This PR writes a `.py` file containing the Timer's `setup` and `stmt`, and executes it within a `valgrind` subprocess along with a plethora of checks and error handling. There is still a bit of jitter around the edges due to the Python glue that I'm using, but the PyTorch signal is quite good and thus this provides a low friction way of getting signal. I considered using JIT as an alternative, but:

A) Python specific overheads (e.g. parsing) are important
B) JIT might do rewrites which would complicate measurement.

Consider the following bit of code, related to https://github.com/pytorch/pytorch/issues/44484:
```
from torch.utils._benchmark import Timer
counts = Timer(
    "x.backward()",
    setup="x = torch.ones((1,)) + torch.ones((1,), requires_grad=True)"
).collect_callgrind()

for c, fn in counts[:20]:
    print(f"{c:>12}  {fn}")
```

```
      812800  ???:_dl_update_slotinfo
      355600  ???:update_get_addr
      308300  work/Python/ceval.c:_PyEval_EvalFrameDefault'2
      304800  ???:__tls_get_addr
      196059  ???:_int_free
      152400  ???:__tls_get_addr_slow
      138400  build/../c10/core/ScalarType.h:c10::typeMetaToScalarType(caffe2::TypeMeta)
      126526  work/Objects/dictobject.c:_PyDict_LoadGlobal
      114268  ???:malloc
      101400  work/Objects/unicodeobject.c:PyUnicode_FromFormatV
       85900  work/Python/ceval.c:_PyEval_EvalFrameDefault
       79946  work/Objects/typeobject.c:_PyType_Lookup
       72000  build/../c10/core/Device.h:c10::Device::validate()
       70000  /usr/include/c++/8/bits/stl_vector.h:std::vector<at::Tensor, std::allocator<at::Tensor> >::~vector()
       66400  work/Objects/object.c:_PyObject_GenericGetAttrWithDict
       63000  ???:pthread_mutex_lock
       61200  work/Objects/dictobject.c:PyDict_GetItem
       59800  ???:free
       58400  work/Objects/tupleobject.c:tupledealloc
       56707  work/Objects/dictobject.c:lookdict_unicode_nodummy
```

Moreover, if we backport this PR to 1.6 (just copy the `_benchmarks` folder) and load those counts as `counts_1_6`, then we can easily diff them:
```
print(f"Head instructions: {sum(c for c, _ in counts)}")
print(f"1.6 instructions:  {sum(c for c, _ in counts_1_6)}")
count_dict = {fn: c for c, fn in counts}
for c, fn in counts_1_6:
    _ = count_dict.setdefault(fn, 0)
    count_dict[fn] -= c
count_diffs = sorted([(c, fn) for fn, c in count_dict.items()], reverse=True)
for c, fn in count_diffs[:15] + [["", "..."]] + count_diffs[-15:]:
    print(f"{c:>8}  {fn}")
```

```
Head instructions: 7609547
1.6 instructions:  6059648
  169600  ???:_dl_update_slotinfo
  101400  work/Objects/unicodeobject.c:PyUnicode_FromFormatV
   74200  ???:update_get_addr
   63600  ???:__tls_get_addr
   46800  work/Python/ceval.c:_PyEval_EvalFrameDefault
   33512  work/Objects/dictobject.c:_PyDict_LoadGlobal
   31800  ???:__tls_get_addr_slow
   31700  build/../aten/src/ATen/record_function.cpp:at::RecordFunction::RecordFunction(at::RecordScope)
   28300  build/../torch/csrc/utils/python_arg_parser.cpp:torch::FunctionSignature::parse(_object*, _object*, _object*, _object**, bool)
   27800  work/Objects/object.c:_PyObject_GenericGetAttrWithDict
   27401  work/Objects/dictobject.c:lookdict_unicode_nodummy
   24115  work/Objects/typeobject.c:_PyType_Lookup
   24080  ???:_int_free
   21700  work/Objects/dictobject.c:PyDict_GetItemWithError
   20700  work/Objects/dictobject.c:PyDict_GetItem
          ...
   -3200  build/../c10/util/SmallVector.h:at::TensorIterator::binary_op(at::Tensor&, at::Tensor const&, at::Tensor const&, bool)
   -3400  build/../aten/src/ATen/native/TensorIterator.cpp:at::TensorIterator::resize_outputs(at::TensorIteratorConfig const&)
   -3500  /usr/include/c++/8/x86_64-redhat-linux/bits/gthr-default.h:std::unique_lock<std::mutex>::unlock()
   -3700  build/../torch/csrc/utils/python_arg_parser.cpp:torch::PythonArgParser::raw_parse(_object*, _object*, _object**)
   -4207  work/Objects/obmalloc.c:PyMem_Calloc
   -4500  /usr/include/c++/8/bits/stl_vector.h:std::vector<at::Tensor, std::allocator<at::Tensor> >::~vector()
   -4800  build/../torch/csrc/autograd/generated/VariableType_2.cpp:torch::autograd::VariableType::add__Tensor(at::Tensor&, at::Tensor const&, c10::Scalar)
   -5000  build/../c10/core/impl/LocalDispatchKeySet.cpp:c10::impl::ExcludeDispatchKeyGuard::ExcludeDispatchKeyGuard(c10::DispatchKey)
   -5300  work/Objects/listobject.c:PyList_New
   -5400  build/../torch/csrc/utils/python_arg_parser.cpp:torch::FunctionParameter::check(_object*, std::vector<pybind11::handle, std::allocator<pybind11::handle> >&)
   -5600  /usr/include/c++/8/bits/std_mutex.h:std::unique_lock<std::mutex>::unlock()
   -6231  work/Objects/obmalloc.c:PyMem_Free
   -6300  work/Objects/listobject.c:list_repeat
  -11200  work/Objects/listobject.c:list_dealloc
  -28900  build/../torch/csrc/utils/python_arg_parser.cpp:torch::FunctionSignature::parse(_object*, _object*, _object**, bool)
```

Remaining TODOs:
  * Include a timer in the generated script for cuda sync.
  * Add valgrind to CircleCI machines and add a unit test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44717

Reviewed By: soumith

Differential Revision: D24010742

Pulled By: robieta

fbshipit-source-id: df6bc765f8efce7193893edba186cd62b4b23623
2020-09-30 05:52:54 -07:00
f5c95d5cf1 Source code level attribution in profiler (#43898)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43898

Adds a with_source parameter to enable tracking source code
(filename and line) in the profiler for eager, TorchScript, and autograd
modes
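
A hedged sketch based on the parameter named above (`with_source` is taken verbatim from this summary; shipped releases expose the equivalent functionality as `with_stack=True`):

```
import torch
import torch.autograd.profiler as profiler

x = torch.randn(10, 10)
with profiler.profile(with_source=True) as prof:
    y = (x + x).sum()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))
```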

Test Plan:
python test/test_profiler.py
```
Name                                 Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     Number of Calls  Source Location
-----------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  --------------------------------------------
ts_method_1                          10.43%           235.364us        36.46%           822.920us        822.920us        1                test/test_profiler.py(70): test_source
aten::add                            7.52%            169.833us        8.88%            200.439us        200.439us        1                test/test_profiler.py(69): test_source
aten::normal_                        6.26%            141.380us        6.26%            141.380us        141.380us        1                test/test_profiler.py(67): test_source
aten::add                            5.80%            130.830us        8.41%            189.800us        63.267us         3                test/test_profiler.py(72): test_source
aten::sum                            5.02%            113.340us        8.39%            189.475us        189.475us        1                test/test_profiler.py(64): ts_method_1
aten::add                            4.58%            103.346us        6.33%            142.847us        142.847us        1                test/test_profiler.py(62): ts_method_1
aten::mul                            4.05%            91.498us         9.62%            217.113us        217.113us        1                test/test_profiler.py(71): test_source
aten::add                            4.03%            90.880us         5.60%            126.405us        126.405us        1                test/test_profiler.py(58): ts_method_2
aten::empty                          3.49%            78.735us         3.49%            78.735us         19.684us         4                test/test_profiler.py(72): test_source
```

Reviewed By: ngimel

Differential Revision: D23432664

Pulled By: ilia-cher

fbshipit-source-id: 83ad7ebe0c2502494d3b48c4e687802db9c77615
2020-09-30 00:57:35 -07:00
c2c7099944 Fix docs for kwargs, q-z (#43589)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43589

Reviewed By: zhangguanheng66

Differential Revision: D24006259

Pulled By: mruberry

fbshipit-source-id: 39abd474744f152648aad201d7311b42d20efc88
2020-09-29 22:57:02 -07:00
b4ba66ae32 Print tensor shapes and convolution parameters when cuDNN exception is thrown (#45023)
Summary:
Originally proposed at https://github.com/pytorch/pytorch/issues/44473#issuecomment-690670989 by colesbury.

This PR adds the functionality to print relevant tensor shapes and convolution parameters along with the stack trace once a cuDNN exception is thrown.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45023

Reviewed By: gchanan

Differential Revision: D23932661

Pulled By: ezyang

fbshipit-source-id: 5f5f570df6583271049dfc916fac36695f415331
2020-09-29 21:55:34 -07:00
93650a82c9 Move prim::tolist math.log and aten::cpu to lite interpreter for translation model (#45482)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45482

Working on some models that need these ops on lite interpreter.

Test Plan: locally build and load/run the TS model without problem.

Reviewed By: iseeyuan

Differential Revision: D23906581

fbshipit-source-id: 01b9de2af2046296165892b837bc14a7e5d59b4e
2020-09-29 21:42:18 -07:00
4aca63d38a [TensorExpr] Change API for creating Load and Store expressions. (#45520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45520

With this change `Load`s and `Store`s no longer accept `Placeholder`s in
their constructor and `::make` functions and can only be built with
`Buf`.
`Placeholder` gets its own `store`, `load`, `storeWithMask`, and
`loadWithMask` method for more convenient construction.

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D23998789

Pulled By: ZolotukhinM

fbshipit-source-id: 3fe018e00c1529a563553b2b215f403b34aea912
2020-09-29 20:52:38 -07:00
772ce9ac2c Fix memory corruption when running torch.svd for complex.doubles (#45486)
Summary:
According to http://www.netlib.org/lapack/explore-html/d3/da8/group__complex16_g_esing_gaccb06ed106ce18814ad7069dcb43aa27.html,
rwork should be an array of doubles, but it was allocated as an array of floats (actually ints).

Fixes crash from https://github.com/pytorch/pytorch/issues/45269

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45486

Reviewed By: walterddr

Differential Revision: D23984444

Pulled By: malfet

fbshipit-source-id: 6a1b00a27de47046496ccf6a91b6e8ad283e42e6
2020-09-29 20:27:08 -07:00
ccad73ab41 Fix D23995953 import.
Summary: https://github.com/pytorch/pytorch/pull/45511 could not be properly imported

Test Plan: See https://github.com/pytorch/pytorch/pull/45511

Reviewed By: zhangguanheng66

Differential Revision: D23995953

fbshipit-source-id: a6224a67d54617ddf34c2392e65f2142c4e78ea4
2020-09-29 19:30:23 -07:00
c87ff2cb90 Enable transposed tensor copy for complex types (#45487)
Summary:
This enables a special copy operator for transposed tensors with more than 360 elements:
417e3f85e5/aten/src/ATen/native/Copy.cpp (L19)

Steps to repro: python -c "import torch; print(torch.svd(torch.randn(61, 61, dtype=torch.complex64)))"

Fixes https://github.com/pytorch/pytorch/issues/45269

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45487

Reviewed By: anjali411

Differential Revision: D23984441

Pulled By: malfet

fbshipit-source-id: 10ce1d5f4425fb6de78e96adffd119e545b6624f
2020-09-29 19:22:05 -07:00
0a15646e15 CUDA RTX30 series support (#45489)
Summary:
I also opened a PR on cmake upstream: https://gitlab.kitware.com/cmake/cmake/-/merge_requests/5292

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45489

Reviewed By: zhangguanheng66

Differential Revision: D23997844

Pulled By: ezyang

fbshipit-source-id: 4e7443dde9e70632ee429184f0d51cb9aa5a98b5
2020-09-29 18:19:23 -07:00
c1e6592964 Enable type-checking of torch.nn.quantized.* modules (#43110)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43029

I am not changing the following files in this PR:
* `torch/nn/quantized/dynamic/modules/rnn.py` due to https://github.com/pytorch/pytorch/issues/43072
* `torch/nn/quantized/modules/conv.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43110

Reviewed By: gchanan

Differential Revision: D23963258

Pulled By: ezyang

fbshipit-source-id: 0fb0fd13af283f6f7b3434e7bbf62165357d1f98
2020-09-29 18:14:29 -07:00
375a83e6c1 Annotate torch.utils.(tensorboard/show_pickle/hipify) (#44216)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44215

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44216

Reviewed By: gchanan

Differential Revision: D23963216

Pulled By: ezyang

fbshipit-source-id: b3fed51b2a1cbd05e3cd0222c89c38d61d8968c1
2020-09-29 18:14:26 -07:00
eb39542e67 Add typing annotations for torch.utils.data.* modules (#44136)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44135

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44136

Reviewed By: gchanan

Differential Revision: D23963273

Pulled By: ezyang

fbshipit-source-id: 939234dddbe89949bd8e5ff05d06f6c8add6935c
2020-09-29 18:12:05 -07:00
33aba57e4c Patch generate files for system protobuf (#44583)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42939

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44583

Reviewed By: albanD

Differential Revision: D23692639

Pulled By: ezyang

fbshipit-source-id: 49781f704dd6ceab7717b63225d0b4076ce33daa
2020-09-29 18:06:33 -07:00
22a34bcf4e ROCm ❤️ TensorExpr (#45506)
Summary:
This might be an alternative to reverting https://github.com/pytorch/pytorch/issues/45396.
The obvious rough edge is that I'm not really seeing the work group limits that TensorExpr produces.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45506

Reviewed By: zhangguanheng66

Differential Revision: D23991410

Pulled By: Krovatkin

fbshipit-source-id: 11d3fc4600e4bffb1d1192c6b8dd2fe22c1e064e
2020-09-29 16:52:16 -07:00
637570405b Disable multi tensor tests on ROCm (#45535)
Summary:
Disable multi tensor tests on ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45535

Reviewed By: ngimel

Differential Revision: D24002557

Pulled By: izdeby

fbshipit-source-id: 608c9389e3d9cd7dac49ea42c9bb0af55662c754
2020-09-29 15:49:21 -07:00
06a566373a [PyTorch/NCCL] Fix async error handling (#45456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45456

Remove work while not holding the lock, to avoid a deadlock with the watchdog thread while the GPU is at 100% utilization.

SyncBatchNorm failure trace: P143879560

Test Plan:
**Desync test:**
BACKEND=nccl WORLD_SIZE=3 NCCL_ASYNC_ERROR_HANDLING=1 ./buck-out/gen/caffe2/test/distributed/distributed_nccl_spawn#binary.par -r test_DistributedDataParallel_desync

**SyncBatchNorm test:**
BACKEND=nccl WORLD_SIZE=3 NCCL_ASYNC_ERROR_HANDLING=1 ./buck-out/gen/caffe2/test/distributed/distributed_nccl_fork#binary.par -r test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient

Reviewed By: osalpekar

Differential Revision: D23972071

fbshipit-source-id: f03d9637a6ec998d64dab1a062a81e0f3697275f
2020-09-29 15:44:34 -07:00
ef41472544 Create experimental FX graph manipulation library (#44775)
Summary:
This PR adds a new GraphManipulation library for operating on GraphModule nodes.
It also adds an implementation of replace_target_nodes_with, which replaces all nodes in the GraphModule matching a specific op/target with a new specified op/target. An example use of this function would be replacing a generic operator with an optimized operator for specific sizes and shapes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44775

Reviewed By: jamesr66a

Differential Revision: D23874561

Pulled By: gcatron

fbshipit-source-id: e1497cd11e0bbbf1fabdf137d65c746248998e0b
2020-09-29 15:32:41 -07:00
d642992877 Quantized operators template selective (#45509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45509

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44479

Test Plan: Imported from OSS

Reviewed By: dhruvbird

Differential Revision: D23626562

Pulled By: iseeyuan

fbshipit-source-id: c2fc8bad25f8e5e9a70eb1001b9066a711b8e8e7
2020-09-29 14:52:27 -07:00
ab5cf16b6c fix standard deviation gradient NaN behavior (#45468)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/4320

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45468

Reviewed By: zhangguanheng66

Differential Revision: D23991064

Pulled By: albanD

fbshipit-source-id: d4274895f2dac8b2cdbd73e5276ce3df466fc341
2020-09-29 13:47:29 -07:00
18876b5722 Update backward formula for torch.dot and add backward definition for torch.vdot (#45074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45074

TODO: Add R -> C tests in https://github.com/pytorch/pytorch/pull/44744 (blocked on some JIT changes)

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D23975361

Pulled By: anjali411

fbshipit-source-id: 3512bd2962b588a198bc317673bd18cc96ac823f
2020-09-29 12:52:03 -07:00
147c88ef2d Add docs to a pytorch.github.io/doc/tag directory when repo is tagged (#45204)
Summary:
In coordination with jlin27.

This PR is meant to build documentation when the repo is tagged. For instance, tagging the repo with 1.7.0rc1 will push that commit's documentation to pytorch/pytorch.github.io/docs/1.7.

Subsequently tagging 1.7.0rc2 will override the 1.7 docs, as will 1.7.0, and 1.7.1. I think this is as it should be: there should be one, latest, version for the 1.7 docs. This can be tweaked differently if desired.

There is probably work that needs to be done to adjust the [versions.html](https://pytorch.org/docs/versions.html) to add the new tag?

Is there a way to test the tagging side of this without breaking the production documentation?

As an aside, the documentation is being built via the `pytorch_linux_xenial_py3_6_gcc5_4_build` image. Some projects are starting to move on from python3.6 since [it is in security-only support mode](https://devguide.python.org/#status-of-python-branches), no new binaries are being released.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45204

Reviewed By: zhangguanheng66

Differential Revision: D23996800

Pulled By: seemethere

fbshipit-source-id: a94a080348a47738c1de5832ab37b2b0d57d2d57
2020-09-29 12:31:30 -07:00
b66ac1e928 Updates nonzero's as_tuple behavior to no longer warn. (#45413)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44284.

[torch.nonzero](https://pytorch.org/docs/master/generated/torch.nonzero.html?highlight=nonzero#torch.nonzero) is distinct from [numpy.nonzero](https://numpy.org/doc/1.18/reference/generated/numpy.nonzero.html?highlight=nonzero#numpy.nonzero). The former returns a tensor by default, and the latter returns a tuple of arrays. The `as_tuple` argument was added as part of an intended deprecation process to make torch.nonzero consistent with numpy.nonzero, but this was a confusing change for users. A better deprecation path would be to offer torch.argwhere consistent with [numpy.argwhere](https://numpy.org/doc/stable/reference/generated/numpy.argwhere.html?highlight=argwhere#numpy.argwhere), which is equivalent to the default torch.nonzero behavior. Once this is offered, a change to torch.nonzero should be more straightforward with less user disruption, if we decide that's the correct change to pursue.
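
For reference, a minimal illustration of the two behaviors being kept:

```
import torch

t = torch.tensor([[0, 1], [2, 0]])
print(torch.nonzero(t))                 # tensor([[0, 1], [1, 0]]) -- one row per nonzero
print(torch.nonzero(t, as_tuple=True))  # (tensor([0, 1]), tensor([1, 0])) -- numpy-style
```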

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45413

Reviewed By: ngimel

Differential Revision: D23975015

Pulled By: mruberry

fbshipit-source-id: b59237d0d8c2df984e952b62d0a7c247b49d84dc
2020-09-29 12:16:59 -07:00
0df99ad470 Remove unnecessary __at_align32__ in int_elementwise_binary_256 (#45470)
Summary:
They were added in 4b3046ed286e92b5910769bf97f2bc6a1ad473d1 based on a
misunderstanding of `_mm256_storeu_si256`, but they
are actually unnecessary. The [documentation][1] for `_mm256_storeu_si256` says:

> Moves values from a integer vector to an **unaligned** memory location.

In this case, it's better to remove the `__at_align32__` qualifier to
give the compiler and linker more flexibility to optimize.

[1]: https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/intrinsics/intrinsics-for-intel-advanced-vector-extensions/intrinsics-for-load-and-store-operations-1/mm256-storeu-si256.html

Close https://github.com/pytorch/pytorch/issues/44810

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45470

Reviewed By: zhangguanheng66

Differential Revision: D23980060

Pulled By: glaringlee

fbshipit-source-id: 12b3558b76c6e81d88a72081060fdb8674464768
2020-09-29 11:55:25 -07:00
6e55a26e10 Move mobile specific CPUCachingAllocator to c10/mobile folder. (#45364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45364

Plus add some more comments about the usage, limitations and cons.

Test Plan: Build and run benchmark binary.

Reviewed By: gchanan

Differential Revision: D23944193

fbshipit-source-id: 30d4f4991d2185a0ab768d94c846d73730fc0835
2020-09-29 11:33:26 -07:00
b2925671b6 Updates deterministic flag to throw a warning, makes docs consistent (#45410)
Summary:
Per feedback in the recent design review. Also tweaks the documentation to clarify what "deterministic" means and adds a test for the behavior.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45410

Reviewed By: ngimel

Differential Revision: D23974988

Pulled By: mruberry

fbshipit-source-id: e48307da9c90418fc6834fbd67b963ba2fe0ba9d
2020-09-29 11:17:33 -07:00
aa2bd7e1ae Conservative-ish persistent RNN heuristics for compute capability 8.0+ (#43165)
Summary:
Based on https://github.com/pytorch/pytorch/pull/43165#issuecomment-697033663 and tests by Vasily Volkov ([persistentRNN-speedup.xlsx](https://github.com/pytorch/pytorch/files/5298001/persistentRNN-speedup.xlsx)).  See comments in code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43165

Reviewed By: zhangguanheng66, mruberry

Differential Revision: D23991756

Pulled By: ngimel

fbshipit-source-id: 4c2c14c9002be2fec76fb21ba55b7dab79497510
2020-09-29 11:14:55 -07:00
f47fd0eb72 Updated cholesky_backward for complex inputs (#45267)
Summary:
Updated `cholesky_backward` to work correctly for complex input.
Note that the current implementation gives the conjugate of what JAX would return. anjali411, is that the correct thing to do?
Ref. https://github.com/pytorch/pytorch/issues/44895

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45267

Reviewed By: bwasti

Differential Revision: D23975269

Pulled By: anjali411

fbshipit-source-id: 9908b0bb53c411e5ad24027ff570c4f0abd451e6
2020-09-29 11:07:32 -07:00
15f85eea18 Support bfloat16 and complex dtypes for logical_not (#43537)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43537

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23751950

Pulled By: mruberry

fbshipit-source-id: d07ecd9aae263eb8e00928d4fc981e0d66066fbb
2020-09-29 11:00:05 -07:00
ea59251f51 Fix model_name not logged properly issue. (#45488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45488

model_name logging was broken; the issue stems from the recent change that assigned the method name to the module name. This diff fixes it.
ghstack-source-id: 113103942

Test Plan:
Made sure that the model_name is now logged from module_->name().
Verified with one model that does not contain the model metadata; the model_name field is logged as below:

09-28 21:59:30.065 11530 12034 W module.cpp: TESTINGTESTING run() module = __torch__.Model
09-28 21:59:30.065 11530 12034 W module.cpp: TESTINGTESTING metadata does not have model_name assigning to __torch__.Model
09-28 21:59:30.066 11530 12034 W MobileModuleQPLObserver.cpp: TESTINGTESTING onEnterRunMethod log  model_name = __torch__.Model
09-28 21:59:30.066 11530 12034 W MobileModuleQPLObserver.cpp: TESTINGTESTING onEnterRunMethod log  method_name = labels
09-28 21:59:30.068 11530 12034 W MobileModuleQPLObserver.cpp: TESTINGTESTING onExitRunMethod()

Reviewed By: linbinyu

Differential Revision: D23984165

fbshipit-source-id: 5b00f50ea82106b695c2cee14029cb3b2e02e2c8
2020-09-29 10:37:36 -07:00
09b3e16b40 [JIT] Enable @unused syntax for ignoring properties (#45261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45261

**Summary**
This commit enables `unused` syntax for ignoring
properties. Ignoring properties is more intuitive with this feature enabled.
`ignore` is not supported because class type properties cannot be
executed in Python the way an `ignored` function can (they exist only as
TorchScript types), and module properties that cannot be scripted
are not added to the `ScriptModule` wrapper in the first place, so they
already execute in Python.

**Test Plan**
This commit updates the existing unit tests for class type and module
properties to test properties ignored using `unused`.

Test Plan: Imported from OSS

Reviewed By: navahgar, Krovatkin, mannatsingh

Differential Revision: D23971881

Pulled By: SplitInfinity

fbshipit-source-id: 8d3cc1bbede7753d6b6f416619e4660c56311d33
2020-09-29 10:24:25 -07:00
5f49d14be2 Add mobile_optimized tag to optimized model. (#45479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45479

Add a top level boolean attribute to the model called mobile_optimized that is set to true if it is optimized.
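
A hedged sketch of checking the new attribute (the attribute name comes from this summary; `optimize_for_mobile` is the existing optimizer entry point):

```
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

scripted = torch.jit.script(torch.nn.Linear(4, 4))
optimized = optimize_for_mobile(scripted)
print(getattr(optimized, "mobile_optimized", False))  # expected: True
```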

Test Plan: buck test //caffe2/test:mobile passes

Reviewed By: kimishpatel

Differential Revision: D23956728

fbshipit-source-id: 79c5931702208b871454319ca2ab8633596b1eb8
2020-09-29 10:06:57 -07:00
17be7c6e5c [vulkan][android][test_app] Add test_app variant that runs module on Vulkan (#44897)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44897

Test Plan: Imported from OSS

Reviewed By: dreiss

Differential Revision: D23763770

Pulled By: IvanKobzarev

fbshipit-source-id: 6ad16b7271c745313a71da64a629a764258bbc85
2020-09-29 10:00:46 -07:00
2c300fd74c [android][vulkan] Module load argument to specify device cpu/vulkan (#44896)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44896

Test Plan: Imported from OSS

Reviewed By: dreiss

Differential Revision: D23763771

Pulled By: IvanKobzarev

fbshipit-source-id: 990a386ad13c704f03345dbe09e180281af913c9
2020-09-29 09:58:22 -07:00
fe9019cbfe Reorganized Sorting.cpp method order (#45083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45083

This PR just reorders the methods in Sorting.cpp placing related methods next to each other.

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D23908817

Pulled By: heitorschueroff

fbshipit-source-id: 1dd7b693b5135fddf5dff12303474e85ce0c2f83
2020-09-29 09:49:31 -07:00
ab5edf21b0 Revert D23789657: [wip] fast typeMeta/ScalarType conversion approach 2
Test Plan: revert-hammer

Differential Revision:
D23789657 (1ed1a2f5b0)

Original commit changeset: 5afdd52d24bd

fbshipit-source-id: 6d827be8895bcb39c8e85342eee0f7a3f5056c76
2020-09-29 09:40:53 -07:00
b3135c2056 Enable torch.cuda.amp typechecking (#45480)
Summary:
Fix `torch._C._autocast_*_nesting` declarations in __init__.pyi

Fix iterable constructor logic: not every iterable can be constructed using the `type(val)(val)` trick; for example, it does not work for `val = range(10)`, even though `isinstance(val, Iterable)` is True (see the illustration below).
Change optional resolution logic to meet mypy expectations

Fixes https://github.com/pytorch/pytorch/issues/45436
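
A minimal illustration of the iterable issue called out above (plain Python):

```
from collections.abc import Iterable

val = range(10)
print(isinstance(val, Iterable))  # True
try:
    type(val)(val)                # range(range(10))
except TypeError as e:
    print(e)                      # 'range' object cannot be interpreted as an integer
```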

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45480

Reviewed By: walterddr

Differential Revision: D23982822

Pulled By: malfet

fbshipit-source-id: 6418a28d04ece1b2427dcde4b71effb67856a872
2020-09-29 09:31:55 -07:00
df0de780c3 Add cusolver guard for cuda >= 10.1.243 (#45452)
Summary:
See https://github.com/pytorch/pytorch/issues/45403

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45452

Reviewed By: mruberry

Differential Revision: D23977009

Pulled By: ngimel

fbshipit-source-id: df66425773d7500fa37e64d5e4bcc98167016be3
2020-09-29 09:25:20 -07:00
bb19a55429 Improves fft doc consistency and makes deprecation warnings more prominent (#45409)
Summary:
This PR makes the deprecation warnings for existing fft functions more prominent and makes the torch.stft deprecation warning consistent with our current deprecation planning.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45409

Reviewed By: ngimel

Differential Revision: D23974975

Pulled By: mruberry

fbshipit-source-id: b90d8276095122ac3542ab625cb49b991379c1f8
2020-09-29 09:07:49 -07:00
0a38aed025 Auto set libuv_ROOT env var for Gloo submodule on Windows platform (#45484)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45484

Reviewed By: lw

Differential Revision: D23990724

Pulled By: mrshenli

fbshipit-source-id: 1987ce7eb7d3f9d3120c07e954cd6581cd3caf59
2020-09-29 08:58:56 -07:00
6d37126a10 Makes rdiv consistent with div (#45407)
Summary:
In addition to making rdiv consistent with div, this PR significantly expands division testing, accounting for floor_divide actually performing truncation division, too.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45407

Reviewed By: ngimel

Differential Revision: D23974967

Pulled By: mruberry

fbshipit-source-id: 82b46b07615603f161ab7cd1d3afaa6d886bfe95
2020-09-29 08:34:01 -07:00
7cde662f08 Add check for Complex Type to allow non integral alpha. (#45200)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45184

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45200

Reviewed By: gchanan

Differential Revision: D23940134

Pulled By: anjali411

fbshipit-source-id: cce7b1efc22ec189ba6c83e31ce712bb34997139
2020-09-29 07:36:46 -07:00
0806c58e9f Optimize view_as_complex and view_as_real (#44908)
Summary:
This avoids unnecessary memory allocations in `view_as_complex` and `view_as_real`. I construct the new tensor directly with the existing storage to avoid creating a new storage object and also use `DimVector`s to avoid allocating for the sizes and strides. Overall, this saves about 2 us of overhead from `torch.fft.fft` which currently has to call `view_as_real` and `view_as_complex` for every call.

I've used this simple benchmark to measure the overhead:
```python
In [1]: import torch
   ...: a = torch.rand(1, 2)
   ...: ac = torch.view_as_complex(a)
   ...: %timeit torch.view_as_real(ac)
   ...: %timeit torch.view_as_complex(a)
   ...: %timeit ac.real
```

Results before:
```
2.5 µs ± 62.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
2.22 µs ± 36 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
4.17 µs ± 8.76 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

and after:
```
1.83 µs ± 9.26 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.57 µs ± 7.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
3.47 µs ± 34.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44908

Reviewed By: agolynski

Differential Revision: D23793479

Pulled By: anjali411

fbshipit-source-id: 64b9cad70e3ec10891310cbfa8c0bdaa1d72885b
2020-09-29 07:30:38 -07:00
87f98a5b54 Updates torch.floor_divide documentation to clarify it's actually torch.trunc_divide (or torch.rtz_divide) (#45411)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/43874 for 1.7. 1.8 will need to take floor_divide through a proper deprecation process.
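
A minimal illustration of the mismatch being documented (behavior as of this release):

```
import torch

print(torch.floor_divide(torch.tensor(-5), torch.tensor(2)))  # tensor(-2): truncation toward zero
print(-5 // 2)                                                 # -3: Python's floor division
```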

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45411

Reviewed By: ngimel

Differential Revision: D23974997

Pulled By: mruberry

fbshipit-source-id: 16dd07e50a17ac76bfc93bd6b71d4ad72d909bf4
2020-09-29 05:55:44 -07:00
37f9af7f29 Missing tests about torch.xxx(out=...) (#44465)
Summary:
PR opened just to run the CI tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44465

Reviewed By: ngimel

Differential Revision: D23907565

Pulled By: mruberry

fbshipit-source-id: 620661667877f1e9a2bab17d19988e2dc986fc0f
2020-09-29 04:54:46 -07:00
56af122659 Revert D23966878: [pytorch][PR] This PR flips a switch to enable PE + TE
Test Plan: revert-hammer

Differential Revision:
D23966878 (dddb685c11)

Original commit changeset: 2010a0b07c59

fbshipit-source-id: 132556039730fd3e4babd0d7ca8daf9c8d14f728
2020-09-29 04:33:19 -07:00
1ed1a2f5b0 [wip] fast typeMeta/ScalarType conversion approach 2 (#44965)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44965

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23789657

Pulled By: bhosmer

fbshipit-source-id: 5afdd52d24bd097891ff4a7313033f7bd400165e
2020-09-29 02:39:36 -07:00
489af4ddcb [quant] Add quant APIs to save/load observer state_dict (#44846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44846

The save function traverses the model state dict to pick out the observer stats; the
load function traverses the module hierarchy to load the state dict into module attributes depending on the observer type.
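
A hedged sketch of the round trip, assuming the helper names this PR appears to add (`get_observer_state_dict` / `load_observer_state_dict` under `torch.quantization`):
```python
import torch
import torch.quantization as tq

model = torch.nn.Sequential(torch.nn.Linear(4, 4))
model.qconfig = tq.get_default_qconfig("fbgemm")
prepared = tq.prepare(model)
prepared(torch.randn(8, 4))  # calibration pass populates the observers

# Save only the observer statistics (assumed API from this PR).
obs_state = tq.get_observer_state_dict(prepared)

# Restore them into a freshly prepared copy of the same architecture.
fresh = torch.nn.Sequential(torch.nn.Linear(4, 4))
fresh.qconfig = tq.get_default_qconfig("fbgemm")
fresh = tq.prepare(fresh)
tq.load_observer_state_dict(fresh, obs_state)
```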

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_save_observer_state_dict

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23746821

fbshipit-source-id: 05c571b62949a2833602d736a81924d77e7ade55
2020-09-29 01:52:42 -07:00
bb478810e0 [quant] torch.max_pool1d (#45152)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45152

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23846473

Pulled By: z-a-f

fbshipit-source-id: 38fd611e568e4f8b39b7a00adeb42c7b99576360
2020-09-29 01:45:22 -07:00
b86008ab75 [TensorExpr] Remove buf_ field from class Tensor. (#45390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45390

Tensor objects should always refer to their Function's bufs. Currently
we never create a Tensor with a buffer different from that of its function,
but having it in two places seems incorrect and dangerous.

Differential Revision: D23952865

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: e63fc26d7078427514649d9ce973b74ea635a94a
2020-09-29 01:21:57 -07:00
3c33695a6d [TensorExpr] Rename Buffer to Placeholder. (#45389)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45389

Differential Revision: D23952866

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: 17eedd3ac17897501403482ac1866c569d247c75
2020-09-29 01:21:54 -07:00
92306b85d5 [TensorExpr] Consolidate {buffer,function,tensor}.{h.cpp} in tensor.{h,cpp}. (#45388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45388

Classes defined in these files are closely related, so it is reasonable
to have them all in one file. The change is purely a code move.

Differential Revision: D23952867

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: 12cfaa968bdfc4dff00509e34310a497c7b59155
2020-09-29 01:17:10 -07:00
d2623da52c replaced whitelist with allowlist (#45260)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41754

**(1)**
Initially the file was named **gen_op_registration_whitelist.py**; I renamed it to **gen_op_registration_allowlist.py**

**(2)**
There were some occurrences of **whitelist** in comments inside the file; I changed them to **allowlist**
![update1](https://user-images.githubusercontent.com/62737243/94106752-b296e780-fe59-11ea-8541-632a1dbf90d6.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45260

Reviewed By: dhruvbird

Differential Revision: D23947182

Pulled By: ljk53

fbshipit-source-id: 31b486592451dbb0605d7950e07747cbb72ab80f
2020-09-29 00:27:46 -07:00
8c309fc052 Add more tests for mt optimizers (#45475)
Summary:
Add more test cases for mt optimizers and fix Adam/AdamW

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45475

Reviewed By: soumith

Differential Revision: D23982727

Pulled By: izdeby

fbshipit-source-id: 4b24d37bd52a2fa3719d3e3a5dcf3b96990b0f5b
2020-09-28 23:59:58 -07:00
6bdb871d47 [FX] Lint pass for Graphs (#44973)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44973

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D23792631

Pulled By: jamesr66a

fbshipit-source-id: d8faef0c311d8bd611ba0a7e1e2f353e3e5a1068
2020-09-28 23:00:32 -07:00
b0bdc82a00 [FX][EZ] Fix bug where copying node made non-unique name (#45311)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45311

Test Plan: Imported from OSS

Reviewed By: dzhulgakov

Differential Revision: D23917864

Pulled By: jamesr66a

fbshipit-source-id: 10d0a4017ffe160bce4ba0d830e035616bbded74
2020-09-28 22:55:20 -07:00
417e3f85e5 Support tuple inputs in NN Module test (#44853)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44853

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23750441

Pulled By: glaringlee

fbshipit-source-id: 1b111a370a726b40521134b711c35f48dda99411
2020-09-28 22:05:05 -07:00
dddb685c11 This PR flips a switch to enable PE + TE (#45396)
Summary:
This PR flips a switch to enable PE + TE
next PR: https://github.com/pytorch/pytorch/pull/45397

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45396

Reviewed By: suo

Differential Revision: D23966878

Pulled By: Krovatkin

fbshipit-source-id: 2010a0b07c595992a88b3fe0792d6af315cf421e
2020-09-28 21:57:50 -07:00
50b91103a9 add self cuda time to avoid double/quadruple counting (#45209)
Summary:
In the profiler, CUDA did not report self time, so for composite functions there was no way to determine which function was really taking the time. In addition, the reported "total CUDA time" was frequently more than the total wallclock time. This PR adds "self CUDA time" to the profiler and computes total CUDA time based on self CUDA time, similar to how it is done for CPU. Also, slight formatting changes make the table more compact. Before:
```
--------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Name                  Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls
--------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
aten::matmul          0.17%            890.805us        99.05%           523.401ms        5.234ms          49.91%           791.184ms        7.912ms          100
aten::mm              98.09%           518.336ms        98.88%           522.511ms        5.225ms          49.89%           790.885ms        7.909ms          100
aten::t               0.29%            1.530ms          0.49%            2.588ms          25.882us         0.07%            1.058ms          10.576us         100
aten::view            0.46%            2.448ms          0.46%            2.448ms          12.238us         0.06%            918.936us        4.595us          200
aten::transpose       0.13%            707.204us        0.20%            1.058ms          10.581us         0.03%            457.802us        4.578us          100
aten::empty           0.14%            716.056us        0.14%            716.056us        7.161us          0.01%            185.694us        1.857us          100
aten::as_strided      0.07%            350.935us        0.07%            350.935us        3.509us          0.01%            156.380us        1.564us          100
aten::stride          0.65%            3.458ms          0.65%            3.458ms          11.527us         0.03%            441.258us        1.471us          300
--------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Self CPU time total: 528.437ms
CUDA time total: 1.585s

Recorded timeit time:  789.0814 ms

```
Note that the recorded timeit time (with proper CUDA syncs) is half the "CUDA time total" reported by the profiler

After
```
--------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
--------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
        aten::matmul         0.15%     802.716us        99.06%     523.548ms       5.235ms     302.451us         0.04%     791.151ms       7.912ms           100
            aten::mm        98.20%     519.007ms        98.91%     522.745ms       5.227ms     790.225ms        99.63%     790.848ms       7.908ms           100
             aten::t         0.27%       1.406ms         0.49%       2.578ms      25.783us     604.964us         0.08%       1.066ms      10.662us           100
          aten::view         0.45%       2.371ms         0.45%       2.371ms      11.856us     926.281us         0.12%     926.281us       4.631us           200
     aten::transpose         0.15%     783.462us         0.22%       1.173ms      11.727us     310.016us         0.04%     461.282us       4.613us           100
         aten::empty         0.11%     591.603us         0.11%     591.603us       5.916us     176.566us         0.02%     176.566us       1.766us           100
    aten::as_strided         0.07%     389.270us         0.07%     389.270us       3.893us     151.266us         0.02%     151.266us       1.513us           100
        aten::stride         0.60%       3.147ms         0.60%       3.147ms      10.489us     446.451us         0.06%     446.451us       1.488us           300
--------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 528.498ms
CUDA time total: 793.143ms

Recorded timeit time:  788.9832 ms

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45209

Reviewed By: zou3519

Differential Revision: D23925491

Pulled By: ngimel

fbshipit-source-id: 7f9c49238d116bfd2db9db3e8943355c953a77d0
2020-09-28 21:51:13 -07:00
35596d39e9 Coalesce TLS accesses in RecordFunction constructor (#44970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44970

Right now, when RecordFunction is not active (the usual case),
we do two TLS accesses (a check for thread-local callbacks, and a check for
a thread-local boolean).
This experiments with reducing the number of TLS accesses in the RecordFunction
constructor.

Test Plan: record_function_benchmark

Reviewed By: dzhulgakov

Differential Revision: D23791165

Pulled By: ilia-cher

fbshipit-source-id: 6137ce4bface46f540ece325df9864fdde50e0a4
2020-09-28 21:42:23 -07:00
5a6a31168f add circle ci job name dimension to report test stats (#45457)
Summary:
To support anomaly detection for test time spikes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45457

Reviewed By: malfet

Differential Revision: D23975628

Pulled By: walterddr

fbshipit-source-id: f28d0f12559070004d637d5bde83289f029b15b8
2020-09-28 20:51:58 -07:00
5be954b502 Fix WorkerInfo link format (#45476)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45476

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23982069

Pulled By: mrshenli

fbshipit-source-id: 6d932e77c1941dfd96592b388353f0fc8968dde6
2020-09-28 20:48:15 -07:00
8e47fcba5f Update docs for RPC async_execution (#45458)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45458

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D23973366

Pulled By: mrshenli

fbshipit-source-id: 3697f07fa972db21746aa25eaf461c1b93293f58
2020-09-28 20:48:12 -07:00
c5ade5f698 Fix no_sync docs (#45455)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45455

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D23973365

Pulled By: mrshenli

fbshipit-source-id: 87c9878cdc7310754670b83efa65ae6f877f86fb
2020-09-28 20:48:09 -07:00
6967e6295e Fix DDP docs (#45454)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45454

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D23973367

Pulled By: mrshenli

fbshipit-source-id: 11f20d51d0d0f92f199e4023f02b86623867bae0
2020-09-28 20:43:22 -07:00
534f2ae582 Disable inplace abs for complex tensors (#45069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45069

`torch.abs` is a `C -> R` function for complex input. Following the general semantics in torch, the in-place version of abs should be disabled for complex input.
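
A small sketch of the resulting semantics: the out-of-place `abs` still maps complex to real, while the in-place variant is rejected because it would change the dtype.
```python
import torch

z = torch.tensor([3 + 4j])
print(torch.abs(z))  # tensor([5.]) -- C -> R, out-of-place is fine

try:
    z.abs_()  # in-place abs would change complex -> real, so it is disallowed
except RuntimeError as e:
    print("in-place abs on complex raises:", e)
```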

Test Plan: Imported from OSS

Reviewed By: glaringlee, malfet

Differential Revision: D23818397

Pulled By: anjali411

fbshipit-source-id: b23b8d0981c53ba0557018824d42ed37ec13d4e2
2020-09-28 20:33:35 -07:00
208df1aeb8 Use python 3.8 in pytorch docker image (#45466)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45466

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D23975294

Pulled By: tierex

fbshipit-source-id: 964de7928b541121963e9de792630bcef172bb5c
2020-09-28 19:21:40 -07:00
8c66cd120b Disable complex inputs to torch.round (#45330)
Summary:
- Related to https://github.com/pytorch/pytorch/issues/44612
- Disable complex inputs to `torch.round`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45330

Reviewed By: gchanan

Differential Revision: D23970781

Pulled By: anjali411

fbshipit-source-id: b8c9ac315ae0fc872701aa132367c3171fd56185
2020-09-28 19:07:01 -07:00
0c8a6008ac Fix torch.pow when the scalar base is a complex number (#45259)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43829
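
A minimal sketch of the expected behavior after this fix (a complex Python scalar base with a real tensor exponent promotes to a complex result instead of erroring):
```python
import torch

exponents = torch.tensor([1.0, 2.0, 3.0])
print(torch.pow(2j, exponents))  # expected: tensor([0.+2.j, -4.+0.j, -0.-8.j])
```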

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45259

Reviewed By: gchanan

Differential Revision: D23962073

Pulled By: anjali411

fbshipit-source-id: 1b16afbb98f33fa7bc53c6ca296c5ddfcbdd2b72
2020-09-28 18:25:53 -07:00
a0f0cb1608 [JIT] Add test for ignored class type property (#45233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45233

**Summary**
This commit modifies `TestClassType.test_properties` to check that
properties on class types can be ignored with the same syntax as
ignoring properties on `Modules`.

**Test Plan**
`python test/test_jit.py TestClassType.test_properties`

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23971885

Pulled By: SplitInfinity

fbshipit-source-id: f2228f61fe26dff219024668cc0444a2baa8834c
2020-09-28 18:22:19 -07:00
4af4b71fdc [JIT] Update docs for recently added features (#45232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45232

**Summary**
This commit updates the TorchScript language reference to include
documentation on recently-added TorchScript enums. It also removed
`torch.no_grad` from the list of known unsupported `torch` modules and
classes because it is now supported.

**Test Plan**
Continuous integration.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23971884

Pulled By: SplitInfinity

fbshipit-source-id: 5e2c164ed59bc0926b11201106952cff86e9356e
2020-09-28 18:17:42 -07:00
52cbc9e4ec [TensorExpr] Always inline and DCE in the LLVM backend (#45445)
Summary:
Inline pytorch into wrapper, which is especially helpful in combination
with dead code elimination to reduce IR size and compilation times when
a lot of parameters are unused.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45445

Test Plan: CI

Reviewed By: ZolotukhinM

Differential Revision: D23969009

Pulled By: asuhan

fbshipit-source-id: a21509d07e4c130b6aa6eae5236bb64db2748a3d
2020-09-28 18:11:13 -07:00
7ac872b934 [JIT] Modify to_backend API so that it accepts wrapped modules (#43612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43612

**Summary**
This commit modifies the `torch._C._jit_to_backend` function so that it
accepts `ScriptModules` as inputs. It already returns `ScriptModules`
(as opposed to C++ modules), so this makes sense and makes the API more
intuitive.

**Test Plan**
Continuous integration, which includes unit tests and out-of-tree tests
for custom backends.

**Fixes**
This commit fixes #41432.

Test Plan: Imported from OSS

Reviewed By: suo, jamesr66a

Differential Revision: D23339854

Pulled By: SplitInfinity

fbshipit-source-id: 08ecef729c4e1e6bddf3f483276947fc3559ea88
2020-09-28 17:17:01 -07:00
5855aa8dac Type check quasirandom (#45434)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42978.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45434

Reviewed By: walterddr

Differential Revision: D23967139

Pulled By: ajitmaths

fbshipit-source-id: bcee6627f367fd01aa9a5c10a7c24331fc1823ad
2020-09-28 16:49:38 -07:00
49b198c454 type check for torch.testing._internal.common_utils (#45375)
Summary:
part of torch.testing._internal.* effort

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45375

Reviewed By: malfet

Differential Revision: D23964315

Pulled By: walterddr

fbshipit-source-id: efdd643297f5c7f75670ffe60ff7e82fc413d18d
2020-09-28 16:28:46 -07:00
96f8755034 Fixed handling of nan for evenly_distribute_backward (#45280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45280

Performance is the same on CPU and on CUDA is only 1-1.05x slower. This change is necessary for the future nan ops including nan(min|max|median)

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D23908796

Pulled By: heitorschueroff

fbshipit-source-id: c2b57acbe924cfa59fbd85216811f29f4af05088
2020-09-28 15:57:02 -07:00
6a206df891 20000x faster audio conversion for SummaryWriter (#44201)
Summary:
Stumbled upon a little gem in the audio conversion for `SummaryWriter.add_audio()`: two Python `for` loops to convert a float array to little-endian int16 samples. On my machine, this took 35 seconds for a 30-second 22.05 kHz excerpt. The same can be done directly in numpy in 1.65 milliseconds. (No offense, I'm glad that the functionality was there!)
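
A hedged sketch of the vectorized approach (`float_to_int16_pcm` is an illustrative name, not the actual function in the PR):
```python
import numpy as np

def float_to_int16_pcm(samples: np.ndarray) -> bytes:
    # Clip to [-1, 1], scale to the int16 range, and store little-endian ("<i2").
    clipped = np.clip(samples, -1.0, 1.0)
    return (clipped * np.iinfo(np.int16).max).astype("<i2").tobytes()

# 30 seconds of a 440 Hz tone at 22.05 kHz converts in milliseconds.
t = np.arange(22050 * 30) / 22050.0
pcm = float_to_int16_pcm(np.sin(2 * np.pi * 440 * t).astype(np.float32))
```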

Would also be ready to extend this to support stereo waveforms, or should this become a separate PR?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44201

Reviewed By: J0Nreynolds

Differential Revision: D23831002

Pulled By: edward-io

fbshipit-source-id: 5c8f1ac7823d1ed41b53c4f97ab9a7bac33ea94b
2020-09-28 15:44:29 -07:00
e54e1fe51e [package] Add dependency viz (#45214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45214

When in verbose mode the package exporter will produce an html visualization
of dependencies of a module to make it easier to trim out unneeded code,
or debug inclusion of things that cannot be exported.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D23873525

Pulled By: zdevito

fbshipit-source-id: 6801991573d8dd5ab8c284e09572b36a35e1e5a4
2020-09-28 15:38:41 -07:00
331ebaf7cb [Distributed] Adding Python tests for the TCPStore getNumKeys and deleteKey (#45402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45402

Previous diffs in this stack implemented the getNumKeys and deleteKey
APIs in the c10d Store as well as added tests at the C++ layer. This diff adds
tests at the Python level in test_c10d.py
ghstack-source-id: 112997161

Test Plan: Running these new python tests as well as previous C++ tests

Reviewed By: mrshenli

Differential Revision: D23955729

fbshipit-source-id: c7e0af7c884de2d488320e2a1d94aec801a782e5
2020-09-28 15:35:24 -07:00
6b65b3cbd8 [Distributed] DeleteKey API for c10d TCP Store (#45401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45401

Added a DeleteKey API for the TCP Store
ghstack-source-id: 112997162
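
A hedged usage sketch of the new Store APIs from this stack (single-process server store for illustration):
```python
import torch.distributed as dist

store = dist.TCPStore("127.0.0.1", 29500, 1, True)  # host, port, world_size, is_master
store.set("first_key", "value")
print(store.num_keys())               # may include an internal coordination key
print(store.delete_key("first_key"))  # True if the key existed and was removed
```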

Test Plan:
Modified the existing get/set test to use delete. verified that the
correct keys were deleted and that the numKeys API returned the right values

Reviewed By: mrshenli

Differential Revision: D23955730

fbshipit-source-id: 5c9f82be34ff4521c59f56f8d9c1abf775c67f9f
2020-09-28 15:30:39 -07:00
190f91e3db Adding Histogram Binning Calibration to DSNN and Adding Type Double to Caffe2 ParallelSumOp/SumReluOp
Summary: As title.

Test Plan:
FBL job without this diff failed:
f221545832

Error message:
```
NonRetryableException: AssertionError: Label is missing in training stage for HistogramBinningCalibration
```

FBL job with canary package built in this diff is running without failure:
f221650379

Reviewed By: chenshouyuan

Differential Revision: D23959508

fbshipit-source-id: c077230de29f7abfd092c84747eaabda0b532bcc
2020-09-28 15:21:31 -07:00
1097fe0088 Remove CriterionTest.test_cuda code for dtype None. (#45316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45316

It's never used.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23919449

Pulled By: gchanan

fbshipit-source-id: f9aaeeabf3940389156bfc01bc3118d348ca4cf6
2020-09-28 15:08:09 -07:00
a4486fe7ba [ROCm] Print name irrespective of seq number assignment for roctx traces (#45229)
Summary:
Recent changes to the seq_num correlation behavior in the profiler (PR https://github.com/pytorch/pytorch/issues/42565) changed the behavior of emit_nvtx(record_shapes=True), which no longer prints the name of the operator properly.

This PR dumps the name in roctx traces irrespective of the sequence number assignment, for ROCm only.

cc: jeffdaily sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45229

Reviewed By: zou3519

Differential Revision: D23932902

Pulled By: albanD

fbshipit-source-id: c782667ff002b70b51f1cc921afd1b1ac533b39d
2020-09-28 15:03:47 -07:00
c6b7eeb654 Gh/taylorrobie/timer cleanup (#45361)
Summary:
This PR cleans up some of the rough edges around `Timer` and `Compare`
* Moves `Measurement` to be dataclass based
* Adds a bunch of type annotations. MyPy is now happy.
* Allows missing entries in `Compare`. This is one of the biggest usability issues with `Compare` right now, both from an API perspective and because the current failure mode is really unpleasant.
* Greatly expands the testing of `Compare`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45361

Test Plan: Changes to Timer are covered under existing tests, changes to `Compare` are covered by the expanded `test_compare` method.

Reviewed By: bwasti

Differential Revision: D23966816

Pulled By: robieta

fbshipit-source-id: 826969f73b42f72fa35f4de3c64d0988b61474cd
2020-09-28 14:56:43 -07:00
a77d633db1 [ONNX] Fix view for dynamic input shape (#43558)
Summary:
Export of the view op with a dynamic input shape is broken when using tensors with a 0-dim.
This fix removes the symbolic's use of the static input size to fix the issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43558

Reviewed By: ailzhang

Differential Revision: D23965090

Pulled By: bzinodev

fbshipit-source-id: 628e9d7ee5d53375f25052340ca6feabf7ba7c53
2020-09-28 14:46:51 -07:00
5d1fee23b3 Remove convert_target from NN tests. (#45291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45291

It's not necessary, you can just check if the dtype is integral.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23911963

Pulled By: gchanan

fbshipit-source-id: 230139e1651eb76226f4095e31068dded30e03e8
2020-09-28 14:21:42 -07:00
986af53be2 type check for torch.testing._internalcodegen:* (#45368)
Summary:
part of `torch.testing._internal.*` effort

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45368

Reviewed By: malfet

Differential Revision: D23950512

Pulled By: walterddr

fbshipit-source-id: 399f712d12cdd9795b0136328f512c3f86a15f24
2020-09-28 14:04:52 -07:00
7a4c417ed3 Fix typo (#45379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45379

Registeres -> Registers in reducer.h.
ghstack-source-id: 112982279

Test Plan: N/A

Reviewed By: mrshenli

Differential Revision: D23951203

fbshipit-source-id: 96c7dc2e1e12c132339b9ac83ce1da52c812740c
2020-09-28 14:02:01 -07:00
57c18127dc [ONNX] Update div export to perform true divide (#44831)
Summary:
Related: https://github.com/pytorch/pytorch/issues/43787

Now that PyTorch div actually performs true division, update the ONNX export code to stay consistent.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44831

Reviewed By: eellison

Differential Revision: D23880316

Pulled By: bzinodev

fbshipit-source-id: 3bb8db34142ac4fed4039295ad3c4cb79487987f
2020-09-28 13:53:43 -07:00
9163e8171e Adding Type Double to Caffe2 Mean Op
Summary: Adding support for type double to caffe2 MeanOp and MeanGradientOp.

Test Plan:
All tests passed.

Example FBL job failed without this diff:
f221169563

Error message:
```
c10::Error: [enforce fail at mean_op.h:72] . Mean operator only supports 32-bit float, but input was of type double (Error from operator:
input: "dpsgd_8/Copy_3" input: "dpsgd_8/Copy_4" output: "dpsgd_8/Mean_2" name: "" type: "Mean" device_option { device_type: 0 device_id: 0 })
```

Example FBL job is running without failure with the canary package built from this diff:
f221468723

Reviewed By: chenshouyuan

Differential Revision: D23956222

fbshipit-source-id: 6c81bbc390d812ae0ac235e7d025141c8402def1
2020-09-28 13:35:29 -07:00
47debdca42 Document change for DDP enabled on Windows platform (#45392)
Summary:
Document change for DDP enabled on Windows platform

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45392

Reviewed By: gchanan

Differential Revision: D23962344

Pulled By: mrshenli

fbshipit-source-id: 8924c6ca36d68699871d8add3e0aab6542ea269c
2020-09-28 13:22:42 -07:00
722faeb2a4 [RELAND] Added optimizers based on multi tensor apply (#45408)
Summary:
Original PR https://github.com/pytorch/pytorch/pull/45299. The present PR fixes minor bugs that caused the revert.

Adding a new namespace `torch.optim._multi_tensor` with a bunch of updated optimizers. Those optimizers are using _foreach APIs which improve performance significantly.

### Tests
- updated existing tests to use both optimizers
- added `test_multi_tensor_optimizers` test to verify correctness.

### Perf results

**Adam**
timeit: 42.69 ms --> 10.16 ms
autorange: 41.96 ms --> 10.28 ms

**AdamW**
timeit: 51.38 ms --> 15.63 ms
autorange: 50.82 ms --> 16.07 ms

**SGD**
timeit: 6.28 ms --> 4.40 ms
autorange: 6.13 ms --> 4.73 ms

**RMSprop**
timeit: 28.63 ms --> 5.89 ms
autorange: 28.27 ms -->  5.76 ms

**Rprop**
timeit: 213.30 --> 178.42
autorange: 212.03 --> 178.03

**ASGD**
timeit: 21.67 --> 9.33
autorange: 21.64 --> 9.27

**Adamax**
timeit: 55.60 --> 48.29
autorange: 55.22 -> 49.13

**Perf script used**

```
import torch
import time
import torch.optim as optim
from torch.autograd import Variable
from torch.optim.lr_scheduler import ExponentialLR, ReduceLROnPlateau, StepLR
import torch.nn as nn
import time
import torchvision
import torch.utils._benchmark as benchmark_utils

device = "cuda"
model = torchvision.models.resnet.resnet101(pretrained=True).to(device)
targets = torch.randint(0, 1000, (100, 100), device=device)
criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(model.parameters(), lr=1e-3) # <----------------------- optimizer.
                                                          # would compare optim.SGD vs optim._multi_tensor.SGD
running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device=device).random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device=device , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )

    for i in range(1):
        print(f"Run: {i}\n{'-' * 40}")
        print(f"timeit:\n{timer.timeit(1000)}\n")
        print(f"autorange:\n{timer.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45408

Reviewed By: gchanan

Differential Revision: D23956680

Pulled By: izdeby

fbshipit-source-id: c5eab7bf5fce14a287c15cead1cdc26e42cfed94
2020-09-28 13:14:04 -07:00
87b356d093 [static runtime] Split out graph preparation from runtime (#44131)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44131

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D23604305

Pulled By: bwasti

fbshipit-source-id: 7b47da4961d99074199417ef1407a788c7d80ee6
2020-09-28 13:01:23 -07:00
6ab1c0b1ca Disable a few tests in preparation to enabling PE+TE (#44815)
Summary:
Disable a few tests in preparation to enabling PE+TE
Next PR: https://github.com/pytorch/pytorch/pull/45396

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44815

Reviewed By: ZolotukhinM

Differential Revision: D23948445

Pulled By: Krovatkin

fbshipit-source-id: 93e641b7b8a3f13bd3fd3840116076553408f224
2020-09-28 12:55:12 -07:00
36c3fbc9e3 CUDA BFloat Conv (non-cuDNN) (#45007)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45007

Reviewed By: zou3519

Differential Revision: D23933174

Pulled By: ngimel

fbshipit-source-id: 84eb028f09c9197993fb9981c0efb535014e5f78
2020-09-28 11:42:42 -07:00
03342af3a3 Add env variable to bypass CUDACachingAllocator for debugging (#45294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45294

While tracking down a recent memory corruption bug we found that
cuda-memcheck wasn't finding the bad accesses, and ngimel pointed out that
it's because we use a caching allocator so a lot of "out of bounds" accesses
land in a valid slab.

This PR adds a runtime knob (`PYTORCH_NO_CUDA_MEMORY_CACHING`) that, when set,
bypasses the caching allocator's caching logic so that allocations go straight
to cudaMalloc.  This way, cuda-memcheck will actually work.

Test Plan:
Insert some memory errors and run a test under cuda-memcheck;
observe that cuda-memcheck flags an error where expected.

Specifically I removed the output-masking logic here:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/cuda_codegen.cpp#L819-L826

And ran:
```
PYTORCH_NO_CUDA_MEMORY_CACHING=1 cuda-memcheck pytest -k test_superslomo test_jit_fuser_te.py
```

Reviewed By: ngimel

Differential Revision: D23964734

Pulled By: bertmaher

fbshipit-source-id: 04efd11e8aff037b9edde80c70585cb820ee6e39
2020-09-28 11:40:04 -07:00
993628c74a Build shape expressions and remove outputs that are only used by aten::sizes (#45080)
Summary:
Currently, TE materializes all intermediate results even if they are only used for computing their shapes. This diff ports the approach the OF (Old Fuser) took to deal with this issue. Namely, given the structure of a fusion group, we infer all the sizes outside the fusion group based on the fusion group's inputs.

A simple example would be:

```
        def test_fuse(a, b):
            c = a + b
            d = c + b
            return d
```

Here we don't need to cache `c` as computing a gradient for `b` in `d = c + b` doesn't need it. We do need to compute sizes for all arguments here in case broadcasts happen.

Without this optimization, TE would need to materialize `c` so we can get its size

```
[DUMP profiling_graph_executor_impl.cpp:499] Optimized Graph:
[DUMP profiling_graph_executor_impl.cpp:499] graph(%a.1 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %b.1 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %11 : Tensor = prim::DifferentiableGraph_0(%b.1, %a.1)
[DUMP profiling_graph_executor_impl.cpp:499]   return (%11)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::DifferentiableGraph_0 = graph(%11 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %13 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %59 : int[] = aten::size(%13) # <string>:3:44
[DUMP profiling_graph_executor_impl.cpp:499]   %62 : int[] = aten::size(%11) # <string>:3:93
[DUMP profiling_graph_executor_impl.cpp:499]   %83 : Double(1:1, requires_grad=0, device=cuda:0), %84 : Double(1:1, requires_grad=0, device=cuda:0), %85 : bool = prim::TypeCheck(%11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]   %86 : Tensor, %87 : Tensor = prim::If(%85)
[DUMP profiling_graph_executor_impl.cpp:499]     block0():
[DUMP profiling_graph_executor_impl.cpp:499]       %d.4 : Double(1:1, requires_grad=0, device=cuda:0), %c.4 : Double(1:1, requires_grad=0, device=cuda:0) = prim::TensorExprGroup_0(%83, %84)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%d.4, %c.4)
[DUMP profiling_graph_executor_impl.cpp:499]     block1():
[DUMP profiling_graph_executor_impl.cpp:499]       %94 : Function = prim::Constant[name="fallback_function", fallback=1]()
[DUMP profiling_graph_executor_impl.cpp:499]       %95 : (Tensor, Tensor) = prim::CallFunction(%94, %11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]       %96 : Tensor, %97 : Tensor = prim::TupleUnpack(%95)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%96, %97)
[DUMP profiling_graph_executor_impl.cpp:499]   %60 : int[] = aten::size(%87) # <string>:3:55
[DUMP profiling_graph_executor_impl.cpp:499]   %61 : int[]? = aten::_size_if_not_equal(%59, %60) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %64 : int[]? = aten::_size_if_not_equal(%62, %60) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   %67 : int[] = aten::size(%86) # <string>:3:55
[DUMP profiling_graph_executor_impl.cpp:499]   %68 : int[]? = aten::_size_if_not_equal(%60, %67) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %71 : int[]? = aten::_size_if_not_equal(%62, %67) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   return (%86, %61, %64, %68, %71)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::TensorExprGroup_0 = graph(%1 : Double(1:1, requires_grad=0, device=cuda:0),
[DUMP profiling_graph_executor_impl.cpp:499]       %4 : Double(1:1, requires_grad=0, device=cuda:0)):
[DUMP profiling_graph_executor_impl.cpp:499]   %5 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %c.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%4, %1, %5) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2872:16
[DUMP profiling_graph_executor_impl.cpp:499]   %2 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %d.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%c.3, %1, %2) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2873:16
[DUMP profiling_graph_executor_impl.cpp:499]   return (%d.3, %c.3)
```

With this optimization we use `prim::BroadcastSizes` to compute the size of `c`. No need to materialize it.

```
[DUMP profiling_graph_executor_impl.cpp:499] Optimized Graph:
[DUMP profiling_graph_executor_impl.cpp:499] graph(%a.1 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %b.1 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %11 : Tensor = prim::DifferentiableGraph_0(%b.1, %a.1)
[DUMP profiling_graph_executor_impl.cpp:499]   return (%11)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::DifferentiableGraph_0 = graph(%11 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %13 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %59 : int[] = aten::size(%13) # <string>:3:44
[DUMP profiling_graph_executor_impl.cpp:499]   %62 : int[] = aten::size(%11) # <string>:3:93
[DUMP profiling_graph_executor_impl.cpp:499]   %88 : Double(1:1, requires_grad=0, device=cuda:0), %89 : Double(1:1, requires_grad=0, device=cuda:0), %90 : bool = prim::TypeCheck(%11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]   %91 : Tensor = prim::If(%90)
[DUMP profiling_graph_executor_impl.cpp:499]     block0():
[DUMP profiling_graph_executor_impl.cpp:499]       %d.4 : Double(1:1, requires_grad=0, device=cuda:0) = prim::TensorExprGroup_0(%88, %89)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%d.4)
[DUMP profiling_graph_executor_impl.cpp:499]     block1():
[DUMP profiling_graph_executor_impl.cpp:499]       %97 : Function = prim::Constant[name="fallback_function", fallback=1]()
[DUMP profiling_graph_executor_impl.cpp:499]       %98 : (Tensor) = prim::CallFunction(%97, %11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]       %99 : Tensor = prim::TupleUnpack(%98)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%99)
[DUMP profiling_graph_executor_impl.cpp:499]   %85 : int[] = aten::size(%91)
[DUMP profiling_graph_executor_impl.cpp:499]   %86 : int[] = prim::BroadcastSizes(%59, %62)
[DUMP profiling_graph_executor_impl.cpp:499]   %61 : int[]? = aten::_size_if_not_equal(%59, %86) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %64 : int[]? = aten::_size_if_not_equal(%62, %86) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   %68 : int[]? = aten::_size_if_not_equal(%86, %85) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %71 : int[]? = aten::_size_if_not_equal(%62, %85) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   return (%91, %61, %64, %68, %71)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::TensorExprGroup_0 = graph(%1 : Double(1:1, requires_grad=0, device=cuda:0),
[DUMP profiling_graph_executor_impl.cpp:499]       %4 : Double(1:1, requires_grad=0, device=cuda:0)):
[DUMP profiling_graph_executor_impl.cpp:499]   %5 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %c.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%4, %1, %5) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2872:16
[DUMP profiling_graph_executor_impl.cpp:499]   %2 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %d.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%c.3, %1, %2) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2873:16
[DUMP profiling_graph_executor_impl.cpp:499]   return (%d.3)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45080

Reviewed By: bertmaher

Differential Revision: D23856410

Pulled By: Krovatkin

fbshipit-source-id: 2956286eb03a4894a5baa151c35e6092466322b1
2020-09-28 10:45:56 -07:00
e5242aaf89 Update TensorPipe submodule (#45433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45433

Primarily in order to pick up the fix landed in https://github.com/pytorch/tensorpipe/pull/225, which fixes the handling of scopes in link-local IPv6 addresses, an issue reported by a user.

Test Plan: The specific upstream change is covered by new unit tests. The submodule update will be validated by the PyTorch CI.

Reviewed By: beauby

Differential Revision: D23962289

fbshipit-source-id: 4ed762fc19c4aeb1398d1337d61b3188c4c228be
2020-09-28 10:32:06 -07:00
48d29c830d [hotfix] disable problematic cuda tests on rocm builds (#45435)
Summary:
Disable the 3 recently added CUDA tests on AMD ROCm builds/tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45435

Reviewed By: malfet

Differential Revision: D23962881

Pulled By: walterddr

fbshipit-source-id: ad4ea1f835b4722cdbdce685806cfd64376cc16f
2020-09-28 10:02:12 -07:00
e2ffdf467a docker: Add torchelastic to docker image (#45438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45438

Adds torchelastic (as well as its dependencies) to the official docker
images

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: tierex

Differential Revision: D23963787

Pulled By: seemethere

fbshipit-source-id: 54ebb4b9c50699e543f264975dadf99badf55753
2020-09-28 09:53:07 -07:00
e4950a093a Backward support for generalized eigenvalue solver with LOBPCG in forward [only k-rank SYMEIG case] (#43002)
Summary:
As per title. Fixes [#38948](https://github.com/pytorch/pytorch/issues/38948). Therein you can find some blueprints for the algorithm being used in this PR.
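
A hedged usage sketch of the newly supported backward for the standard symmetric case (B omitted; assumes the default "ortho" method supports autograd after this PR):
```python
import torch

n, k = 16, 2
A = torch.randn(n, n, dtype=torch.float64)
A = A @ A.T + n * torch.eye(n, dtype=torch.float64)  # symmetric positive definite
A.requires_grad_(True)

eigvals, eigvecs = torch.lobpcg(A, k=k, largest=True)
eigvals.sum().backward()  # gradients now flow back to A
print(A.grad.shape)       # torch.Size([16, 16])
```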

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43002

Reviewed By: zou3519

Differential Revision: D23931326

Pulled By: albanD

fbshipit-source-id: e6994af70d94145f974ef87aa5cea166d6deff1e
2020-09-28 07:22:35 -07:00
6417a70465 Updates linalg warning + docs (#45415)
Summary:
Changes the deprecation of norm to a docs deprecation, since PyTorch components still rely on norm and some behavior, like automatically flattening tensors, may need to be ported to torch.linalg.norm. The documentation is also updated to clarify that torch.norm and torch.linalg.norm are distinct.
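
A small sketch of the distinction the docs now call out: with `dim=None`, `torch.norm` flattens a matrix and computes a vector norm, while `torch.linalg.norm` interprets `ord=2` on a matrix as the spectral norm.
```python
import torch

A = torch.tensor([[3.0, 0.0],
                  [0.0, 4.0]])

print(torch.norm(A, p=2))           # tensor(5.) -- vector 2-norm of the flattened matrix
print(torch.linalg.norm(A, ord=2))  # tensor(4.) -- largest singular value
```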

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45415

Reviewed By: ngimel

Differential Revision: D23958252

Pulled By: mruberry

fbshipit-source-id: fd54e807c59a2655453a6bcd9f4073cb2c12e8ac
2020-09-28 05:28:42 -07:00
7818a214c5 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D23959094

fbshipit-source-id: 6caa046d263114bff38a38d756099aac357e4f04
2020-09-28 05:08:46 -07:00
95a97e51b5 [ONNX] Improve scripting inplace indexing ops (#44351)
Summary:
Fix a couple of issues with scripting inplace indexing in the prepare_inplace_ops_for_onnx pass.
1) Tracing index copy (such as cases like x[1:3] = data) already applies broadcasting on the rhs if needed. The broadcasting node (aten::expand) is missing in scripting cases.

2) Inplace indexing with ellipsis (aten::copy_) is replaced with aten::index_put and then handled with slice+select in this pass.
Support for negative indices for this op is added.

Shape inference is also enabled for scripting tests using new JIT API.
A few more tests are enabled for scripting.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44351

Reviewed By: ezyang

Differential Revision: D23880267

Pulled By: bzinodev

fbshipit-source-id: 78b33444633eb7ae0fbabc7415e3b16001f5207f
2020-09-28 00:32:36 -07:00
13f76f2be4 Fix preserve submodule attribute in freezing (#45143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45143

This PR prevents freezing cleaning up a submodule when user requests to
preserve a submodule.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23844969

Pulled By: bzinodev

fbshipit-source-id: 80e6db3fc12460d62e634ea0336ae2a3551c2151
2020-09-28 00:05:38 -07:00
c3bf402cbb handle onnx nll with default ignore index (#44816)
Summary:
In the ONNX NegativeLogLikelihoodLoss specification, ignore_index is optional and has no default value.
Therefore, when converting the nll op to ONNX, we need to set the ignore_index attribute even if it is not specified (e.g. ignore_index=-100).
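
A hedged export sketch (assumes opset 12+, where ONNX added NegativeLogLikelihoodLoss):
```python
import torch
import torch.nn as nn

class NLL(nn.Module):
    def __init__(self):
        super().__init__()
        self.loss = nn.NLLLoss()  # ignore_index defaults to -100 in PyTorch

    def forward(self, log_probs, target):
        return self.loss(log_probs, target)

log_probs = torch.log_softmax(torch.randn(3, 5), dim=1)
target = torch.tensor([1, 0, 4])

# The exporter should now emit ignore_index=-100 on the ONNX node even
# though the user never specified it.
torch.onnx.export(NLL(), (log_probs, target), "nll.onnx", opset_version=12)
```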

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44816

Reviewed By: ezyang

Differential Revision: D23880354

Pulled By: bzinodev

fbshipit-source-id: d0bdd58d0a4507ed9ce37133e68533fe6d1bdf2b
2020-09-27 23:26:19 -07:00
8bdbedd4ee Revert "Updates and simplifies nonzero as_tuple behavior"
This reverts commit 8b143771d0f0bcd93d925263adc8b0d6b235b398.
2020-09-27 20:58:42 -07:00
8b143771d0 Updates and simplifies nonzero as_tuple behavior 2020-09-27 20:56:30 -07:00
5b839bca78 [ONNX] Optimize export_onnx api to reduce string and model proto exchange (#44332)
Summary:
Optimize export_onnx api to reduce string and model proto exchange in export.cpp

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44332

Reviewed By: bwasti, eellison

Differential Revision: D23880129

Pulled By: bzinodev

fbshipit-source-id: 1d216d8f710f356cbba2334fb21ea15a89dd16fa
2020-09-27 16:29:08 -07:00
4005afe94b [ONNX] Update narrow for dynamic inputs (#44039)
Summary:
Update narrow for dynamic inputs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44039

Reviewed By: mruberry

Differential Revision: D23742215

Pulled By: bzinodev

fbshipit-source-id: 0d58d2fe996f91a124af988a9a21ee433e842d07
2020-09-27 15:52:57 -07:00
78caa028b6 Revert D23009117: [Distributed] DeleteKey API for c10d TCP Store
Test Plan: revert-hammer

Differential Revision:
D23009117 (addf94f2d6)

Original commit changeset: 1a0d95b43d79

fbshipit-source-id: ad3fe5501267e1a0a7bf23410766f1e92b34b24d
2020-09-27 12:04:42 -07:00
f84b2e865f Revert D23878455: [Distributed] Adding Python tests for the TCPStore getNumKeys and deleteKey
Test Plan: revert-hammer

Differential Revision:
D23878455 (cf808bed73)

Original commit changeset: 0a17ecf66b28

fbshipit-source-id: 93e60b23f66324e3e5266c45abb0cec295bb3d23
2020-09-27 12:02:24 -07:00
bc5710f2f7 Benchmarks: tweak PE config settings. (#45349)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45349

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D23935518

Pulled By: ZolotukhinM

fbshipit-source-id: 5a7c508c6fc84eafbc23399f095d732b903510dc
2020-09-26 23:13:29 -07:00
a07d82982a CI: Add a run of FastRNN benchmarks in default executor/fuser configuration. (#45348)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45348

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D23935520

Pulled By: ZolotukhinM

fbshipit-source-id: efecaaab68caaaa057b354884f4ae37b6ef36983
2020-09-26 23:13:27 -07:00
8cef7326f4 Benchmarks: add 'default' options for fuser and executor. (#45347)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45347

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D23935519

Pulled By: ZolotukhinM

fbshipit-source-id: 8323fafe7828683c4d29c12a1e5722adb6f945ff
2020-09-26 23:09:02 -07:00
37a671abc7 Revert D23828257: Quantization: add API summary section
Test Plan: revert-hammer

Differential Revision:
D23828257 (d2bd556e7d)

Original commit changeset: 9311ee3f394c

fbshipit-source-id: 80b16fc123191e249e6a070ec5360a15fe91cf61
2020-09-26 22:53:10 -07:00
110aa45387 Revert D23842456: Quantization: combine previous summary with new summary
Test Plan: revert-hammer

Differential Revision:
D23842456 (278da57255)

Original commit changeset: db2399e51e9a

fbshipit-source-id: 7878257330bf83751cb17c0971a5c894bdf256ba
2020-09-26 22:53:07 -07:00
3da1061059 Revert D23916669: quant docs: add reduce_range explanation to top level doc
Test Plan: revert-hammer

Differential Revision:
D23916669 (eb39624394)

Original commit changeset: ef93fb774cb1

fbshipit-source-id: 7b56020427e76e13f847494044179c81d508db11
2020-09-26 22:48:38 -07:00
54a253fded Revert D23931987: Added optimizers based on multi tensor apply
Test Plan: revert-hammer

Differential Revision:
D23931987 (2b21e7767e)

Original commit changeset: 582134ef2d40

fbshipit-source-id: ffd500aea55fda34155442fb15e2529cb9c00100
2020-09-26 18:11:54 -07:00
e52762cbb7 Revert D23917034: quant docs: document how to customize qconfigs in eager mode
Test Plan: revert-hammer

Differential Revision:
D23917034 (7763e1d7b1)

Original commit changeset: ccf71ce4300c

fbshipit-source-id: 9ce99e880b4a22e824f4413354a0f3703e7c5c2c
2020-09-26 18:05:38 -07:00
23dfca8351 Support record_shapes in RPC profiling (#44419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44419

Closes https://github.com/pytorch/pytorch/issues/39969

This PR adds support for propagation of input shapes over the wire when the profiler is invoked with `record_shapes=True` over RPC. Previously, we did not respect this argument.

This is done by saving the shapes as an ivalue list and recovering it as the expected type (`std::vector<std::vector<int>>` on the client). A test is added to ensure that remote ops report the same `input_shapes` as if the op were run locally.
ghstack-source-id: 112977899
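
A hedged usage sketch (assumes `rpc.init_rpc` has already been called on "worker0" and "worker1"):
```python
import torch
import torch.distributed.rpc as rpc
from torch.autograd import profiler

with profiler.profile(record_shapes=True) as prof:
    rpc.rpc_sync("worker1", torch.add,
                 args=(torch.ones(2, 3), torch.ones(2, 3)))

# Remote ops now report the same input_shapes as a local torch.add would.
print(prof.key_averages(group_by_input_shape=True).table())
```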

Reviewed By: pritamdamania87

Differential Revision: D23591274

fbshipit-source-id: 7cf3b2e8df26935ead9d70e534fc2c872ccd6958
2020-09-26 13:26:44 -07:00
19dda7c68a Fallback to CPU when remote end does not have CUDA for profiling (#44967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44967

When enabling the profiler on the server, the server may be a different machine
that does not have CUDA while the caller does. In this case we used to crash,
but now we fall back to CPU profiling and log a warning.
ghstack-source-id: 112977906

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D23790729

fbshipit-source-id: dc6eba172b7e666842d54553f52a6b9d5f0a5362
2020-09-26 13:12:55 -07:00
2b21e7767e Added optimizers based on multi tensor apply (#45299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45299

Adding a new namespace `torch.optim._multi_tensor` with a bunch of updated optimizers. Those optimizers are using _foreach APIs which improve performance significantly.

### Tests
- updated existing tests to use both optimizers
- added `test_multi_tensor_optimizers` test to verify correctness.

### Perf results

**Adam**
timeit: 42.69 ms --> 10.16 ms
autorange: 41.96 ms --> 10.28 ms

**AdamW**
timeit: 51.38 ms --> 15.63 ms
autorange: 50.82 ms --> 16.07 ms

**SGD**
timeit: 6.28 ms --> 4.40 ms
autorange: 6.13 ms --> 4.73 ms

**RMSprop**
timeit: 28.63 ms --> 5.89 ms
autorange: 28.27 ms -->  5.76 ms

**Rprop**
timeit: 213.30 --> 178.42
autorange: 212.03 --> 178.03

**ASGD**
timeit: 21.67 --> 9.33
autorange: 21.64 --> 9.27

**Adamax**
timeit: 55.60 --> 48.29
autorange: 55.22 -> 49.13

**Perf script used**

```
import torch
import time
import torch.optim as optim
from torch.autograd import Variable
from torch.optim.lr_scheduler import ExponentialLR, ReduceLROnPlateau, StepLR
import torch.nn as nn
import time
import torchvision
import torch.utils._benchmark as benchmark_utils

device = "cuda"
model = torchvision.models.resnet.resnet101(pretrained=True).to(device)
targets = torch.randint(0, 1000, (100, 100), device=device)
criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(model.parameters(), lr=1e-3) # <----------------------- optimizer.
                                                          # would compare optim.SGD vs optim._multi_tensor.SGD
running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device=device).random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device=device , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )

    for i in range(1):
        print(f"Run: {i}\n{'-' * 40}")
        print(f"timeit:\n{timer.timeit(1000)}\n")
        print(f"autorange:\n{timer.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23931987

Pulled By: izdeby

fbshipit-source-id: 582134ef2d402909d27d89a45c5b588fb7130ea1
2020-09-26 12:17:43 -07:00
0fa551f0ab [c2] Fix int types for learning rate
Summary: Currently GetSingleArgument overflows, since it expects an int instead of an int64, when using a 1cycle (hill policy) annealing schedule.

Test Plan:
unittest

buck test  caffe2/caffe2/python/operator_test:learning_rate_op_test

Differential Revision: D23938169

fbshipit-source-id: 20d65df800d7a0f1dd9520705af31f63ae716463
2020-09-26 10:59:29 -07:00
cf808bed73 [Distributed] Adding Python tests for the TCPStore getNumKeys and deleteKey (#45223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45223

Previous diffs in this stack implemented the getNumKeys and deleteKey
APIs in the c10d Store as well as added tests at the C++ layer. This diff adds
tests at the Python level in test_c10d.py
ghstack-source-id: 112939763

Test Plan: Ensured these new python tests as well as previous C++ tests pass

Reviewed By: jiayisuse

Differential Revision: D23878455

fbshipit-source-id: 0a17ecf66b28d46438a77346e5bf36414e05e25c
2020-09-26 00:54:28 -07:00
addf94f2d6 [Distributed] DeleteKey API for c10d TCP Store (#43963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43963

Added a DeleteKey API for the TCP Store
ghstack-source-id: 112939762

Test Plan:
Modified the existing get/set test to use delete. verified that the
correct keys were deleted and that the numKeys API returned the right values

Reviewed By: jiayisuse

Differential Revision: D23009117

fbshipit-source-id: 1a0d95b43d79e665a69b2befbaa059b2b50a1f66
2020-09-26 00:54:21 -07:00
304e1d1e19 [Distributed] getNumKeys API to c10d TCPStore (#43962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43962

TCPStore needs a getNumKeys API for our logging needs.
ghstack-source-id: 112939761

Test Plan: Adding tests to C++ Store Tests

Reviewed By: pritamdamania87

Differential Revision: D22985085

fbshipit-source-id: 8a0d286fbd6fd314dcc997bae3aad0e62b51af83
2020-09-26 00:49:00 -07:00
d9af3d2fcd [quant] ConvTranspose warnings (#45081)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45081

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23822449

Pulled By: z-a-f

fbshipit-source-id: f21a5f3ef4d09f703c96fff0bc413dbadeac8202
2020-09-25 22:30:14 -07:00
92189b34b7 Add get_all_users_of function to GraphManipulation (#45216)
Summary:
This PR adds the get_all_users_of function, which returns all the users of a specific node. A unit test is also added.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45216

Reviewed By: ezyang

Differential Revision: D23883572

Pulled By: scottxu0730

fbshipit-source-id: 3eb68a411c3c6db39ed2506c9cb7bb7337520ee4
2020-09-25 19:32:49 -07:00
7763e1d7b1 quant docs: document how to customize qconfigs in eager mode (#45306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45306

Adds details to the main quantization doc on how specifically
users can skip or customize quantization of layers.
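
A short sketch of the kind of eager-mode recipe the doc now spells out: set a global qconfig, then override it per submodule (None skips quantization for that layer).
```python
import torch
import torch.quantization as tq

model = torch.nn.Sequential(
    torch.nn.Linear(4, 4),
    torch.nn.Linear(4, 2),
)

model.qconfig = tq.get_default_qconfig("fbgemm")
model[1].qconfig = None       # keep the second Linear in fp32

prepared = tq.prepare(model)  # only model[0] gets observers attached
```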

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23917034

Pulled By: vkuzo

fbshipit-source-id: ccf71ce4300c1946b2ab63d1f35a07691fd7a2af
2020-09-25 18:33:35 -07:00
eb39624394 quant docs: add reduce_range explanation to top level doc (#45305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45305

Adds an explanation of reduce_range to the main quantization
doc page.

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23916669

Pulled By: vkuzo

fbshipit-source-id: ef93fb774cb15741cd92889f114f6ab76c39f051
2020-09-25 18:33:32 -07:00
278da57255 Quantization: combine previous summary with new summary (#45135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45135

The previous quantization summary had steps on what to do for
dynamic, static, QAT.  This PR moves these steps to comments in the
example code, so it is more clear how to accomplish the steps.

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23842456

Pulled By: vkuzo

fbshipit-source-id: db2399e51e9ae33c8a1ac610e3d7dbdb648742b0
2020-09-25 18:33:30 -07:00
d2bd556e7d Quantization: add API summary section (#45093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45093

This adds a tl;dr; style summary of the quantization API
to the documentation. Hopefully this will make this easier
for new folks to learn how to use quantization.

This is not meant to be all-encompassing.  Future PRs
can improve the documentation further.

Test Plan:
1. build the doc as specified in https://github.com/pytorch/pytorch#building-the-documentation
2. inspect the quantization page in Chrome, format looks good

Reviewed By: jerryzh168

Differential Revision: D23828257

Pulled By: vkuzo

fbshipit-source-id: 9311ee3f394cd83af0aeafb6e2fcdc3e0321fa38
2020-09-25 18:30:51 -07:00
958c208666 [quant] conv_transpose graph patterns (#45078)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45078

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23821580

Pulled By: z-a-f

fbshipit-source-id: 813a4ef1bbc429720765d61791fe754b6678a334
2020-09-25 18:14:29 -07:00
606b1a9a2e Move xla codegen to aten. (#45241)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45241

Test Plan: Imported from OSS

Reviewed By: soumith

Differential Revision: D23926750

Pulled By: ailzhang

fbshipit-source-id: f768e24a9baeca9f9df069a62d6f8b94a853a1ee
2020-09-25 18:07:32 -07:00
32c355af5b [dist_optim] introduce distributed functional optimizer (#45221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45221

This PR introduces a distributed functional optimizer, so that the
distributed optimizer can reuse the functional optimizer APIs and
maintain its own state. This enables a TorchScript-compatible
functional optimizer when using the distributed optimizer, which helps
get rid of the GIL and improves the overall performance of training,
especially distributed model parallel training
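
For context, a sketch of the user-facing entry point that can now be backed by functional optimizers; `param_rrefs` is assumed to be a list of RRefs created elsewhere via RPC:

```python
import torch.optim as optim
from torch.distributed.optim import DistributedOptimizer

def make_dist_optimizer(param_rrefs):
    # Same construction API as before; each worker-local optimizer can
    # now be a functional implementation that holds its own state.
    return DistributedOptimizer(optim.Adagrad, param_rrefs, lr=0.05)
```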

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D23935256

Pulled By: wanchaol

fbshipit-source-id: 59b6d77ff4693ab24a6e1cbb6740bcf614cc624a
2020-09-25 17:13:10 -07:00
08caf15502 [optimizer] refactor Adam to use functional API (#44791)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44791

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D23935257

Pulled By: wanchaol

fbshipit-source-id: 6f6e22a9287f5515d2e4e6abd4dee2fe7e17b945
2020-09-25 17:13:08 -07:00
0444c372e1 [optimizer] introduce optimizer functional API, refactor Adagrad (#44715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44715

We have provided a nice and intuitive API in Python. But in the context of large-scale distributed training (e.g. Distributed Model Parallel), users often want to use multithreaded training instead of multiprocess training, as it provides better resource utilization and efficiency.

This PR introduces a functional optimizer concept (similar to the concept of `nn.functional`): we split the optimizer into two parts: 1. optimizer state management 2. optimizer computation. We expose the computation part as a separate functional API that is available to internal and OSS developers; the caller of the functional API maintains their own state in order to call the functional API directly. While keeping the end-user API the same, the functional API is TorchScript friendly and can be used by the distributed optimizer to speed up training without the GIL.
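
A conceptual sketch of the split; this mirrors the design described above, not the PR's actual signatures:

```python
import torch

def functional_sgd_step(params, grads, lr):
    # Pure computation: the caller owns and passes in all state.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g, alpha=-lr)

w = torch.randn(4, requires_grad=True)
(w ** 2).sum().backward()
functional_sgd_step([w], [w.grad], lr=0.1)  # state management stays with the caller
```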

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D23935258

Pulled By: wanchaol

fbshipit-source-id: d2a5228439edb3bc64f7771af2bb9e891847136a
2020-09-25 17:10:26 -07:00
8ab2ad306d Enable torch.cuda.nccl typechecking (#45344)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45336

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45344

Reviewed By: walterddr

Differential Revision: D23935306

Pulled By: malfet

fbshipit-source-id: dd09d4f8ff7a327131764487158675027a13bf69
2020-09-25 17:02:47 -07:00
5211fb97ac Remove device maps from TensorPipe for v1.7 release (#45353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45353

Temporarily removing this feature; it will be added back after the branch cut.

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23939865

Pulled By: mrshenli

fbshipit-source-id: 7dceaffea6b9a16512b5ba6036da73e7f8f83a8e
2020-09-25 16:51:45 -07:00
439930c81b adding a beta parameter to the smooth_l1 loss fn (#44433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44433

Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time.

Follow-up fixes: corrected some type errors and updated the function signature in a few more files; removed my usage of Scalar, making beta a double everywhere instead.
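
A minimal usage sketch of the new parameter (values are illustrative):

```python
import torch
import torch.nn.functional as F

inp, tgt = torch.randn(8), torch.randn(8)
# beta sets the |input - target| threshold where the loss switches
# from quadratic to linear; beta=1.0 matches the previous behavior.
loss = F.smooth_l1_loss(inp, tgt, beta=0.5)
```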

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23636720

Pulled By: bdhirsh

fbshipit-source-id: caea2a1f8dd72b3b5fd1d72dd886b2fcd690af6d
2020-09-25 16:36:28 -07:00
37513a1118 Use explicit templates in CUDALoops kernels (#44286)
Summary:
Reland attempt of https://github.com/pytorch/pytorch/pull/41059
Use explicit templates instead of lambdas to reduce binary size by 100-200Kb per arch per CU without affecting perf, namely:
BinaryMulDivKernel.cu 3.8Mb -> 3.5Mb
CompareEQKernel.cu 1.8Mb -> 1.7Mb
BinaryAddSubKernel.cu 2.0Mb -> 1.8Mb
BinaryBitwiseOpsKernels.cu 2.6Mb -> 2.3Mb

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44286

Reviewed By: ngimel

Differential Revision: D23859691

Pulled By: malfet

fbshipit-source-id: 2c4e86f35e0f94a62294dc5d52a3ba364db23e2d
2020-09-25 16:26:40 -07:00
a2b4177c5b Add barrier() at the end of init_process_group and new_group. (#45181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45181

`init_process_group` and `new_group` update a bunch of global
variables after initializing the actual process group. As a result, there is a
race: after initializing the process group on, say, rank 0, if we immediately
check the default process group on rank 1 (say via RPC), we might actually get
an error since rank 1 hasn't yet updated its _default_pg variable.

To resolve this issue, I've added a barrier() at the end of both of these calls.
This ensures that once these calls return, correct initialization is guaranteed
on all ranks.

Since these calls are mostly done during initialization, it should be
fine to add the overhead of a barrier() here.
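
A minimal sketch of the guarantee this change provides (backend and init_method are placeholders):

```python
import torch
import torch.distributed as dist

def worker(rank, world_size):
    # The trailing barrier means that when this returns on any rank,
    # every rank has finished updating its process-group globals.
    dist.init_process_group("gloo", init_method="tcp://127.0.0.1:23456",
                            rank=rank, world_size=world_size)
    dist.all_reduce(torch.ones(1))  # safe to use the group immediately
```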

Closes: https://github.com/pytorch/pytorch/issues/40434, https://github.com/pytorch/pytorch/issues/40378
ghstack-source-id: 112923112

Test Plan:
Reproduced the failures in
https://github.com/pytorch/pytorch/issues/40434 and
https://github.com/pytorch/pytorch/issues/40378 and verified that this PR fixes
the issue.

Reviewed By: mrshenli

Differential Revision: D23858025

fbshipit-source-id: c4d5e46c2157981caf3ba1525dec5310dcbc1830
2020-09-25 15:46:59 -07:00
3b7e4f89b2 Add deprecation warning to PG backend and make TP backend stable. (#45356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45356

In this PR, I'm adding a warning to the PG backend mentioning that it will
be deprecated in the future. In addition, I removed the warning from the
TP backend saying that it is a beta feature.
ghstack-source-id: 112940501

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D23940144

fbshipit-source-id: d44054aa1e4ef61004a40bbe0ec45ff07829aad4
2020-09-25 15:41:00 -07:00
04be420549 [static runtime] Remove ops in static from backwards compatibility checks (#45354)
Summary:
This should get the builds green again

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45354

Reviewed By: zhangguanheng66

Differential Revision: D23939615

Pulled By: bwasti

fbshipit-source-id: e93b11bc9592205e52330bb15928603b0aea21ac
2020-09-25 14:46:42 -07:00
eee7dad376 Add torch.do_assert, which is symbolically traceable (#45188)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45188

This is a symbolically traceable alternative to Python's `assert`.
It should be useful to allow people who want to use FX to also
be able to assert things.

A bunch of TODO(before land) comments are inline - would love thoughts
on where the best place for this code to live is, and what this
function should be called (since `assert` is reserved).
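
A hedged sketch of how this would be used; `torch.do_assert` is the name proposed in this PR and may change, per the naming question above:

```python
import torch
import torch.fx

class M(torch.nn.Module):
    def forward(self, x):
        # A plain `assert` cannot be symbolically traced, because the
        # condition is a Proxy; do_assert becomes a node in the graph.
        torch.do_assert(x.sum() > 0, "expected a positive sum")
        return x * 2

traced = torch.fx.symbolic_trace(M())
```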

Test Plan:
```
python test/test_fx.py TestFX.test_symbolic_trace_assert
```

Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23861567

fbshipit-source-id: d9d6b9556140faccc0290eba1fabea401d7850de
2020-09-25 13:46:28 -07:00
7c5436d557 [RPC profiling] Add tests to ensure RPC profiling works on single threaded (#44923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44923

This ensures that RPC profiling works in single-threaded server
scenarios and that we won't make the assumption that we'll have multiple
threads when working on this code. For example, this assumption resulted in a
bug in the previous diff (which was fixed)
ghstack-source-id: 112868469

Test Plan: CI

Reviewed By: lw

Differential Revision: D23691304

fbshipit-source-id: b17d34ade823794cbe949b70a5ab35723d974203
2020-09-25 13:24:18 -07:00
27ab9bc0f9 [RPC profiling] Extend RPC profiling to support async function execution over RPC. (#44664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44664

Closes https://github.com/pytorch/pytorch/issues/39971. This PR adds support for functions decorated with `rpc.functions.async_execution` to be profiled over RPC as builtins, jit functions, and blocking python UDFs currently can be. The reasoning for this is to provide complete feature support in terms of RPC profiling and the various types of functions users can run.

To enable this, the PR below this enables calling `disableProfiler()` safely from another thread. We use that functionality to defer disabling the profiler on the server until the future corresponding to the RPC request completes (rather than only the blocking `processRPC` call as was done previously). Since when the future completes we've kicked off the async function and the future corresponding to it has completed, we are able to capture any RPCs the function would have called and the actual work done on the other node.

For example, if the following async function is ran on a server over RPC:

```
def slow_add(x, y):
    time.sleep(1)
    return torch.add(x, y)

@rpc.functions.async_execution
def slow_async_add(to, x, y):
    return rpc.rpc_async(to, slow_add, args=(x, y))
```

we expect to see the original RPC profiled, the nested RPC profiled, and the actual torch.add() work. All of these events should be recorded with the correct node id. Here is an example profiling output:

```
-----------------------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------  -------
Name                                                                                                                     Self CPU total %  Self CPU total  CPU total %  CPU total  CPU time avg  Number of Calls  Node ID
-----------------------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------  -------
rpc_async#slow_async_add(worker1 -> worker2)                                                                             0.00%             0.000us         0            1.012s     1.012s        1                1
aten::empty                                                                                                              7.02%             11.519us        7.02%        11.519us   11.519us      1                1
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)                           0.00%             0.000us         0            1.006s     1.006s        1                2
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: aten::empty                                                      7.21%             11.843us        7.21%        11.843us   11.843us      1                2
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::add      71.94%            118.107us       85.77%       140.802us  140.802us     1                3
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::empty    13.82%            22.695us        13.82%       22.695us   22.695us      1                3
-----------------------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------  -------
Self CPU time total: 164.164us
```

This PR also moves a bunch of the profiling logic to `rpc/utils.cpp` to declutter `request_callback` code.
ghstack-source-id: 112868470

Test Plan:
```
rvarm1@devbig978:fbcode  (52dd34f6)$ buck test mode/no-gpu mode/dev-nosan //caffe2/test/distributed/rpc:process_group_agent -- test_rpc_profiling_async_function --print-passing-details --stress-runs 1
```

Reviewed By: mrshenli

Differential Revision: D23638387

fbshipit-source-id: eedb6d48173a4ecd41d70a9c64048920bd4807c4
2020-09-25 13:19:26 -07:00
d5748d9a1a Enable binary ops with Scalar Lists with for foreach APIs (#45298)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45298

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23931986

Pulled By: izdeby

fbshipit-source-id: 281267cd6f90d57a169af89f9f10b0f4fcab47e3
2020-09-25 12:58:34 -07:00
f07ac6a004 Fix Windows build failure after DDP PR merged (#45335)
Summary:
This is a resubmit of PR https://github.com/pytorch/pytorch/issues/42897, together with a fix for the Windows build issue introduced by PR https://github.com/pytorch/pytorch/issues/44344.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45335

Reviewed By: zou3519

Differential Revision: D23931471

Pulled By: mrshenli

fbshipit-source-id: f49b5a114944c1450b32934b3292170be064f494
2020-09-25 12:37:50 -07:00
c8166d4b58 Add torch.cuda.comm to typechecking CI (#45350)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45350

Reviewed By: walterddr

Differential Revision: D23935750

Pulled By: malfet

fbshipit-source-id: 5a7d2d4fbc976699d80bb5caf4727c19fa2c5bc8
2020-09-25 12:13:43 -07:00
22401b850b port all JIT tests to gtest (#45264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45264

Context for why we are porting to gtest in: https://github.com/pytorch/pytorch/pull/45018.

This PR completes the process of porting and removes unused files/macros.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23901392

Pulled By: suo

fbshipit-source-id: 89526890e1a49462f3f77718f4ee273c5bc578ba
2020-09-25 11:37:43 -07:00
5a0514e3e6 [pytorch] Update fmt to 7.0.3 (#45304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45304

As title

Test Plan: sandcastle

Reviewed By: malfet

Differential Revision: D23916328

fbshipit-source-id: 47c76886c1f17233304dc59289ff6baa16c50b8d
2020-09-25 11:33:36 -07:00
dc9e9c118e CUDA BFloat16 neg (#45240)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45240

Reviewed By: mruberry

Differential Revision: D23933392

Pulled By: ngimel

fbshipit-source-id: 2472dc550600ff470a1044ddee39054e22598038
2020-09-25 11:25:49 -07:00
e5f6e5af13 Add Deep and wide to test and flatten/tranpose for good measure (#44129)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44129

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D23604302

Pulled By: bwasti

fbshipit-source-id: 5787f6f32a80b22b1b712c4116f70370dad98f12
2020-09-25 11:05:41 -07:00
d1a11618f5 [static runtime] Add _out variants and reuse memory (#44128)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44128

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D23604304

Pulled By: bwasti

fbshipit-source-id: 06a23cb75700a0fc733069071843b7b498e7b9e9
2020-09-25 11:03:06 -07:00
d1d9017a66 [NNC] fix Half conversion of immediates in Cuda backend (#45213)
Summary:
The Cuda HalfChecker casts up all loads and stores of Half to Float, so we do math in Float on the device. It didn't cast up HalfImmediate (i.e. constants), so mixed-size ops could be inserted. The fix is to cast those up as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45213

Reviewed By: ezyang

Differential Revision: D23885287

Pulled By: nickgg

fbshipit-source-id: 912991d85cc06ebb282625cfa5080d7525c8eba9
2020-09-25 10:53:36 -07:00
536580e976 Vectorize bitwise_not (#45103)
Summary:
Benchmark (Debian 10, Release build, gcc 8.3, no turbo, Intel(R) Xeon(R)
E-2136 CPU @ 3.30GHz):

```python
import timeit
for dtype in ('torch.int64', 'torch.int32', 'torch.int16', 'torch.int8', 'torch.uint8'):
    for n, t in [(10_000, 100000),
                 (100_000, 10000)]:
        print(f'torch.bitwise_not(a), numel() == {n} for {t} times, dtype={dtype}')
        print(timeit.timeit('torch.bitwise_not(a)', setup=f'import torch; a = torch.arange(-{n//2}, {n//2}, dtype={dtype})', number=t))
```

Before:

```
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int64
0.5479081739904359
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int64
0.3350257440470159
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int32
0.39590477803722024
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int32
0.25563537096604705
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int16
0.31152817397378385
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int16
0.20817365101538599
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int8
0.8573925020173192
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int8
0.4150037349900231
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.uint8
0.8551108679967001
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.uint8
0.37137620500288904
```

After:

```
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int64
0.5232444299617782
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int64
0.33852163201663643
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int32
0.3931163849774748
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int32
0.24392802000511438
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int16
0.3122224889229983
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int16
0.1977886479580775
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int8
0.26711542706470937
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int8
0.18208567495457828
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.uint8
0.2615354140289128
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.uint8
0.17972210398875177
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45103

Reviewed By: ailzhang

Differential Revision: D23848675

Pulled By: ezyang

fbshipit-source-id: 6dde1ab32d9a343a49de66ad9f9b062fa23824d2
2020-09-25 10:18:30 -07:00
a117d968f6 [quant][graph] Remove redundant aten::wait calls in the graph (#45257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45257

Currently we inline fork-wait calls when we insert observers for quantization.
In the case where fork and wait are in different subgraphs, inlining the fork-wait calls
only gets rid of the fork. This leaves the aten::wait call in the graph with a torch.Tensor as input,
which is currently not supported.
To avoid this, we check in the cleanup phase that the input to all wait calls
in the graph is of type Future[Tensor].

Test Plan:
python test/test_quantization.py TestQuantizeJitPasses.test_quantize_fork_wait

Imported from OSS

Reviewed By: qizzzh

Differential Revision: D23895412

fbshipit-source-id: 3c58c6be7d7e7904eb6684085832ac21f827a399
2020-09-25 09:52:52 -07:00
8b00c4c794 [ONNX] Correct a minor typo in warning (#45187)
Summary:
The warning for batch_norm was mentioning dropout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45187

Reviewed By: glaringlee

Differential Revision: D23873215

Pulled By: ezyang

fbshipit-source-id: 1dcc82ad16522215f49b4cd0fc0e357b2094e4f2
2020-09-25 09:26:51 -07:00
b70fac75ac CMake: Fix python dependencies in codegen (#45275)
Summary:
I noticed while working on https://github.com/pytorch/pytorch/issues/45163 that edits to python files in the  `tools/codegen/api/` directory wouldn't trigger rebuilds. This tells CMake about all of the dependencies, so rebuilds are triggered automatically.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45275

Reviewed By: zou3519

Differential Revision: D23922805

Pulled By: ezyang

fbshipit-source-id: 0fbf2b6a9b2346c31b9b0384e5ad5e0eb0f70e9b
2020-09-25 09:16:38 -07:00
78fcde9c50 Trace scattered tensor options arguments (#44071)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44071

Previously, tracing re-gathered ScalarType, Layout, Device, bool into a TensorOptions object and called `tracer::addInput()` on the gathered TensorOptions argument. `tracer::addInput()` then scattered them again and added the individual scattered arguments to the traced graph. This PR avoids the extraneous gathering and re-scattering step and calls `tracer::addInput()` on the individual arguments directly. This avoids the perf hit of an unnecessary gathering step.

This applies to both c10-full and non-c10-full ops. In the case of c10-full ops, the tracing kernels takes scattered arguments and we can directly pass them to `tracer::addInput()`. In the case of non-c10-full ops, the kernel takes a `TensorOptions` argument but we still call `tracer::addInput()` on the scattered arguments.
ghstack-source-id: 112825793

Test Plan:
waitforsandcastle

vs master: https://www.internalfb.com/intern/fblearner/details/216129483/

vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170069/

Reviewed By: ezyang

Differential Revision: D23486638

fbshipit-source-id: e0b53e6673cef8d7f94158e718301eee261e5d22
2020-09-25 09:04:06 -07:00
2ac7de7d53 Remove hacky_wrapper from BackendSelect kernels (#44062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44062

Previously, BackendSelect kernels were still written in the legacy way, i.e. they took one TensorOptions argument instead of scattered dtype, layout, device, pin_memory, and they used hacky_wrapper to be callable. This caused a re-wrapping step. Calling into a BackendSelect kernel required taking the individual scattered arguments, packing them into a TensorOptions, and the kernel itself then gathered them again for redispatch.

Now with this PR, BackendSelect kernels are written in the new way and no hacky_wrapper or rewrapping is needed for them.
ghstack-source-id: 112825789

Test Plan:
vs master: https://www.internalfb.com/intern/fblearner/details/216117032/

vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170194/

Reviewed By: ezyang

Differential Revision: D23484192

fbshipit-source-id: e8fb49c4692404b6b775d18548b990c4cdddbada
2020-09-25 09:04:03 -07:00
043bd51b48 Remove hacky_wrapper from VariableType and TraceType (#44005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44005

Previously, VariableType and TraceType kernels were still written in the legacy way, i.e. they took one TensorOptions argument instead of scattered dtype, layout, device, pin_memory,  and they used hacky_wrapper to be callable.

Now with this PR, variable and tracing kernels are written in the new way and no hacky_wrapper is needed for them.
ghstack-source-id: 112825791

Test Plan:
waitforsandcastle

https://www.internalfb.com/intern/fblearner/details/215954270/

Reviewed By: ezyang

Differential Revision: D23466042

fbshipit-source-id: bde730a9e3bb4cb80ad484417be1ebecbdc2d377
2020-09-25 09:01:34 -07:00
bf8cd21f2a Py transformer coder test (#43976)
Summary:
Fixes #37756 (https://github.com/pytorch/pytorch/issues/37756)

Added the missing Transformer coder Python tests, ported from the C++ API test scripts

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43976

Reviewed By: jamesr66a

Differential Revision: D23873250

Pulled By: glaringlee

fbshipit-source-id: cdeae53231e02208463e7629ba2c1f00990150ea
2020-09-25 08:22:24 -07:00
2739a7c599 Byte-for-byte compatibility fixes in codegen (#44879)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44879

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23825163

Pulled By: bdhirsh

fbshipit-source-id: 4d8028274f82c401b393c4fe1b9e32de3f4909c6
2020-09-25 08:06:50 -07:00
00e704e757 [fix] torch.repeat : dim-0 backward (#45212)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45201

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45212

Reviewed By: mrshenli

Differential Revision: D23905545

Pulled By: albanD

fbshipit-source-id: c5bf9cf481c8cf3ccc1fdbfb364006b29f67dc9f
2020-09-25 07:53:00 -07:00
76ee58e2ec [TensorExpr] Move inner loops vectorization logic to its own method (#45287)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45287

Test Plan: CI, build

Reviewed By: gmagogsfm

Differential Revision: D23913432

Pulled By: asuhan

fbshipit-source-id: 3bf8fe09753f349e3c857863a43d2b1fca5101c1
2020-09-25 02:29:36 -07:00
241afc9188 Migrate addr from the TH to Aten (CPU) (#44364)
Summary:
Related https://github.com/pytorch/pytorch/issues/24507
Fixes https://github.com/pytorch/pytorch/issues/24666

This PR modernizes the CPU implementation of the vector `outer product`.
The existing TH implementation for `torch.addr` is migrated to `aten`; `torch.ger` delegates to the `addr` functions to calculate the outer product.
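
For reference, a small sketch of the op being migrated (pure PyTorch usage, unaffected by which backend implements it):

```python
import torch

M = torch.zeros(3, 2)
v1 = torch.arange(1.0, 4.0)   # shape (3,)
v2 = torch.arange(1.0, 3.0)   # shape (2,)
# addr computes beta * M + alpha * outer(v1, v2); defaults are beta=alpha=1
out = torch.addr(M, v1, v2)
```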

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44364

Reviewed By: ezyang

Differential Revision: D23866733

Pulled By: mruberry

fbshipit-source-id: 5159ea22f0e3c991123fe7c19cc9beb6ad00301e
2020-09-25 01:18:09 -07:00
99e0a87bbb [nvFuser] Latency improvements for pointwise + reduction fusion (#45218)
Summary:
A lot of changes are in this update, some highlights:

- Added Doxygen config file
- Split the fusion IR (higher level TE like IR) from kernel IR (lower level CUDA like IR)
- Improved latency with dynamic shape handling for the fusion logic
- Prevent recompilation for pointwise + reduction fusions when not needed
- Improvements to inner dimension reduction performance
- Added input -> kernel + kernel launch parameters cache, added eviction policy
- Added reduction fusions with multiple outputs (still single reduction stage)
- Fixed code generation bugs for symbolic tiled GEMM example
- Added thread predicates to prevent shared memory from being loaded multiple times
- Improved sync threads placements with shared memory and removed read before write race
- Fixes to FP16 reduction fusions where output would come back as FP32

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45218

Reviewed By: ezyang

Differential Revision: D23905183

Pulled By: soumith

fbshipit-source-id: 12f5ad4cbe03e9a25043bccb89e372f8579e2a79
2020-09-24 23:17:20 -07:00
95df8657c9 Enables test linalg (#45278)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45271.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45278

Reviewed By: ngimel

Differential Revision: D23926124

Pulled By: mruberry

fbshipit-source-id: 26692597f9a1988e5fa846f97b8430c3689cac27
2020-09-24 23:09:38 -07:00
bdf329ef8a SyncBN: preserve qconfig if it exists (#45317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45317

Eager mode quantization depends on the presence of the `qconfig`
model attribute.  Currently, converting a model to use `SyncBatchNorm`
removes the qconfig - fixing this.  This is important if a BN is not
fused to anything during quantization convert.
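
A small sketch of the behavior this fixes; the assert describes the intended post-fix behavior:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.BatchNorm2d(4))
model[0].qconfig = torch.quantization.default_qconfig
sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
assert hasattr(sync_model[0], "qconfig")  # preserved after this change
```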

Test Plan:
```
python test/test_quantization.py TestDistributed.test_syncbn_preserves_qconfig
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23922072

fbshipit-source-id: cc1bc25c8e5243abb924c6889f78cf65a81be158
2020-09-24 22:52:07 -07:00
103fa3894a Revert D23841786: [pytorch][PR] Enable distributed package on windows, Gloo backend supported only
Test Plan: revert-hammer

Differential Revision:
D23841786 (0122299f9b)

Original commit changeset: 334ba1ed73ef

fbshipit-source-id: ec95432f9957df56a5a04e52661f5db920b7f57f
2020-09-24 22:44:33 -07:00
bc3151dee0 [quant] Remove unused qconfig argument in qat linear module (#45307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45307

fixes: https://github.com/pytorch/pytorch/issues/35634

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23917339

fbshipit-source-id: 65f8844b98198bbf93547b3d71408c2a54605218
2020-09-24 22:15:16 -07:00
31ae8117ba [RFC] Remove per-op-registration related code in caffe2/tools/codegen/gen.py (#45134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45134

Per-Op-Registration was a mechanism used for mobile selective build v0. Since then, a new dispatching mechanism has been built for PyTorch, and this code path isn't used any more. Remove it to simplify understanding/updating the code-generator's code-flow.
ghstack-source-id: 112723942

Test Plan: `buck build` and sandcastle.

Reviewed By: ezyang

Differential Revision: D23806632

fbshipit-source-id: d93cd324650c541d9bfc8eeff2ddb2833b988ecc
2020-09-24 22:02:49 -07:00
0122299f9b Enable distributed package on windows, Gloo backend supported only (#42897)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42095

Test cases will be committed to this PR later

mrshenli, please help to review

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42897

Reviewed By: osalpekar

Differential Revision: D23841786

Pulled By: mrshenli

fbshipit-source-id: 334ba1ed73eff2f668857390fc32d1bc7f08e5f3
2020-09-24 21:13:55 -07:00
c6500bcf14 [reland] Make grad point to bucket buffer in DDP to save memory usage (#44344)
Summary:
[test all]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44344

reland #41954

Add one argument to the DDP API to enable/disable letting grads point to views. When it is disabled, the behavior is the same as DDP today; when it is enabled, both variable.grad() and the grad in the dist autograd context point to the bucket buffer in DDP to save memory.
In this case, grad will be a view of the bucket buffer tensors; in order to make this compatible with optimizer.zero_grad(), we
made changes in #41283.

Also note that we cannot make variable.grad() point to the bucket buffer at construction time, because we want to
keep grad undefined for unused parameters.
ghstack-source-id: 112845787

Test Plan:
1. When grad_is_view=false:
a. roberta_base, peak memory usage 8250MB, p50 per iteration latency 0.923second, https://www.internalfb.com/intern/fblearner/details/218029699/?notif_channel=cli
b. resnet, peak memory usage 3089MB, p50 per iteration latency 0.120second, https://www.internalfb.com/intern/fblearner/details/218029035/?notif_channel=cli
c. accuracy benchmark, distributed=false, .accuracy 40.914535522461, .loss: 1.6370717287064; distributed=true, .accuracy: 39.966053009033, .loss: 1.6849111318588
https://www.internalfb.com/intern/fblearner/details/218035688/?notif_channel=cli
d. classy vision uru production flow, https://www.internalfb.com/intern/fblearner/details/219065811/?notif_channel=cli
e. pytext flow, https://www.internalfb.com/intern/fblearner/details/219137458/?notif_channel=cli

2. When grad_is_view=true:
a. roberta_base, peak memory usage 7183MB, p50 per iteration latency 0.908second, https://www.internalfb.com/intern/fblearner/details/217882539?tab=operator_details
b. resnet, peak memory usage 2988 MB, p50 per iteration latency 0.119second, https://www.internalfb.com/intern/fblearner/details/218028479/?notif_channel=cli
c. accuracy benchmark, distributed=false, .accuracy 41.713260650635, .loss: 1.69939661026; distributed=true, .accuracy: 39.966053009033, .loss: 1.6849111318588, https://www.internalfb.com/intern/fblearner/details/218037058/?notif_channel=cli
d. classy vision uru production flow, expected, can not work well with apex.amp https://www.internalfb.com/intern/fblearner/details/219205218/?notif_channel=cli
e. pytext flow, detach_() related error, expected, as pytext zero_grad depends on apex repo where detach_() is called. also seeing the warning in finalize_bucket_dense due to tied weights, which is expected. https://www.internalfb.com/intern/fblearner/details/219150229/?notif_channel=cli

Reviewed By: mrshenli

Differential Revision: D23588186

fbshipit-source-id: f724d325b954ef6f06ede31759bf01dd29a6f5e5
2020-09-24 20:54:51 -07:00
630bd85aae [pytorch] refine dispatch keys in native_functions.yaml (2/N) (#45284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45284

This is the 2nd batch of the change described in #45010.

In this batch we relaxed some filters to cover more 'backend specific' ops:
* ops that do not call any 'Tensor::is_xxx()' method OR only call
  'Tensor::is_cuda()' - we are adding CUDA dispatch key anyway;
* ops that call other ATen ops but ARE differentiable - differentiability
  is a fuzzy indicator of not being 'composite';

Inherited other filters from the 1st batch:
* These ops don't already have dispatch section in native_functions.yaml;
* These ops call one or more DispatchStub (thus "backend specific");

Differential Revision: D23909901

Test Plan: Imported from OSS

Reviewed By: ailzhang

Pulled By: ljk53

fbshipit-source-id: 3b31e176324b6ac814acee0b0f80d18443bd81a1
2020-09-24 20:18:57 -07:00
7e5492e1be [minor] Fix undefined variable (#45246)
Summary:
The commit 2a37f3fd2f https://github.com/pytorch/pytorch/pull/45130 deleted the python variable `capability` which is used in later lines.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45246

Reviewed By: walterddr

Differential Revision: D23923916

Pulled By: malfet

fbshipit-source-id: c5d7fef9e4a87ccc621191200e5965710e9d6aaa
2020-09-24 20:17:13 -07:00
0f2c648c97 log metadata when model loading failed (#44430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44430

Log metadata even when model loading fails

Test Plan: {F331550976}

Reviewed By: husthyc

Differential Revision: D23577711

fbshipit-source-id: 0504e75625f377269f1e5df0f1ebe34b8e564c4b
2020-09-24 20:09:22 -07:00
03dde4c62a Resend diff D23858329 (#45315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45315

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45314

In D23858329 (721cfbf842), we put the PriorCorrectionCalibrationPrediction unit test in an OSS file, which causes a test failure in public trunk.

This diff moves it to an FB-only test file.

Test Plan:
```
 buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_gather_ranges_to_dense_op

buck test //caffe2/caffe2/fb/python/operator_test:torch_integration_test -- test_prior_correct_calibration_prediction_op
```
all pass.

Reviewed By: houseroad

Differential Revision: D23899012

fbshipit-source-id: 1ed97d8702e2765991e6caf5695d4c49353dae82
2020-09-24 18:41:49 -07:00
677a59dcaa [aten] Call fbgemm functions for embedding prepack/unpack (#44845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44845

fbgemm functions are vectorized and faster

```
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/6473924484856786
Summary (total time 15.08s):
  PASS: 7
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Performance Before:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 68.727

# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 131.500

# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 248.190

# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 172.742

# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 333.008

# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 652.423

# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 167.282

# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 398.901

# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 785.254

# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 122.653

# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 230.617

# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 408.807

# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 176.087

# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 337.514

# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 659.716

# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 342.529

# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 665.197

# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 1307.923
```

Performance After:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 10.782

# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 17.443

# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 25.898

# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 13.903

# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 18.575

# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 30.650

# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 14.158

# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 19.818

# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 30.852

# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 47.596

# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 91.025

# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 131.425

# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 12.637

# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 20.856

# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 33.944

# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 21.181

# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 34.213

# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 59.622
```
ghstack-source-id: 112836216

Test Plan: buck test //caffe2/test:quantization -- 'test_embedding_bag*'  --print-passing-details

Reviewed By: radkris-git

Differential Revision: D23675777

fbshipit-source-id: 0b1a787864663daecc7449295f9ab6264eac52fc
2020-09-24 17:21:03 -07:00
0b6e5ad4a9 Resolve comments in #44354. (#45150)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45150

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D23846796

Pulled By: ailzhang

fbshipit-source-id: 7bef89d833848ac3f8993c4c037acf1d4f2ca674
2020-09-24 16:40:02 -07:00
92ebb04f92 added check for NumberType (#44375)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44107

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44375

Reviewed By: mrshenli

Differential Revision: D23906728

Pulled By: eellison

fbshipit-source-id: 3b534e5dd3af1f5e43a7314953e64117cbe8ffe4
2020-09-24 16:26:59 -07:00
bee1d448e7 Fix test_rpc_profiling_remote_record_function (#45162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45162

This test was flaky because it was not able to validate that the
overall record_function's CPU times are greater than the sum of its children.
It turns out that this is a general bug in the profiler that can be reproduced
without RPC, see https://github.com/pytorch/pytorch/issues/45160. Hence,
removing this from the test and replacing it by just validating the expected
children.

Ran the test 1000 times and they all passed.
ghstack-source-id: 112632327

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D23851854

fbshipit-source-id: 5d9023acd17800a6668ba4849659d8cc902b8d6c
2020-09-24 15:57:32 -07:00
5dd288eb06 [JIT] Regularize tensorexpr fuser strategy with other fusers (#44972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44972

Previously, our fusion strategy was:
- start at the end of the block and find a fusible node
- iteratively try to merge inputs into the fusion group, sorted topologically

This strategy works pretty well, but has the possibility of missing fusion groups. See my attached test case for an example where we wouldn't find all possible fusion groups. bertmaher found an example of a missed fusion group in one of our rnn examples (jit_premul) that caused a regression from the legacy fuser.

Here, I'm updating our fusion strategy to be the same as our other fusion passes - create_autodiff_subgraphs, and graph_fuser.cpp.

The basic strategy is:
- iterate until you find a fusible node
- try to merge the node's inputs; whenever a successful merge occurs, restart at the beginning of the node's inputs
- after you've exhausted a node, continue searching the block for fusion opportunities from that node
- continue doing this on the block until we go through an iteration without any successful merges

Since we create the fusion groups once and only re-specialize within the fusion groups, we should be running this very infrequently (it only re-triggers when we fail undefinedness specializations). Also, because it's the same algorithm as the existing fuser, it is unlikely to cause a regression.
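
A toy sketch of the fixed-point loop described above; the data structures and predicates are stand-ins, not the fuser's real code:

```python
def fuse_block(nodes, is_fusible, can_merge):
    """Repeat over the block until an iteration makes no successful merges."""
    changed = True
    while changed:
        changed = False
        for group in (n for n in nodes if is_fusible(n)):
            merged = True
            while merged:  # restart the input scan after every merge
                merged = False
                for producer in list(group["inputs"]):
                    if can_merge(group, producer):
                        group["inputs"].remove(producer)
                        group["fused"].append(producer)
                        merged = changed = True
```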

Test Plan: Imported from OSS

Reviewed By: Krovatkin, robieta

Differential Revision: D23821581

Pulled By: eellison

fbshipit-source-id: e513d1ef719120dadb0bfafc7a14f4254cd806ee
2020-09-24 15:34:21 -07:00
0137e3641d Refactor subgraph merging (#44238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44238

Refactor create_autodiff_subgraphs to use the same logic for updating output aliasing properties as the tensorexpr fuser, and factor that out into a common function in subgraph utils.

Test Plan: Imported from OSS

Reviewed By: Krovatkin, robieta

Differential Revision: D23871565

Pulled By: eellison

fbshipit-source-id: 72df253b16baf8e4aabf3d68b103b29e6a54d44c
2020-09-24 15:29:34 -07:00
1539d4a664 Add operator to compute the equalization scale (#45096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45096

Add an operator to compute the equalization scale. This will be used in the integration of equalization into the dper int8 fixed quant scheme quantization flow.

Design docs:
https://fb.quip.com/bb7SAGBxPGNC

https://fb.quip.com/PDAOAsgoLfRr

Test Plan: buck test caffe2/caffe2/quantization/server:compute_equalization_scale_test

Reviewed By: jspark1105

Differential Revision: D23779870

fbshipit-source-id: 5e6a8c220935a142ecf8e61100a8c71932afa8d7
2020-09-24 15:19:49 -07:00
5a59330647 Add architectural support for multi-GPU. (#44059)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44059

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23820825

Pulled By: AshkanAliabadi

fbshipit-source-id: 0719b00581487a77ebadff867d1e4ac89354bf90
2020-09-24 15:11:55 -07:00
6311c5a483 Minor touchups. (#44317)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44317

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23820828

Pulled By: AshkanAliabadi

fbshipit-source-id: b83bdea9aed2fb52bd254ff15914d55a1af58c04
2020-09-24 15:07:08 -07:00
b84dd771e6 Grammatically updated the tech docs (#45192)
Summary:
Small grammatical update to the https://pytorch.org/docs/stable/tensors.html docs.

**_update1_**
![update1](https://user-images.githubusercontent.com/62737243/93969792-5c0ea800-fd8a-11ea-8c9f-0033f51a1fdc.png)

**_update2_**
![update2](https://user-images.githubusercontent.com/62737243/93969801-603ac580-fd8a-11ea-812d-d3026b9fc8a5.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45192

Reviewed By: bwasti

Differential Revision: D23877870

Pulled By: ezyang

fbshipit-source-id: 929ba3d479925b5132dbe87fad2da487408db7c7
2020-09-24 14:48:30 -07:00
cd7a682282 [caffe2] adds hypothesis test for queue ops cancel (#45178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45178

## Motivation
* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are blocking and thus non-cancellable. If an error
occurs, we need to be able to safely stop all net execution so we can throw
the exception to the caller.

## Summary
* Adds a hypothesis test for queue ops cancellation.

Test Plan:
## Unit test added to verify that queue ops propagate errors

```
buck test caffe2/caffe2/python:hypothesis_test
buck test caffe2/caffe2/python:hypothesis_test -- test_safe_dequeue_blob__raises_exception_when_hang --stress-runs 1000
```

```
Summary
  Pass: 1000
  ListingSuccess: 1
```

Reviewed By: d4l3k

Differential Revision: D23847576

fbshipit-source-id: 2fc351e1ee13ea8b32d976216d2d01dfb6fcc1ad
2020-09-24 14:43:52 -07:00
71e6ce6616 [JIT] Specialize AutogradZero: merge AutogradAnyNonZero and Not(AutogradAnyNonZero) checks into one. (#44987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44987

This PR introduces new `prim::AutogradAllZero` and
`prim::AutogradAllNonZero` ops that are used for a batched check over
multiple tensors. The specialize-autogradzero pass now generates one
check for all expected-to-be-undefined tensors, one check for all
expected-to-be-defined tensors, and a bunch of checks for size
parameters passed to `grad_sum_to_size` (this could probably be cleaned
up as well in the future).

An example of what we generated before this change:
```
%1626 : bool = prim::AutogradAnyNonZero(%0)
%1627 : bool = prim::AutogradAnyNonZero(%2)
%1628 : bool = aten::__not__(%1627)
%1629 : bool = prim::AutogradAnyNonZero(%3)
%1630 : bool = aten::__not__(%1629)
%1631 : bool = prim::AutogradAnyNonZero(%4)
%1632 : bool = aten::__not__(%1631)
%1633 : bool = prim::AutogradAnyNonZero(%5)
%1634 : bool = aten::__not__(%1633)
%1635 : bool = prim::AutogradAnyNonZero(%6)
%1636 : bool = aten::__not__(%1635)
%1637 : bool = prim::AutogradAnyNonZero(%7)
%1638 : bool = aten::__not__(%1637)
%1639 : bool = prim::AutogradAnyNonZero(%8)
%1640 : bool = aten::__not__(%1639)
%1641 : bool = prim::AutogradAnyNonZero(%9)
%1642 : bool = aten::__not__(%1641)
%1643 : bool = prim::AutogradAnyNonZero(%10)
%1644 : bool = aten::__not__(%1643)
%1645 : bool = prim::AutogradAnyNonZero(%11)
%1646 : bool = aten::__not__(%1645)
%1647 : bool = prim::AutogradAnyNonZero(%12)
%1648 : bool = aten::__not__(%1647)
%1649 : bool = prim::AutogradAnyNonZero(%13)
%1650 : bool = aten::__not__(%1649)
%1651 : bool = prim::AutogradAnyNonZero(%14)
%1652 : bool = aten::__not__(%1651)
%1653 : bool = prim::AutogradAnyNonZero(%15)
%1654 : bool = aten::__not__(%1653)
%1655 : bool = prim::AutogradAnyNonZero(%16)
%1656 : bool = aten::__not__(%1655)
%1657 : bool = prim::AutogradAnyNonZero(%17)
%1658 : bool = prim::AutogradAnyNonZero(%18)
%1659 : bool = prim::AutogradAnyNonZero(%19)
%1660 : bool = prim::AutogradAnyNonZero(%20)
%1661 : bool = aten::__is__(%self_size.16, %1625)
%1662 : bool = aten::__is__(%other_size.16, %1625)
%1663 : bool = aten::__is__(%self_size.14, %1625)
%1664 : bool = aten::__is__(%self_size.12, %1625)
%1665 : bool = prim::AutogradAnyNonZero(%ingate.7)
%1666 : bool = prim::AutogradAnyNonZero(%forgetgate.7)
%1667 : bool = prim::AutogradAnyNonZero(%cellgate.7)
%1668 : bool = prim::AutogradAnyNonZero(%30)
%1669 : bool = prim::AutogradAnyNonZero(%31)
%1670 : bool = aten::__is__(%self_size.10, %1625)
%1671 : bool = aten::__is__(%other_size.10, %1625)
%1672 : bool = prim::AutogradAnyNonZero(%34)
%1673 : bool = prim::AutogradAnyNonZero(%35)
%1674 : bool = aten::__is__(%self_size.8, %1625)
%1675 : bool = aten::__is__(%other_size.8, %1625)
%1676 : bool = aten::__is__(%self_size.6, %1625)
%1677 : bool = aten::__is__(%other_size.6, %1625)
%1678 : bool = prim::AutogradAnyNonZero(%outgate.7)
%1679 : bool = prim::AutogradAnyNonZero(%41)
%1680 : bool = prim::AutogradAnyNonZero(%42)
%1681 : bool = prim::AutogradAnyNonZero(%43)
%1682 : bool = aten::__is__(%self_size.4, %1625)
%1683 : bool = aten::__is__(%other_size.4, %1625)
%1684 : bool[] = prim::ListConstruct(%1626, %1628, %1630, %1632, %1634, %1636, %1638, %1640, %1642, %1644, %1646, %1648, %1650, %1652, %1654, %1656, %1657, %1658, %1659, %1660, %1661, %1662, %1663, %1664, %1665, %1666, %1667, %1668, %1669, %1670, %1671, %1672, %1673, %1674, %1675, %1676, %1677, %1678, %1679, %1680, %1681, %1682, %1683)
%1685 : bool = aten::all(%1684)
```

Same example after this change:
```
%1625 : None = prim::Constant()
%1626 : bool = aten::__is__(%self_size.16, %1625)
%1627 : bool = aten::__is__(%other_size.16, %1625)
%1628 : bool = aten::__is__(%self_size.14, %1625)
%1629 : bool = aten::__is__(%self_size.12, %1625)
%1630 : bool = aten::__is__(%self_size.10, %1625)
%1631 : bool = aten::__is__(%other_size.10, %1625)
%1632 : bool = aten::__is__(%self_size.8, %1625)
%1633 : bool = aten::__is__(%other_size.8, %1625)
%1634 : bool = aten::__is__(%self_size.6, %1625)
%1635 : bool = aten::__is__(%other_size.6, %1625)
%1636 : bool = aten::__is__(%self_size.4, %1625)
%1637 : bool = aten::__is__(%other_size.4, %1625)
%1638 : bool = prim::AutogradAllNonZero(%0, %17, %18, %19, %20, %ingate.7, %forgetgate.7, %cellgate.7, %30, %31, %34, %35, %outgate.7, %41, %42, %43)
%1639 : bool = prim::AutogradAllZero(%2, %3, %4, %5, %6, %7, %8, %9, %10, %11, %12, %13, %14, %15, %16)
%1640 : bool[] = prim::ListConstruct(%1626, %1627, %1628, %1629, %1630, %1631, %1632, %1633, %1634, %1635, %1636, %1637, %1638, %1639)
%1641 : bool = aten::all(%1640)
```

My performance measurements showed some changes, but I don't really
trust them and think that they are probably just noise. Below are
tables with min-aggregation over 10 runs:

FastRNN models:

| name                                             | base time (s) |   diff time (s) |   % change |
| :---                                             |          ---: |            ---: |       ---: |
| lstm[aten]:bwd                                   |     30.059927 |       29.834089 |      -0.8% |
| lstm[aten]:fwd                                   |     25.673708 |       25.700039 |       0.1% |
| lstm[cudnn]:bwd                                  |     17.866232 |       17.893120 |       0.2% |
| lstm[cudnn]:fwd                                  |     11.418444 |       11.408514 |      -0.1% |
| lstm[jit]:bwd                                    |     27.127205 |       27.141029 |       0.1% |
| lstm[jit]:fwd                                    |     17.018047 |       16.975451 |      -0.3% |
| lstm[jit_multilayer]:bwd                         |     27.502396 |       27.365149 |      -0.5% |
| lstm[jit_multilayer]:fwd                         |     16.918591 |       16.917767 |      -0.0% |
| lstm[jit_premul]:bwd                             |     22.281199 |       22.215082 |      -0.3% |
| lstm[jit_premul]:fwd                             |     14.848708 |       14.896231 |       0.3% |
| lstm[jit_premul_bias]:bwd                        |     20.761206 |       21.170969 |       2.0% |
| lstm[jit_premul_bias]:fwd                        |     15.013515 |       15.037978 |       0.2% |
| lstm[jit_simple]:bwd                             |     26.715771 |       26.697786 |      -0.1% |
| lstm[jit_simple]:fwd                             |     16.675898 |       16.545893 |      -0.8% |
| lstm[py]:bwd                                     |     56.327065 |       54.731030 |      -2.8% |
| lstm[py]:fwd                                     |     39.876324 |       39.230572 |      -1.6% |

Torch Hub models:

| name                                             | base time (s) |   diff time (s) |   % change |
| :---                                             |          ---: |            ---: |       ---: |
| test_eval[BERT_pytorch-cuda-jit]                 |      0.111706 |        0.106604 |      -4.6% |
| test_eval[LearningToPaint-cuda-jit]              |      0.002841 |        0.002801 |      -1.4% |
| test_eval[Super_SloMo-cuda-jit]                  |      0.384869 |        0.384737 |      -0.0% |
| test_eval[attension_is_all_you_nee...-cuda-jit]  |      0.123857 |        0.123923 |       0.1% |
| test_eval[demucs-cuda-jit]                       |      0.077270 |        0.076878 |      -0.5% |
| test_eval[fastNLP-cuda-jit]                      |      0.000255 |        0.000249 |      -2.3% |
| test_eval[moco-cuda-jit]                         |      0.426472 |        0.427380 |       0.2% |
| test_eval[pytorch_CycleGAN_and_pix...-cuda-jit]  |      0.026483 |        0.026423 |      -0.2% |
| test_eval[pytorch_mobilenet_v3-cuda-jit]         |      0.036202 |        0.035853 |      -1.0% |
| test_eval[pytorch_struct-cuda-jit]               |      0.001439 |        0.001495 |       3.9% |
| test_train[BERT_pytorch-cuda-jit]                |      0.247236 |        0.247188 |      -0.0% |
| test_train[Background_Matting-cuda-jit]          |      3.536659 |        3.581864 |       1.3% |
| test_train[LearningToPaint-cuda-jit]             |      0.015341 |        0.015331 |      -0.1% |
| test_train[Super_SloMo-cuda-jit]                 |      1.018626 |        1.019098 |       0.0% |
| test_train[attension_is_all_you_nee...-cuda-jit] |      0.446314 |        0.444893 |      -0.3% |
| test_train[demucs-cuda-jit]                      |      0.169647 |        0.169846 |       0.1% |
| test_train[fastNLP-cuda-jit]                     |      0.001990 |        0.001978 |      -0.6% |
| test_train[moco-cuda-jit]                        |      0.855323 |        0.856974 |       0.2% |
| test_train[pytorch_mobilenet_v3-cuda-jit]        |      0.497723 |        0.485416 |      -2.5% |
| test_train[pytorch_struct-cuda-jit]              |      0.309692 |        0.308792 |      -0.3% |

Differential Revision: D23794659

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: 859b68868ef839c5c6cbc7021879ee22d3144ea8
2020-09-24 14:31:49 -07:00
cbe1eac1f4 [caffe2] adds Cancel to SafeDequeueBlobsOp and SafeEnqueueBlobsOp (#45177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45177

## Motivation
* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are currently blocking and thus non-cancellable. If an error
occurs, we need to be able to safely stop all net execution so we can throw
the exception to the caller.

## Summary
* When an error occurs in a net or the net is cancelled, running ops have their
`Cancel` method called.
This diff adds a `Cancel` method to `SafeEnqueueBlobsOp`
and `SafeDequeueBlobsOp` that calls queue->close() to force all
blocking ops to return (see the sketch below).
* Adds a unit test that verifies the error propagation.
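A toy Python analogue of the cancellation pattern (this is not the Caffe2 implementation; `ClosableQueue` and its methods are invented for illustration): closing the queue is what unblocks an op stuck in a blocking dequeue.

```python
import queue
import threading

class ClosableQueue(queue.Queue):
    """Toy analogue of the C2 queue: close() wakes up blocked consumers."""
    _CLOSED = object()

    def close(self):
        self.put(self._CLOSED)        # analogous to queue->close() in Cancel

    def safe_dequeue(self):
        item = self.get()             # blocks, like SafeDequeueBlobsOp
        if item is self._CLOSED:
            print("dequeue cancelled: queue closed")
            return None
        return item

q = ClosableQueue()
worker = threading.Thread(target=q.safe_dequeue)
worker.start()
q.close()                             # Cancel(): lets the blocked worker exit
worker.join()
```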

Test Plan:
## Unit test added to verify that queue ops propagate errors

```
buck test caffe2/caffe2/python:hypothesis_test -- test_safe_dequeue_blob__raises_exception_when_hang --stress-runs 1000
```

```
Summary
  Pass: 1000
  ListingSuccess: 1
```

Reviewed By: d4l3k

Differential Revision: D23846967

fbshipit-source-id: c7ddd63259e033ed0bed9df8e1b315f87bf59394
2020-09-24 14:22:46 -07:00
022ba5a78b Make ddp_comm_hook_wrapper a private method. (#44643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44643

This method is not used anywhere else.

Also formatted the file.

Test Plan: buck test caffe2/test/distributed/algorithms/ddp_comm_hooks:test_ddp_hooks

Reviewed By: pritamdamania87

Differential Revision: D23675945

fbshipit-source-id: 2d04f94589a20913e46b8d71e6a39b70940c1461
2020-09-24 13:29:48 -07:00
e2bcdc7b69 [Caffe2] Fix LayerNormOp when batch_size == 0. (#45250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45250

[Caffe2] Fix LayerNormOp when batch_size == 0.

Test Plan: buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:layer_norm_op_test

Reviewed By: houseroad

Differential Revision: D23892091

fbshipit-source-id: 9a34654dd6880c9d14b7111fcf850e4f48ffdf91
2020-09-24 12:30:03 -07:00
c3a5aed5f7 Run pytorch_core CUDA tests on GPU using TPX
Summary:
Modify contbuild to disable sanitizers, add option to run "cuda" test using TPX RE

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: walterddr, cspanda

Differential Revision: D23854578

fbshipit-source-id: 327d7cc3655c17034a6a7bc78f69967403290623
2020-09-24 12:12:23 -07:00
c211a9102f add rocm 3.8 to nightly builds (#45222)
Summary:
Corresponding change in builder repo: https://github.com/pytorch/builder/pull/528.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45222

Reviewed By: ezyang

Differential Revision: D23894831

Pulled By: walterddr

fbshipit-source-id: c6a256ec325ddcf5836b4d293f546368d58db538
2020-09-24 12:00:30 -07:00
26001a2334 Revert D23753711: [pytorch][PR] Add foreach APIs for binary ops with ScalarList
Test Plan: revert-hammer

Differential Revision:
D23753711 (71d1b5b0e2)

Original commit changeset: bf3e8c54bc07

fbshipit-source-id: 192692e0d3fff4cade9983db0a1760fedfc9674c
2020-09-24 11:55:49 -07:00
c79d493096 added rocm 3.8 docker image (#45205)
Summary:
jeffdaily

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45205

Reviewed By: malfet

Differential Revision: D23906606

Pulled By: walterddr

fbshipit-source-id: 604a12bf4c97260215a1881cc96e35e7c42b4578
2020-09-24 11:18:33 -07:00
3f5eee666c Adjust TF32 tests (#44240)
Summary:
- The thresholds of some tests are bumped up. Depending on the random generator, these tests sometimes fail with errors like "0.0059 is not smaller than 0.005". I ran `test_nn.py` and `test_torch.py` 10+ times to check that they are no longer flaky.
- Add `tf32_on_and_off` to the new `matrix_exp` tests (usage shown below).
- Disable TF32 on test suites other than `test_nn.py` and `test_torch.py`
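For reference, here is the usage pattern of the decorator (a sketch; the tolerance argument and the test signature are assumed from the description above):

```python
from torch.testing._internal.common_cuda import tf32_on_and_off
from torch.testing._internal.common_utils import TestCase

class TestMatrixExp(TestCase):        # hypothetical test class for illustration
    @tf32_on_and_off(0.005)           # run with TF32 on and off, relaxed tolerance
    def test_matrix_exp(self, device, dtype):
        ...
```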

cc: ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44240

Reviewed By: mruberry

Differential Revision: D23882498

Pulled By: ngimel

fbshipit-source-id: 44a9ec08802c93a2efaf4e01d7487222478b6df8
2020-09-24 10:25:58 -07:00
b8eab8cdbd [hotfix] typo in NaiveConvolutionTranspose2d.cu (#45224)
Summary:
Fixes typo in e2f49c8
Fixes https://github.com/pytorch/pytorch/issues/45172

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45224

Reviewed By: ezyang

Differential Revision: D23879872

Pulled By: walterddr

fbshipit-source-id: c3db6d4c6f2ac0e6887862d4217a79c030647cb9
2020-09-24 10:06:29 -07:00
e57a08119b Add a warning log when there is high skew of uneven inputs in DDP training (#45238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45238

Adds a warning when there is a much higher than expected discrepancy in the
number of inputs across different processes when running with uneven inputs.
A skew in the thousands can reduce performance by a nontrivial amount, as
shown in benchmarks, so it was proposed to add this warning. Tested by
running the tests so that the threshold is hit and observing the output.
ghstack-source-id: 112773552

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D23719270

fbshipit-source-id: 306264f62c1de65e733696a912bdb6e9376d5622
2020-09-24 09:50:44 -07:00
2b38c09f69 Moves prim ops from C10 back to JIT (#45144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45144

Moves prim ops from C10 back to JIT.

These were originally moved to C10 from JIT in D19237648 (f362cd510d)
ghstack-source-id: 112775781

Test Plan:
buck test //caffe2/test/cpp/jit:jit

https://pxl.cl/1l22N

buck test adsatlas/gavel/lib/ata_processor/tests:ata_processor_test

https://pxl.cl/1lBxD

Reviewed By: iseeyuan

Differential Revision: D23697598

fbshipit-source-id: 36d1eb8c346e9b161ba6af537a218440a9bafd27
2020-09-24 09:44:20 -07:00
8507ea22b2 replace timer test with a mocked variant (#45173)
Summary:
I noticed that the recently introduced adaptive_autorange tests occasionally time out in CI, and I've been meaning to improve the Timer tests for a while. This PR allows unit tests to swap the measurement portion of `Timer` with a deterministic mock so we can thoroughly test behavior without having to worry about flaky CI measurements. It also means that the tests can be much more detailed and still finish very quickly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45173

Test Plan: You're lookin' at it.

Reviewed By: ezyang

Differential Revision: D23873548

Pulled By: robieta

fbshipit-source-id: 26113e5cea0cbf46909b9bf5e90c878c29e87e88
2020-09-24 09:42:37 -07:00
bfdf4323ac Bump up NCCL to 2.7.8 (#45251)
Summary:
Use latest NCCL

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45251

Reviewed By: mingzhe09088

Differential Revision: D23893064

Pulled By: mrshenli

fbshipit-source-id: 820dd166039e61a5aa59b4c5bbc615a7b18be8c3
2020-09-24 09:33:57 -07:00
5195d727b5 adding a test for ddp save()/load() (#44906)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44906

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23825386

Pulled By: bdhirsh

fbshipit-source-id: 2276e6e030ef9cffd78fc78c2ffe34d60a1e160e
2020-09-24 09:15:53 -07:00
f9ae296a85 renaming TestDdpCommHook class so it doesn't get picked up as a test by pytest (#44905)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44905

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23825308

Pulled By: bdhirsh

fbshipit-source-id: 17a07b3bd211850d6ecca793fd9ef3f326ca9274
2020-09-24 08:46:25 -07:00
bc591d76a1 add skip_if_rocm to all requires_nccl tests (#45158)
Summary:
The requires_nccl annotation should imply skip_if_rocm as well

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45158

Reviewed By: seemethere

Differential Revision: D23879952

Pulled By: walterddr

fbshipit-source-id: 818fb31ab75d5f02e77fe3f1367faf748855bee7
2020-09-24 08:37:49 -07:00
71d1b5b0e2 Add foreach APIs for binary ops with ScalarList (#44743)
Summary:
In this PR:
1) Added binary operations with ScalarLists.
2) Fixed _foreach_div(...) bug in native_functions
3) Covered all possible cases with scalars and scalar lists in tests
4) [minor] fixed bug in native_functions by adding "use_c10_dispatcher: full" to all _foreach functions

tested via unit tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44743

Reviewed By: bwasti, malfet

Differential Revision: D23753711

Pulled By: izdeby

fbshipit-source-id: bf3e8c54bc07867e8f6e82b5d3d35ff8e99b5a0a
2020-09-24 08:30:42 -07:00
bea7901e38 Enable torch.tensor typechecks (#45077)
Summary:
this fixes https://github.com/pytorch/pytorch/issues/42983.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45077

Reviewed By: ezyang

Differential Revision: D23842493

Pulled By: walterddr

fbshipit-source-id: 1c516a5ff351743a187d00cba7ed0be11678edf1
2020-09-24 08:22:06 -07:00
dc67b47bc9 Deprecate old fft functions (#44876)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44876

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23866715

Pulled By: mruberry

fbshipit-source-id: 73305eb02f92cbd1ef7d175419529d19358fedda
2020-09-24 02:39:44 -07:00
6d21d5f0b3 gtest-ify JIT tests, through the letter c (#45249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45249

Reland of https://github.com/pytorch/pytorch/pull/45055 and
https://github.com/pytorch/pytorch/pull/45020

See https://github.com/pytorch/pytorch/pull/45018 for context.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23892645

Pulled By: suo

fbshipit-source-id: e7fe58d5e1a5a0c44f4e2aec9694145afabde0fd
2020-09-24 00:21:20 -07:00
29dc3c5ec8 Sparse softmax support (CUDA) (#42307)
Summary:
This PR implements softmax support for sparse tensors.

Resolves gh-23651 for CUDA.

- [x]  sparse softmax
    - [x]  CUDA C++ implementation
    - [x]  unittests
    - [x]  update softmax documentation
    - [x]  autograd support
- [x]  sparse log_softmax
    - [x]  CUDA C++ implementation
    - [x]  unittests
    - [x]  update log_softmax documentation
    - [x]  autograd support

Here are some benchmark results (script is [here](https://gist.github.com/aocsa/fbc1827b3e49901512a33ba96092cbc1)) for `torch.sparse.softmax` and `torch.softmax`, using CPU and GPU; values are float64 scalars; timing repeat is 1000:

| size         | density | sparse CUDA | sparse CPU |
|--------------|---------|-------------|------------|
|  (32, 10000) |   0.01  |    380.2    |    687.5   |
| (32, 10000)  | 0.05    | 404.3       | 2357.9     |
| (32, 10000)  | 0.1     | 405.9       | 3677.2     |
| (512, 10000) | 0.01    | 438.0       | 5443.4     |
| (512, 10000) | 0.05    | 888.1       | 24485.0    |
| (512, 10000) | 0.1     | 1921.3      | 45340.5    |

| size         | density | dense CUDA | dense CPU |
|--------------|---------|-------------|------------|
|  (32, 10000) |   0.01  |     23.6    |   1943.2   |
| (32, 10000)  | 0.05    | 23.6        | 1954.0     |
| (32, 10000)  | 0.1     | 23.5        | 1950.0     |
| (512, 10000) | 0.01    | 639.3       | 39797.9    |
| (512, 10000) | 0.05    | 640.3       | 39374.4    |
| (512, 10000) | 0.1     | 639.6       | 39192.3    |

Times are in microseconds (us).

Quick note:  I updated the performance test again.
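Minimal usage of the API benchmarked above (assumes a CUDA-enabled build):

```python
import torch

x = torch.randn(32, 10000, dtype=torch.float64)
s = x.relu().to_sparse().cuda()                 # roughly 50% density toy input
out = torch.sparse.softmax(s, dim=1)
log_out = torch.sparse.log_softmax(s, dim=1)
```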

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42307

Reviewed By: ngimel

Differential Revision: D23774427

Pulled By: mruberry

fbshipit-source-id: bfabf726075b39dde544c10249f27ae1871f82c7
2020-09-24 00:07:30 -07:00
b3d7c2f978 [ONNX] Update ONNX docs for release (#45086)
Summary:
ONNX doc updates.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45086

Reviewed By: ezyang

Differential Revision: D23880383

Pulled By: bzinodev

fbshipit-source-id: ca29782fd73024967ee7708c217a005233e7b970
2020-09-23 23:28:36 -07:00
3dd0e362db [TensorExpr] Fix min and max for integral inputs in CUDA backend (#44984)
Summary:
For integral types, isnan is meaningless. Provide specializations for
maximum and minimum which don't call it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44984

Test Plan: python test/test_jit_fuser_te.py -k TestTEFuser.test_minmax_int_ops

Reviewed By: ezyang

Differential Revision: D23885259

Pulled By: asuhan

fbshipit-source-id: 2e6da2c43c0ed18f0b648a2383d510894c574437
2020-09-23 23:19:12 -07:00
b470fa4500 Add complex number support for binary logical operators (#43174)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43174

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23684425

Pulled By: mruberry

fbshipit-source-id: 4857b16e18ec4c65327136badd7f04c74e32d330
2020-09-23 23:03:00 -07:00
0b6b735863 [fix] type promotion atan2 (#43466)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43360

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43466

Reviewed By: malfet

Differential Revision: D23834928

Pulled By: mruberry

fbshipit-source-id: 2e7e0b4fcf1a846efc171c275d65a6daffd3c631
2020-09-23 22:23:05 -07:00
6a2e9eb51c torch.fft: Multi-dimensional transforms (#44550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44550

Part of the `torch.fft` work (gh-42175).
This adds n-dimensional transforms: `fftn`, `ifftn`, `rfftn` and `irfftn`.

This is aiming for correctness first, with the implementation on top of the existing `_fft_with_size` restrictions. I plan to follow up later with a more efficient rewrite that makes `_fft_with_size` work with arbitrary numbers of dimensions.
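A round-trip example of the new n-dimensional transforms:

```python
import torch

x = torch.randn(4, 8, 8, dtype=torch.float64)
X = torch.fft.fftn(x, dim=(-2, -1))             # complex-valued spectrum
assert torch.allclose(torch.fft.ifftn(X, dim=(-2, -1)).real, x)

r = torch.fft.rfftn(x, dim=(-2, -1))            # real input, half spectrum
back = torch.fft.irfftn(r, s=(8, 8), dim=(-2, -1))
```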

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23846032

Pulled By: mruberry

fbshipit-source-id: e6950aa8be438ec5cb95fb10bd7b8bc9ffb7d824
2020-09-23 22:09:58 -07:00
070fe15e4c Add link to profiling recipe from rpc main docs (#45235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45235

This is so that users know that the profiler works as expected with
RPC and they can learn how to use it to profile RPC-based workloads.
ghstack-source-id: 112773748

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D23777888

fbshipit-source-id: 4805be9b949c8c7929182f291a6524c3c6a725c1
2020-09-23 22:02:38 -07:00
956a25d061 Revert D23858329: [PT Model Split] Support 2 operators in PT by C2 conversion
Test Plan: revert-hammer

Differential Revision:
D23858329 (721cfbf842)

Original commit changeset: ed37118ca7f0

fbshipit-source-id: 30c700f80665be11afc608b00a77766064e60b35
2020-09-23 21:20:21 -07:00
2d00ebd29f Failing test demonstrating problems with mixed output shapes (#44455)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44455

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23886119

Pulled By: bertmaher

fbshipit-source-id: 41787930f154cf4e8a1766613c4cf33b18246555
2020-09-23 21:15:37 -07:00
c760bc8fb1 Add GlowLoadAOTModel flag (#45189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45189

Pull Request resolved: https://github.com/pytorch/glow/pull/4902

Test Plan: Test locally

Reviewed By: yinghai

Differential Revision: D23810445

fbshipit-source-id: 56e717d80abbfe76b15d0f4249e1e399a9722753
2020-09-23 20:50:04 -07:00
60665ace17 [quant] Add optimized approach to calculate qparams for qembedding_bag (#45149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45149

The choose_qparams_optimized function calculates the optimized qparams.
It uses a greedy approach to nudge the min and max, calculates the L2 norm,
and tries to minimize the quant error via `torch.norm(x - fake_quant(x, s, z))`.
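A hedged sketch of the greedy nudge described above (this is not the actual choose_qparams_optimized implementation; the helper names and step schedule are invented): shrink whichever boundary reduces the L2 quantization error, and stop when neither does.

```python
import torch

def quant_error(x, lo, hi, bits=8):
    levels = 2 ** bits - 1
    scale = max((hi - lo) / levels, 1e-12)
    zero_point = round(-lo / scale)
    q = torch.clamp(torch.round(x / scale + zero_point), 0, levels)
    return torch.norm(x - (q - zero_point) * scale).item()  # L2 quant error

def greedy_qparams(x, steps=50):
    lo, hi = x.min().item(), x.max().item()
    step = (hi - lo) / steps
    err = quant_error(x, lo, hi)
    for _ in range(steps):
        candidates = [(quant_error(x, lo + step, hi), lo + step, hi),
                      (quant_error(x, lo, hi - step), lo, hi - step)]
        best = min(candidates)
        if best[0] >= err:            # no boundary nudge helps any more
            break
        err, lo, hi = best
    return lo, hi

print(greedy_qparams(torch.randn(1000)))
```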

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23848060

fbshipit-source-id: c6c57c9bb07664c3f1c87dd7664543e09f634aee
2020-09-23 19:00:22 -07:00
721cfbf842 [PT Model Split] Support 2 operators in PT by C2 conversion (#45231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45231

Two operators, `PriorCorrectionCalibrationPrediction` and `GatherRangesToDense`, are not supported in PT, which prevents Glow from working.

To unblock, we first use C2->PT conversion. In the long term, we need to implement PT custom ops.

This diff does this conversion to unblock the current project.

Test Plan:
Run unit test. the Test input is from current DPER example.
All pass.
```buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_prior_correct_calibration_prediction_op  --print-passing-details

> c2 reference output
> [0.14285715 0.27272728 0.39130434 0.5 ]

> PT converted output
> tensor([0.1429, 0.2727, 0.3913, 0.5000])

buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_gather_ranges_to_dense_op  --print-passing-details

c2 reference output
> [array([[6, 5, 4, 3], [0, 0, 0, 0]], dtype=int64)]

> PT converted output
> [tensor([[6, 5, 4, 3], [0, 0, 0, 0]])]
```

Reviewed By: allwu, qizzzh

Differential Revision: D23858329

fbshipit-source-id: ed37118ca7f09e1cd0ad1fdec3d37f66dce60dd9
2020-09-23 18:31:57 -07:00
27c7158166 Remove __future__ imports for legacy Python2 supports (#45033)
Summary:
There is a tool called `2to3` whose `future` fixer specifically removes these; the `caffe2` directory has the most redundant imports:

```2to3 -f future -w caffe2```
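For example, the fixer simply deletes the compatibility header:

```python
# Before the fixer: a now-redundant Python 2 compatibility header
from __future__ import absolute_import, division, print_function, unicode_literals

print("hello")

# After `2to3 -f future -w`, only the code below the header remains:
# print("hello")
```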

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033

Reviewed By: seemethere

Differential Revision: D23808648

Pulled By: bugra

fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
2020-09-23 17:57:02 -07:00
e9aa6898ab Revert D23802296: gtest-ify JIT tests, through the letter c
Test Plan: revert-hammer

Differential Revision:
D23802296 (d2b045030e)

Original commit changeset: 20c9798a414e

fbshipit-source-id: a28d56039ca404fe94ed7572f1febd1673e3e788
2020-09-23 17:42:19 -07:00
89c570ed0a Revert D23811085: gtestify dce and fuser tests
Test Plan: revert-hammer

Differential Revision:
D23811085 (246bd9422a)

Original commit changeset: 45008e41f239

fbshipit-source-id: 94c981f565cab9b710fe52a55bbe8dbf9c179c23
2020-09-23 17:27:59 -07:00
76c185dcca [TensorExpr] When lanes differ, insert Broadcast instead of Cast (#45179)
Summary:
We need to check if dtypes differ in scalar type or lanes to decide between
Cast and Broadcast.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45179

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.SimplifyBroadcastTermExpander

Reviewed By: bwasti

Differential Revision: D23873316

Pulled By: asuhan

fbshipit-source-id: ca141be67e10c2b6c5f2ff9c11e42dcfc62ac620
2020-09-23 17:06:54 -07:00
f93ead6d37 [quant][eagermode] Custom module support (#44835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44835

This is for feature parity with fx graph mode quantization

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23745086

fbshipit-source-id: ae2fc86129f9896d5a9039b73006a4da15821307
2020-09-23 15:39:40 -07:00
0495998862 [TensorExpr] Disallow arithmetic binary operations on Bool (#44677)
Summary:
Arithmetic operations on Bool aren't fully supported in the evaluator. Moreover,
such semantics can be implemented by the client code through insertion of
explicit casts to widen and narrow to the desired types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44677

Test Plan:
test_tensorexpr --gtest_filter=TensorExprTest.ExprDisallowBoolArithmetic
python test/test_jit_fuser_te.py

Reviewed By: agolynski

Differential Revision: D23801412

Pulled By: asuhan

fbshipit-source-id: fff5284e3a216655dbf5a9a64d1cb1efda271a36
2020-09-23 14:59:11 -07:00
8e0fc711f4 [TensorExpr] Remove unused EvalConstExpr function (#45180)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45180

Test Plan: build

Reviewed By: ezyang

Differential Revision: D23877151

Pulled By: asuhan

fbshipit-source-id: a5d4d211c1dc85e6f7045330606163a933b9474e
2020-09-23 14:55:27 -07:00
2a1a51facb Fix typos. (#45195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45195

Fix some typos in reducer class.
ghstack-source-id: 112673443

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D23862399

fbshipit-source-id: 0dc69e5ea1fa7d33c85d1909b2216bcd1f579f6a
2020-09-23 14:51:15 -07:00
246bd9422a gtestify dce and fuser tests (#45055)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45055

See https://github.com/pytorch/pytorch/pull/45018 for context.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23811085

Pulled By: suo

fbshipit-source-id: 45008e41f2394d2ba319745b0340392e1b3d3172
2020-09-23 14:33:22 -07:00
d2b045030e gtest-ify JIT tests, through the letter c (#45020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45020

See https://github.com/pytorch/pytorch/pull/45018 for context.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23802296

Pulled By: suo

fbshipit-source-id: 20c9798a414e9ba30869a862012cbdee0613c8b1
2020-09-23 14:28:45 -07:00
3f89b779c4 [jit] allow submodule methods inference rule be different (#43872)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43872

This PR allows recursive scripting to use a separate
submodule_stubs_fn to create its submodules with specific user-provided
rules.

Fixes https://github.com/pytorch/pytorch/issues/43729

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D23430176

Pulled By: wanchaol

fbshipit-source-id: 20530d7891ac3345b36f1ed813dc9c650b28d27a
2020-09-23 14:10:31 -07:00
9e206ee9f1 [NNC] Fix a bug in SplitWithMask when splitting multiple times (#45141)
Summary:
When doing a splitWithMask we only mask if the loop extent is not cleanly divided by the split factor. However, the logic does not simplify the extent expression, so any nontrivial loop extent will always cause a mask to be added, e.g. if the loop had previously been split. Unlike splitWithTail, the masks added by splitWithMask are always overhead and we don't have the analysis to optimize them out if they are unnecessary, so it's good to avoid inserting them if we can.

The fix is just to simplify the loop extents before doing the extent calculation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45141

Reviewed By: ezyang

Differential Revision: D23869170

Pulled By: nickgg

fbshipit-source-id: 44686fd7b802965ca4f5097b0172a41cf837a1f5
2020-09-23 14:04:58 -07:00
adb2b380ba [quant][graphmode][fx] qconfig_dict support more types of configurations (#44856)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44856

Support following format of qconfig_dict
```python
qconfig_dict = {
    # optional, global config
    "": qconfig?,

    # optional, used for module and function types
    # could also be split into module_types and function_types if we prefer
    "object_type": [
      (nn.Conv2d, qconfig?),
      (F.add, qconfig?),
      ...,
    ],

    # optional, used for module names
    "module_name": [
      ("foo.bar", qconfig?)
      ...,
    ],

    # optional, matched in order, first match takes precedence
    "module_name_regex": [
      ("foo.*bar.*conv[0-9]+", qconfig?)
      ...,
    ]
    # priority (in increasing order): global, object_type, module_name_regex, module_name
    # qconfig == None means fusion and quantization should be skipped for anything
    # matching the rule
}
```

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23751304

fbshipit-source-id: 5b98f4f823502b12ae2150c93019c7b229c49c50
2020-09-23 13:59:53 -07:00
21fabae47a Remove expensive call to PyObject_GetAttrString in PyTorch_LookupSpecial (#44684)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44684

The ad-hoc quantization benchmarking script in D23689062 recently highlighted that quantized ops were surprisingly slow after the introduction of support for custom ops in torch.fx in D23203204 (f15e27265f).

Using strobelight, it's immediately clear that up to 66% of samples were seen in `c10::get_backtrace`, which descends from `torch::is_tensor_and_append_overloaded -> torch::check_has_torch_function ->  torch::PyTorch_LookupSpecial -> PyObject_HasAttrString ->  PyObject_GetAttrString`.

I'm no expert by any means so please correct any/all misinterpretation, but it appears that:
- `check_has_torch_function` only needs to return a bool
- `PyTorch_LookupSpecial` should return `NULL` if a matching method is not found on the object
- in the impl of `PyTorch_LookupSpecial` the return value from `PyObject_HasAttrString` only serves as a bool to return early, but ultimately ends up invoking `PyObject_GetAttrString`, which raises, spawning the generation of a backtrace
- `PyObject_FastGetAttrString` returns `NULL` (stolen ref to an empty py::object if the if/else if isn't hit) if the method is not found, anyway, so it could be used singularly instead of invoking both `GetAttrString` and `FastGetAttrString`
- D23203204 (f15e27265f) compounded (but maybe not directly caused) the problem by increasing the number of invocations

so, removing it in this diff and seeing how many things break :)

before:
strobelight: see internal section
output from D23689062 script:
```
$ ./buck-out/gen/scripts/v/test_pt_quant_perf.par
Sequential(
  (0): Quantize(scale=tensor([0.0241]), zero_point=tensor([60]), dtype=torch.quint8)
  (1): QuantizedLinear(in_features=4, out_features=4, scale=0.017489388585090637, zero_point=68, qscheme=torch.per_tensor_affine)
  (2): DeQuantize()
)
fp 0.010896682739257812
q 0.11908197402954102
```

after:
strobelight: see internal section
output from D23689062 script:
```
$ ./buck-out/gen/scripts/v/test_pt_quant_perf.par
Sequential(
  (0): Quantize(scale=tensor([0.0247]), zero_point=tensor([46]), dtype=torch.quint8)
  (1): QuantizedLinear(in_features=4, out_features=4, scale=0.012683945707976818, zero_point=41, qscheme=torch.per_tensor_affine)
  (2): DeQuantize()
)
fp 0.011141300201416016
q 0.022639036178588867
```

which roughly restores original performance seen in P142370729

UPDATE: 9/22 mode/opt benchmarks
```
buck run //scripts/x:test_pt_quant_perf mode/opt
Sequential(
  (0): Quantize(scale=tensor([0.0263]), zero_point=tensor([82]), dtype=torch.quint8)
  (1): QuantizedLinear(in_features=4, out_features=4, scale=0.021224206313490868, zero_point=50, qscheme=torch.per_tensor_affine)
  (2): DeQuantize()
)
fp 0.002968311309814453
q 0.5138928890228271
```

with patch:
```
buck run //scripts/x:test_pt_quant_perf mode/opt
Sequential(
  (0): Quantize(scale=tensor([0.0323]), zero_point=tensor([70]), dtype=torch.quint8)
  (1): QuantizedLinear(in_features=4, out_features=4, scale=0.017184294760227203, zero_point=61, qscheme=torch.per_tensor_affine)
  (2): DeQuantize()
)
fp 0.0026655197143554688
q 0.0064449310302734375
```

Reviewed By: ezyang

Differential Revision: D23697334

fbshipit-source-id: f756d744688615e01c94bf5c48c425747458fb33
2020-09-23 13:52:54 -07:00
99242eca1d Dockerfile: Support CUDA 11 (#45071)
Summary:
Although PyTorch already supports CUDA 11, the Dockerfile still relies on CUDA 10. This pull request upgrades all the necessary versions such that recent NVIDIA GPUs like A100 can be used.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45071

Reviewed By: ezyang

Differential Revision: D23873224

Pulled By: seemethere

fbshipit-source-id: 822c25f183dcc3b4c5b780c00cd37744d34c6e00
2020-09-23 11:38:49 -07:00
4d80c8c648 Fix inlining interface call in fork subgraph (#43790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43790

Interface calls were not handled properly when used in a fork
subgraph. This PR fixes that issue.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23402039

Pulled By: bzinodev

fbshipit-source-id: 41adc5ee7d942250e732e243ab30e356d78d9bf7
2020-09-23 11:17:19 -07:00
da4033d32a Make cudaHostRegister actually useful on cudart. (#45159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45159

By default, pybind11 binds void* to be capsules.  After a lot of
Googling, I have concluded that this is not actually useful:
you can't actually create a capsule from Python land, and our
data_ptr() function returns an int, which means that the
function is effectively unusable.  It didn't help that we had no
tests exercising it.

I've replaced the void* with uintptr_t, so that we now accept int
(and you can pass data_ptr() in directly).  I'm not sure if we
should make these functions accept ctypes types; unfortunately,
pybind11 doesn't seem to have any easy way to do this.

Fixes #43006

Also added cudaHostUnregister which was requested.
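With the binding fixed, `data_ptr()` (a Python int) can be passed straight through (sketch; assumes a CUDA-enabled build):

```python
import torch

t = torch.empty(1024)
cudart = torch.cuda.cudart()
# pin the tensor's storage by registering it with the CUDA runtime
cudart.cudaHostRegister(t.data_ptr(), t.numel() * t.element_size(), 0)
cudart.cudaHostUnregister(t.data_ptr())
```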

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D23849731

Pulled By: ezyang

fbshipit-source-id: 8a79986f3aa9546abbd2a6a5828329ae90fd298f
2020-09-23 11:05:44 -07:00
a5a4924c27 Warn if import torch is called from the source root. (#39995)
Summary:
This is a small developer quality of life improvement. I commonly try to run some snippet of python as I'm working on a PR and forget that I've cd-d into the local clone to run some git commands, resulting in annoying failures like:
`ImportError: cannot import name 'default_generator' from 'torch._C' (unknown location)`

This actually took a non-trivial amount of time to figure out the first time I hit it, and even now it's annoying because it happens just infrequently enough to not sit high in the mental cache.

This PR adds a check to `torch/__init__.py` and warns if `import torch` is likely resolving to the wrong thing:

```
WARNING:root:You appear to be importing PyTorch from a clone of the git repo:
  /data/users/taylorrobie/repos/pytorch
  This will prevent `import torch` from resolving to the PyTorch install
  (instead it will try to load /data/users/taylorrobie/repos/pytorch/torch/__init__.py)
  and will generally lead to other failures such as a failure to load C extensions.
```

so that the soon to follow internal import failure makes some sense. I elected to make this a warning rather than an exception because I'm not 100% sure that it's **always** wrong. (e.g. weird `PYTHONPATH` or `importlib` corner cases.)

EDIT: There are now separate cases for `cwd` vs. `PYTHONPATH`, and failure is an `ImportError`.
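A minimal sketch of such a check (the function name and heuristics are hypothetical, not the actual torch/__init__.py code):

```python
import os

def _check_not_importing_from_clone():
    cwd = os.path.realpath(os.getcwd())
    # a cwd containing setup.py and torch/__init__.py looks like a repo clone
    looks_like_repo = (os.path.isfile(os.path.join(cwd, "setup.py")) and
                       os.path.isfile(os.path.join(cwd, "torch", "__init__.py")))
    if looks_like_repo:
        raise ImportError(
            f"You appear to be importing PyTorch from a clone of the git repo: {cwd}")
```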

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39995

Reviewed By: malfet

Differential Revision: D23817209

Pulled By: robieta

fbshipit-source-id: d9ac567acb22d9c8c567a8565a7af65ac624dbf7
2020-09-23 10:55:08 -07:00
9db3871288 Update true_divide_out to use at::. (#45079)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45079

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23821701

Pulled By: ailzhang

fbshipit-source-id: 562eac10faba7a503eda0029a0b026c1fb85fe1e
2020-09-23 10:50:48 -07:00
9e30a76697 Filter strtod_l is undeclared errors from sccache log (#45183)
Summary:
This prevents DrCI from misidentifying test failures as being caused by compilation failures such as:
```
/var/lib/jenkins/workspace/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:8:19: error: use of undeclared identifier \'strtod_l\'
  return ((int*)(&strtod_l))[argc];
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45183

Reviewed By: ezyang

Differential Revision: D23859267

Pulled By: malfet

fbshipit-source-id: 283d9bd2ab712f23239b72f3758d121e2d026fb0
2020-09-23 09:49:49 -07:00
5b20bf4fd9 Added support for complex input for Cholesky decomposition (#44895)
Summary:
Cholesky decomposition now works for complex inputs.

Fixes https://github.com/pytorch/pytorch/issues/44637.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44895

Reviewed By: ailzhang

Differential Revision: D23841583

Pulled By: anjali411

fbshipit-source-id: 3b1f34a7af17827884540696f8771a0d5b1df478
2020-09-23 08:25:56 -07:00
94c3cdd994 Let rpc._all_gather use default RPC timeout (#44983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44983

`_all_gather` was converted from `_wait_all_workers` and inherited its
fixed 5-second timeout. As `_all_gather` is meant to support a broader
set of use cases, the timeout configuration should be more flexible.
This PR makes `rpc._all_gather` use the global default RPC timeout.

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D23794383

Pulled By: mrshenli

fbshipit-source-id: 382f52c375f0f25c032c5abfc910f72baf4c5ad9
2020-09-23 08:06:09 -07:00
e5bade7b2c [PyTorch Mobile] Move string op registrations to prim and make them selective (#44960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44960

Since we have templated selective build, it should be safe to move these operators to prim so that they can be selectively built on mobile

Test Plan: CI

Reviewed By: linbinyu

Differential Revision: D23772025

fbshipit-source-id: 52cebae76e4df5a6b2b51f2cd82f06f75e2e45d0
2020-09-23 07:42:35 -07:00
76dc50e9c8 [RPC] Infer backend type if only options are given (#45065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45065

To preserve backwards compatibility with applications that were passing in some ProcessGroupRpcBackendOptions but were not explicitly setting backend=BackendType.PROCESS_GROUP, we now infer the backend type from the options if only the latter are passed. If neither is passed, we default to TensorPipe, as before this change.
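A sketch of the new behavior (single-process setup for illustration; the rendezvous env values are placeholders):

```python
import os
import torch.distributed.rpc as rpc

os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "29500"

opts = rpc.ProcessGroupRpcBackendOptions(rpc_timeout=60)
# No backend= argument: it is now inferred as PROCESS_GROUP from the options
# type; with neither argument, TensorPipe remains the default.
rpc.init_rpc("worker0", rank=0, world_size=1, rpc_backend_options=opts)
rpc.shutdown()
```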
ghstack-source-id: 112586258

Test Plan: Added new unit tests.

Reviewed By: pritamdamania87

Differential Revision: D23814289

fbshipit-source-id: f4be7919e0817a4f539a50ab12216dc3178cb752
2020-09-23 00:46:27 -07:00
215679573e [TensorExpr] Fix operator order in combineMultilane (#45157)
Summary:
combineMultilane used the wrong operand order when the ramp was on the left-hand side,
which matters for subtraction.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45157

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.SimplifyRampSubBroadcast

Reviewed By: ailzhang

Differential Revision: D23851751

Pulled By: asuhan

fbshipit-source-id: 864d1611e88769fb43327ef226bb3310017bf858
2020-09-22 23:50:47 -07:00
7fba30c2be [quant][fx][bug] Fix error in convert step for QAT (#45050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45050

Update tests to actually test for QAT

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_linear

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23808022

fbshipit-source-id: d749ab2d215fe19238ff9d539307ffce9ef0ca9b
2020-09-22 22:48:31 -07:00
144dacd8d9 CUDA BFloat16 batched gemm (#45167)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45167

Reviewed By: mruberry

Differential Revision: D23860458

Pulled By: ngimel

fbshipit-source-id: 698de424a046963a30017b58d227fa510f85bf3f
2020-09-22 22:43:52 -07:00
989d877c95 [JIT] Do not allow creating generics with None types (#44958)
Summary:
Otherwise, invoking something like  `python -c "import torch._C;print(torch._C.ListType(None))"` will result in SIGSEGV

Discovered while trying to create a torch script for a function with the following type annotation: `Tuple[int, Ellipsis] -> None`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44958

Reviewed By: suo

Differential Revision: D23799906

Pulled By: malfet

fbshipit-source-id: 916a243007d13ed3e7a5b282dd712da3d66e3bf7
2020-09-22 21:50:40 -07:00
0a9ac98bed [reland][pytorch] refine dispatch keys in native_functions.yaml (1/N) (#45137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45137

Reland https://github.com/pytorch/pytorch/pull/45010 - which broke
master due to merge conflict.

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D23843510

Pulled By: ljk53

fbshipit-source-id: 28aabb9da533b6b806ab8779a0ee96b695e9e242
2020-09-22 21:44:55 -07:00
25ed739ac9 [packaging] rstrip fix (#45166)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45166

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D23852505

Pulled By: zdevito

fbshipit-source-id: 6bb743b37333ae19fc24629686e8d06aef812c50
2020-09-22 21:23:47 -07:00
cb75addee4 torch.package - a way to package models and code (#45015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45015

torch.package allows you to write packages of code, pickled python data, and
arbitrary binary and text resources into a self-contained package.

torch.package.PackageExporter writes the packages and
torch.package.PackageImporter reads them.

The importers can load this code in a hermetic way, such that code is loaded
from the package rather than the normal python import system. This allows
for the packaging of PyTorch model code and data so that it can be run
on a server or used in the future for transfer learning.

The code contained in packages is copied file-by-file from the original
source when it is created, and the file format is a specially organized
zip file. Future users of the package can unzip the package, and edit the code
in order to perform custom modifications to it.

The importer for packages ensures that code in the module can only be loaded from
within the package, except for modules explicitly listed as external using :method:`extern_module`.
The file `extern_modules` in the zip archive lists all the modules that a package externally depends on.
This prevents "implicit" dependencies where the package runs locally because it is importing
a locally-installed package, but then fails when the package is copied to another machine.
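A hedged usage sketch of the API described above (resource names are made up; `close()` on the exporter and the extern declarations are assumed):

```python
import torch
from torch.package import PackageExporter, PackageImporter

model = torch.nn.Linear(4, 4)

exporter = PackageExporter("package.zip")
exporter.extern_module("torch")               # resolved by the normal importer
exporter.extern_module("numpy")
exporter.save_pickle("model", "model.pkl", model)
exporter.close()

importer = PackageImporter("package.zip")
loaded = importer.load_pickle("model", "model.pkl")
```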

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23824337

Pulled By: zdevito

fbshipit-source-id: 1247c34ba9b656f9db68a83e31f2a0fbe3bea6bd
2020-09-22 21:21:21 -07:00
d4a634c209 [RPC profiling] Don't wrap toHere() calls with profiling (#44655)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44655

Since `toHere()` does not execute operations over RPC and simply
transfers the value to the local node, we don't need to enable the profiler
remotely for this message. This causes unnecessary overhead and is not needed.

Since `toHere` is a blocking call, we already profile the call on the local node using `RECORD_USER_SCOPE`, so this does not change the expected profiler results (validated by ensuring all remote profiling tests pass).
ghstack-source-id: 112605610

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D23641466

fbshipit-source-id: 109d9eb10bd7fe76122b2026aaf1c7893ad10588
2020-09-22 21:17:00 -07:00
70d2e4d1f6 [RPC profiling] Allow disableProfiler() to be called from another thread. (#44653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44653

This changes the profiler per a discussion with ilia-cher offline that enables `disableProfiler()` event consolidation logic to be called from different threads (i.e. threads where the profiler was not explicitly enabled). This is needed to support the functionality enabled by D23638387 where we defer profiling event collection until executing an async callback that can execute on a different thread, to support RPC async function profiling.

This is done by introducing two flags, `cleanupTLSState` and `consolidate`, which control whether we should clean up thread local settings (we don't do this when calling `disableProfiler()` on non-main threads) and whether we should consolidate all profiled events. Backwards compatibility is ensured since both options are true by default.

Added a test in `test_misc.cpp` to test this.
ghstack-source-id: 112605620

Reviewed By: mrshenli

Differential Revision: D23638499

fbshipit-source-id: f5bbb0d41ef883c5e5870bc27e086b8b8908f46b
2020-09-22 21:16:58 -07:00
1bd6533d60 Remove thread_local RecordFunctionGuard from profiler. (#44646)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44646

Per a discussion with ilia-cher, this is not needed anymore and
removing it would make some future changes to support async RPC profiling
easier. Tested by ensuring profiling tests in `test_autograd.py` still pass.
ghstack-source-id: 112605618

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D23683998

fbshipit-source-id: 4e49a439509884fe04d922553890ae353e3331ab
2020-09-22 21:15:31 -07:00
67a19fecef CUDA BFloat16 pooling (#45151)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45151

Reviewed By: ailzhang

Differential Revision: D23854056

Pulled By: ngimel

fbshipit-source-id: 32f0835218c2602a09654a9ac2d161c4eb360f90
2020-09-22 20:19:25 -07:00
666223df46 [jit] gtestify test_argument_spec.cpp (#45019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45019

See https://github.com/pytorch/pytorch/pull/45018 for context.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23802298

Pulled By: suo

fbshipit-source-id: 0e36d095d4d81dcd5ebe6d56b3dc469d6d5482d0
2020-09-22 19:44:14 -07:00
f575df201f [quant][graphmode][jit][api] Expose preserved_attrs from finalize to convert_jit (#44490)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44490

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23631142

fbshipit-source-id: f0913f0cb4576067e2a7288326024942d12e0ae0
2020-09-22 19:37:25 -07:00
e045119956 [JIT] Add default arguments for class types (#45098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45098

**Summary**
This commit adds support for default arguments in methods of class
types. Similar to how default arguments are supported for regular
script functions and methods on scripted modules, default values are
retrieved from the definition of a TorchScript class in Python as Python
objects, converted to IValues, and then attached to the schemas of
already compiled class methods.

**Test Plan**
This commit adds a set of new tests to TestClassType to test default
arguments.

**Fixes**
This commit fixes #42562.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23844769

Pulled By: SplitInfinity

fbshipit-source-id: ceedff7703bf9ede8bd07b3abcb44a0f654936bd
2020-09-22 18:37:44 -07:00
ebde5a80bb [tensorexpr] Add flag to fuse with unknown shapes (#44401)
Summary:
This flag simply allows users to get fusion groups that will *eventually* have shapes (such that `getOperation` is valid).

This is useful for doing early analysis and compiling just in time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44401

Reviewed By: ZolotukhinM

Differential Revision: D23656140

Pulled By: bwasti

fbshipit-source-id: 9a26c202752399d1932ad7d69f21c88081ffc1e5
2020-09-22 18:17:47 -07:00
c0267c6845 [caffe2] Support data types in shape hints (#45110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45110

A recent change in DSNN quantizes the ad embedding to 8 bits. Ad embeddings are part of the inputs to the DSNN merge net. To correctly pass shape hints of input tensors including quantized ad embeddings, we need to be able to annotate the data types in shape hints.

A note on corner cases: if the type is omitted or is not a valid type (e.g., whitespace), instead of throwing an exception I decided to return the default type, float.

Test Plan:
```
buck test caffe2/caffe2/fb/opt:shape_info_utils_test
```

Reviewed By: yinghai

Differential Revision: D23834091

fbshipit-source-id: 5e072144a7a7ff4b5126b618062dfc4041851dd3
2020-09-22 17:49:33 -07:00
b98ac20849 install ATen/native/cuda and hip headers (#45097)
Summary:
The ATen/native/cuda headers were copied to torch/include but were then not included in the final package. Further, the ATen/native/hip headers are added to the installation as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45097

Reviewed By: mruberry

Differential Revision: D23831006

Pulled By: malfet

fbshipit-source-id: ab527928185faaa912fd8cab208733a9b11a097b
2020-09-22 17:43:47 -07:00
2a37f3fd2f Relax CUDA architecture check (#45130)
Summary:
NVIDIA GPUs are binary compatible within a major compute capability revision.

This prevents "GeForce RTX 3080 with CUDA capability sm_86 is not compatible with the current PyTorch installation." messages from appearing, since CUDA 11 does not support code generation for sm_86.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45130

Reviewed By: ngimel

Differential Revision: D23841556

Pulled By: malfet

fbshipit-source-id: bcfc9e8da63dfe62cdec06909b6c049aaed6a18a
2020-09-22 17:26:47 -07:00
ccfbfe5eb5 [quant][graphmode][fx] Custom module support (#44766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44766

There might be modules that are not symbolically traceable, e.g. LSTM (since it has
input-dependent control flow). To support quantization in these cases, the user provides
the corresponding observed and quantized versions of the custom module: the observed
custom module has observers already inserted, and the quantized version has the
corresponding ops quantized. Use
```
from torch.quantization import register_observed_custom_module_mapping
from torch.quantization import register_quantized_custom_module_mapping
register_observed_custom_module_mapping(CustomModule, ObservedCustomModule)
register_quantized_custom_module_mapping(CustomModule, QuantizedCustomModule)
```
to register the custom module mappings, we'll also need to define a custom delegate class
for symbolic trace in order to prevent the custom module from being traced:
```python
class CustomDelegate(DefaultDelegate):
      def is_leaf_module(self, m):
          return (m.__module__.startswith('torch.nn') and
                    not isinstance(m, torch.nn.Sequential)) or \
                    isinstance(m, CustomModule)
m = symbolic_trace(original_m, delegate_class=CustomDelegate)
```

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23723455

fbshipit-source-id: 50d666e29b94cbcbea5fb6bcc73b00cff87eb77a
2020-09-22 17:11:46 -07:00
7f4a27be3a [resubmit][FX] s/get_param/get_attr/ (#45147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45147

ghstack-source-id: 112605923

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23845096

fbshipit-source-id: 9ca209aa84cbaddd6e89c52b541e43b11197e2d5
2020-09-22 17:06:18 -07:00
35cdb01327 [PyTorch] Enable type check for autocast_test_lists (#45107)
Summary:
This is a sub-task for addressing https://github.com/pytorch/pytorch/issues/42969. We re-enable the type check for `autocast_test_lists`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45107

Test Plan:
`python test/test_type_hints.py` passed:
```
(pytorch) bash-5.0$ with-proxy python test/test_type_hints.py
....
----------------------------------------------------------------------
Ran 4 tests in 103.871s

OK
```

Reviewed By: walterddr

Differential Revision: D23842884

Pulled By: Hangjun

fbshipit-source-id: a39f3810e3abebc6b4c1cb996b06312f6d42ffd6
2020-09-22 16:54:26 -07:00
cddcfde81d [JIT] Fix WithTest.test_with_exceptions (#45106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45106

**Summary**
This commit fixes `WithTest.test_with_exceptions`. It's been running
in regular Python this whole time; none of the functions created and
invoked for the test were scripted. Fortunately, the tests still pass
after being fixed.

**Test Plan**
Ran unit tests + continuous integration.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23848206

Pulled By: SplitInfinity

fbshipit-source-id: fd975ee34db9441ef4e4a4abf2fb21298166bbaa
2020-09-22 16:31:17 -07:00
d1c68a7069 Clarify that 5-D 'bilinear' grid_sample is actually trilinear (#45090)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41528

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45090

Reviewed By: ailzhang

Differential Revision: D23841046

Pulled By: zou3519

fbshipit-source-id: 941770cd5b3e705608957739026e9113e5f0c616
2020-09-22 15:10:22 -07:00
79fe794f87 [FX] Make Graphs immutable and make GraphModule recompile after assigning graph (#44830)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44830

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D23743850

Pulled By: jamesr66a

fbshipit-source-id: 501b92a89ff636c26abeff13105a75462384554c
2020-09-22 15:02:11 -07:00
def433bbb6 .circleci: Upgrade all xcode 9 workers to xcode 11 (#45153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45153

xcode 9 is being deprecated within circleci infra so we should get
everything else on a more recent version of xcode

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D23852774

Pulled By: seemethere

fbshipit-source-id: c02e162f1993d408de439fee21b340e9640e5a24
2020-09-22 14:57:43 -07:00
a4ce3f4194 Fix type hint warnings for common_methods_invocations.py (#44971)
Summary:
Fixes a subtask of https://github.com/pytorch/pytorch/issues/42969

Tested the following and no warnings were seen.

python test/test_type_hints.py
....
----------------------------------------------------------------------
Ran 4 tests in 180.759s

OK

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44971

Reviewed By: walterddr

Differential Revision: D23822274

Pulled By: visweshfb

fbshipit-source-id: e3485021e348ee0a8508a9d128f04bad721795ef
2020-09-22 13:40:46 -07:00
c253b10154 Fix incorrect EnumValue serialization issue (#44891)
Summary:
Previously, `prim::EnumValue` was serialized to `ops.prim.EnumValue`, which doesn't have the right implementation to refine the return type. This diff correctly serializes it to enum.value, thus fixing the issue.

Fixes https://github.com/pytorch/pytorch/issues/44892

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44891

Reviewed By: malfet

Differential Revision: D23818962

Pulled By: gmagogsfm

fbshipit-source-id: 6edfdf9c4b932176b08abc69284a916cab10081b
2020-09-22 11:59:45 -07:00
2b1f25885e [quant] Fix ConvTranspose mapping (#44844)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44844

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23746466

Pulled By: z-a-f

fbshipit-source-id: cb84e0fef5ab82e8ed8dd118d9fb21ee7b480ef7
2020-09-22 11:59:42 -07:00
09aee06e82 [caffe2] Replace embedding conversion ops with fbgemm functions (#44843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44843

Replace perfkernels calls with fbgemm kernels to avoid code duplication
ghstack-source-id: 112496292

Test Plan: CI

Reviewed By: radkris-git

Differential Revision: D23675519

fbshipit-source-id: 05c285a9eeb9ea109a04a78cb442a24ee40a4aec
2020-09-22 11:57:01 -07:00
e2b40ce793 Support BFloat16 for binary logical operators on CUDA (#42485)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42485

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23684423

Pulled By: mruberry

fbshipit-source-id: edc2b46b726361d4c8bf8a4bf4e4a09197b20428
2020-09-22 11:42:34 -07:00
ef885c10d8 [pytorch] Add triplet margin loss with custom distance (#43680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43680

As discussed [here](https://github.com/pytorch/pytorch/issues/43342),
adding in a Python-only implementation of the triplet-margin loss that takes a
custom distance function.  Still discussing whether this is necessary to add to
PyTorch Core.
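A usage sketch, assuming the API landed as `nn.TripletMarginWithDistanceLoss` with a `distance_function` argument (hedged; the PR notes the design was still under discussion):

```python
import torch
import torch.nn as nn

loss_fn = nn.TripletMarginWithDistanceLoss(
    distance_function=nn.PairwiseDistance(), margin=1.0)
anchor, positive, negative = (torch.randn(8, 16, requires_grad=True)
                              for _ in range(3))
loss = loss_fn(anchor, positive, negative)
loss.backward()
```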

Test Plan:
python test/run_tests.py

Imported from OSS

Reviewed By: albanD

Differential Revision: D23363898

fbshipit-source-id: 1cafc05abecdbe7812b41deaa1e50ea11239d0cb
2020-09-22 11:35:52 -07:00
10f287539f Align casing in test_dispatch with dispatch keys. (#44933)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44933

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23778247

Pulled By: ailzhang

fbshipit-source-id: bc3725eae670b03543015afe763cb3bb16baf8f6
2020-09-22 10:50:08 -07:00
1fd48a9d1f Revert D23798016: [FX] s/get_param/get_attr/
Test Plan: revert-hammer

Differential Revision:
D23798016 (c941dd3492)

Original commit changeset: 1d2f3db1994a

fbshipit-source-id: 974d930064b37d396c5d66c905a63d45449813e5
2020-09-22 10:32:51 -07:00
8501b89a87 [ONNX] Update ort release (#45095)
Summary:
Update ort release

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45095

Reviewed By: bwasti

Differential Revision: D23832041

Pulled By: malfet

fbshipit-source-id: 39c47a87e451c4c43ba4d4e8be385cc195cc611a
2020-09-22 10:08:48 -07:00
4b42f0b613 Support Math keyword in native_functions.yaml. (#44556)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44556

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D23698386

Pulled By: ailzhang

fbshipit-source-id: f10ea839a2cfe7d16f5823a75b8b8c5f1ae22dde
2020-09-22 10:00:40 -07:00
ae286d81e0 [JIT] improve alias analysis for list constructs (#39111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39111

In our present alias analysis, we consider any Value that enter another container as entering the heap, and thus aliasing all other heap values of the same type. There are a number of advantages to this approach:
- it is not too hard to maintain the aliasDb implementation
- it is much easier from an op schema perspective - there are many composite list ops registered internally and externally that would be tricky to register and get right if we did something more complicated
- It limits the size of the AliasDb, because a container of size 10 only contains a single memory dag element instead of 10 elements.

The downside is that we are unable to handle the simple and extremely common case of a list of tensors being used in an ATen op.

In an example like:

```
 def foo(input):
    x = torch.tensor([1, 2, 3, 4])
    y = [x, x]
    input.add_(1)
    return torch.cat(y)
```

we will consider x to be written to. Any write to any wildcard element (an element that enters a tuple, an element that is taken from a list) will mark x as written to. This can be limiting for our ability to create a functional subset and fuse graphs - as a result, 4 of the TorchVision classification models could not be functionalized.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23828003

Pulled By: eellison

fbshipit-source-id: 9109fcb6f2ca20ca897cae71683530285da9d537
2020-09-22 09:38:59 -07:00
9fc7a942f0 Change from self to self.__class__() in _DecoratorManager to ensure a new object is created every time a function is called recursively (#44633)
Summary:
Change from self to self.__class__() in _DecoratorManager to ensure a new object is created every time a function is called recursively

Fixes https://github.com/pytorch/pytorch/issues/44531

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44633

Reviewed By: agolynski

Differential Revision: D23783601

Pulled By: albanD

fbshipit-source-id: a818664dee7bdb061a40ede27ef99e9546fc80bb
2020-09-22 09:13:39 -07:00
63fd257879 Add Ellipsis constant to the list of recognized tokens (#44959)
Summary:
Per https://docs.python.org/3.6/library/constants.html
> `Ellipsis` is the same as ellipsis literal `...`
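
A small sketch of what this enables: TorchScript should now accept the named constant wherever the `...` literal is accepted (hypothetical example):

```python
import torch

@torch.jit.script
def first_channel(x: torch.Tensor) -> torch.Tensor:
    # `Ellipsis` is now recognized by the script frontend,
    # so this is equivalent to x[..., 0]
    return x[Ellipsis, 0]

print(first_channel(torch.ones(2, 3)))  # tensor([1., 1.])
```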

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44959

Reviewed By: suo

Differential Revision: D23785660

Pulled By: malfet

fbshipit-source-id: f68461849e7d16ef68042eb96566f2c936c06b0f
2020-09-22 09:05:25 -07:00
e155fbe915 add warning when ParameterList/Dict is used with DataParallel (#44405)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44405

Test Plan: Imported from OSS

Reviewed By: agolynski

Differential Revision: D23783987

Pulled By: albanD

fbshipit-source-id: 5018b0d381cb09301d2f88a98a910854f740ace1
2020-09-22 08:58:00 -07:00
4a0aa69a66 Fix undefined variable 'namedshape' in tensor.py (#45085)
Summary:
Hot Fix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45085

Reviewed By: malfet, seemethere

Differential Revision: D23824444

Pulled By: walterddr

fbshipit-source-id: c9f37b394d281b7ef44b14c30699bb7510a362a7
2020-09-22 08:52:47 -07:00
36ec8f8fb8 [dper3] Create dper LearningRate low-level module (#44639)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44639

As title; this will unblock migration of several modules that need learning rate functionality.

Test Plan:
```
buck test //dper3/dper3/modules/low_level_modules/tests:learning_rate_test
```

Reviewed By: yf225

Differential Revision: D23681733

fbshipit-source-id: 1d98cb35bf6a4ff0718c9cb6abf22401980b523c
2020-09-22 08:26:07 -07:00
58b6ab69e5 torch.sgn for complex tensors (#39955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39955

resolves https://github.com/pytorch/pytorch/issues/36323 by adding `torch.sgn` for complex tensors.
`torch.sgn` returns `x/abs(x)` for `x != 0` and returns `0 + 0j` for `x==0`
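
For illustration, a couple of values under this definition:

```python
import torch

z = torch.tensor([3 + 4j, 0j])
print(torch.sgn(z))  # tensor([0.6000+0.8000j, 0.0000+0.0000j]), i.e. z/|z| with 0 mapped to 0
```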

This PR doesn't test the correctness of the gradients. It will be done as a part of auditing all the ops in future once we decide the autograd behavior (JAX vs TF) and add gradchek.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23460526

Pulled By: anjali411

fbshipit-source-id: 70fc4e14e4d66196e27cf188e0422a335fc42f92
2020-09-22 08:24:53 -07:00
1b059f2c6d Directly use work.result() to retrieve tensor rather than passing as a separate argument (#44914)
Summary:
We currently fetch an allreduced tensor from Python into C++, storing the resulting tensor in a struct's parameter. This PR removes the extra tensor parameter from the function signature and fetches the tensor from a single place.

Fixes https://github.com/pytorch/pytorch/issues/43960

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44914

Reviewed By: rohan-varma

Differential Revision: D23798888

Pulled By: bugra

fbshipit-source-id: ad1b8c31c15e3758a57b17218bbb9dc1f61f1577
2020-09-22 06:28:47 -07:00
71aeb84ab4 Revert D23803951: [pytorch] refine dispatch keys in native_functions.yaml (1/N)
Test Plan: revert-hammer

Differential Revision:
D23803951 (339961187a)

Original commit changeset: aaced7c34427

fbshipit-source-id: fcc4fb6a2c1d79b587f62347b43f8851fe1647fd
2020-09-22 05:41:59 -07:00
339961187a [pytorch] refine dispatch keys in native_functions.yaml (1/N) (#45010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45010

The motivation of this change is to differentiate "backend specific" ops
and "generic" ops.

"backend specific" ops are those invoking backend specific kernels thus
only able to run on certain backends, e.g.: CPU, CUDA.

"generic" ops are those not *directly* invoking backend specific kernels.
They are usually calling other "backend specific" ops to get things
done. Thus, they are also referred to as "composite" ops, or "math" ops
(because they are usually pure C++ code constructed from math formulas).

The other way to see the difference is that: we have to implement new
kernels for the "backend specific" ops if we want to run these ops on a
new backend. In contrast, "generic"/"composite" ops can run on the new
backend if we've added support for all the "backend specific" ops to
which they delegate their work.

Historically we didn't make a deliberate effort to always populate
supported backends to the "dispatch" section for all the "backend specific"
ops in native_functions.yaml. So now there are many ops which don't have
"dispatch" section but are actually "backend specific" ops. Majority
of them are calling "DispatchStub" kernels, which usually only support
CPU/CUDA (via TensorIterator) or QuantizedCPU/CUDA.

The ultimate goal is to be able to differentiate these two types of ops
by looking at the "dispatch" section in native_functions.yaml.

This PR leveraged the analysis script on #44963 to populate missing
dispatch keys for a set of "backend specific" ops. As the initial step,
we only deal with the simplest case:
* These ops don't already have dispatch section in native_functions.yaml;
* These ops call one or more DispatchStub (thus "backend specific");
* These ops don't call any other aten ops - except for some common
  ones almost every op calls via framework, e.g. calling aten::eq via
  Dispatcher::checkSchemaCompatibility. Calling other nontrivial aten
  ops is a sign of being "composite", so we don't want to deal with this
  case now;
* These ops don't call Tensor::is_quantized() / Tensor::is_sparse() / etc.
  Some ops call these Tensor::is_XXX() methods to dispatch to quantized /
  sparse kernels internally. We don't deal with this case now.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23803951

Pulled By: ljk53

fbshipit-source-id: aaced7c34427d1ede72380af4513508df366ea16
2020-09-22 03:20:01 -07:00
c947ab0bb9 Added sparse support for asin and neg functions, updated log1p (#44028)
Summary:
Description:

- [x] added C++ code for sparse `asin` and `neg` ops similarly to `log1p` op
- [x] added tests
  - [x] coalesced input CPU/CUDA
  - [x] uncoalesced input CPU/CUDA
- [x] added tests for `negative`  and `arcsin`

Backprop will be addressed in another PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44028

Reviewed By: agolynski

Differential Revision: D23793027

Pulled By: mruberry

fbshipit-source-id: 5fd642808da8e528cf6acd608ca0dcd720c4ccc3
2020-09-22 02:04:38 -07:00
d126a0d4fd [iOS] Disable the iOS nightly build until the cert issue has resolved (#45094)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45094

Test Plan: Imported from OSS

Reviewed By: husthyc

Differential Revision: D23831152

Pulled By: xta0

fbshipit-source-id: 6327edba01e4d5abad63ac35680eefb22276423f
2020-09-22 01:47:41 -07:00
5aed75b21b [quant][graphmode][jit] Try to support append (#44641)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44641

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23682356

fbshipit-source-id: 09a03dfde0b1346a5764e8e28ba56e32b343d239
2020-09-21 23:13:56 -07:00
2111ec3bf3 CUDA BFloat16 losses (#45011)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45011

Reviewed By: mruberry

Differential Revision: D23805840

Pulled By: ngimel

fbshipit-source-id: 3eb60d4367c727100763879e20e9df9d58bf5ad6
2020-09-21 22:51:17 -07:00
32c1a8c79f adjust shape inference in sls tests (#44936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44936

Need to provide the max sequence size and max element size instead of the
total. Also added a check that onnxifi was successful.

Test Plan: sls tests

Reviewed By: yinghai

Differential Revision: D23779437

fbshipit-source-id: 5048d6536ca00f0a3b0b057c4e2cf6584b1329d6
2020-09-21 22:09:55 -07:00
0dda65ac77 [ONNX] add jit pass for lists (#43820)
Summary:
Add jit preprocessing pass for adding int lists.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43820

Reviewed By: albanD

Differential Revision: D23674598

Pulled By: bzinodev

fbshipit-source-id: 35766403a073e202563bba5251c07efb7cc5cfb1
2020-09-21 22:05:25 -07:00
09e7f62ce2 Fix RPC and ProcessGroup GIL deadlock (#45088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45088

Fixes #45082

Found a few problems while working on #44983

1. We deliberately swallow RPC timeouts during shutdown, as we haven't
found a good way to handle those. When we converted `_wait_all_workers`
into `_all_gather`, the same logic was inherited. However, as
`_all_gather` is meant to be used in more general scenarios, we should
no longer keep silent about errors. This commit lets the error propagate
from `_all_gather` and also lets `shutdown()` catch and log it.
2. After fixing (1), I found that `UnpickledPythonCall` needs to
acquire the GIL on destruction, and this can lead to deadlock when used
in conjunction with `ProcessGroup`, because the `ProcessGroup` ctor is a
synchronization point which holds the GIL. In `init_rpc`, followers
(`rank != 0`) can exit before the leader (`rank == 0`). If the two
happen together, we can hit the following: a follower exits `init_rpc`
after running `_broadcast_to_followers` and before reaching the dtor
of `UnpickledPythonCall`. Then it runs the ctor of `ProcessGroup`,
which holds the GIL and waits for the leader to join. However, the
leader is waiting for the response from `_broadcast_to_followers`,
which is blocked by the dtor of `UnpickledPythonCall`. Hence
the deadlock. This commit drops the GIL in the `ProcessGroup` ctor.
3. After fixing (2), I found that the `TensorPipe` backend
nondeterministically fails in `test_local_shutdown`, for a reason
similar to (2), but this time it is that `shutdown()` on a
follower runs before the leader finishes `init_rpc`. This commit
adds a join to the `TensorPipe` backend's `init_rpc` after `_all_gather`.

The 3rd one should be able to solve the 2nd one as well. But since
I didn't see a reason to hold GIL during `ProcessGroup` ctor, I
made that change too.

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D23825592

Pulled By: mrshenli

fbshipit-source-id: 94920f2ad357746a6b8e4ffaa380dd56a7310976
2020-09-21 21:47:27 -07:00
dfc88d4fd0 [vulkan] support dimensions negative indexing (#45068)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45068

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D23816081

Pulled By: IvanKobzarev

fbshipit-source-id: bda753f3f216dac7c05b6f728a3bd6068e5d06a0
2020-09-21 21:24:16 -07:00
5621ba87a2 [vulkan] reshape op to use infer_size to expand -1 (#45104)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45104

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D23834249

Pulled By: IvanKobzarev

fbshipit-source-id: 0e3699d6a4227788d1d634349c0bf259c0ad5e8d
2020-09-21 21:08:59 -07:00
8968030f19 [WIP] Add vec256 test to linux CI (#44912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44912

This adds the vec256 test to the Linux CI system.
The whole test takes 50 to 70 seconds.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D23772923

Pulled By: glaringlee

fbshipit-source-id: ef929b53f3ea7894abcd9510a8e0389979cab4a2
2020-09-21 21:00:29 -07:00
4b3046ed28 Vectorize int8_t on CPU (#44759)
Summary:
int8_t is not vectorized in vec256_int.h. This PR adds vectorization for
int8_t. As pointed out in https://github.com/pytorch/pytorch/issues/43033, this is an important type for vectorization because
a lot of images are loaded in this data type.

Related issue: https://github.com/pytorch/pytorch/issues/43033

Benchmark (Debian Buster,  Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz, Turbo off, Release build):

```python
import timeit
dtype = 'torch.int8'
for op in ('+', '-'):
    for n, t in [(10_000, 200000),
                (100_000, 20000)]:
        print(f'a {op} b, numel() == {n} for {t} times, dtype={dtype}')
        print(timeit.timeit(f'c = a {op} b', setup=f'import torch; a = torch.arange(1, {n}, dtype={dtype}); b = torch.arange({n}, 1, -1, dtype={dtype})', number=t))
```

Results:

Before:

```
a + b, numel() == 10000 for 200000 times, dtype=torch.int8
1.2223373489978258
a + b, numel() == 100000 for 20000 times, dtype=torch.int8
0.6108450189931318
a - b, numel() == 10000 for 200000 times, dtype=torch.int8
1.256775538000511
a - b, numel() == 100000 for 20000 times, dtype=torch.int8
0.6101213909860235
```

After:

```
a + b, numel() == 10000 for 200000 times, dtype=torch.int8
0.5713336059998255
a + b, numel() == 100000 for 20000 times, dtype=torch.int8
0.39169703199877404
a - b, numel() == 10000 for 200000 times, dtype=torch.int8
0.5838428330025636
a - b, numel() == 100000 for 20000 times, dtype=torch.int8
0.37486923701362684
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44759

Reviewed By: malfet

Differential Revision: D23786383

Pulled By: glaringlee

fbshipit-source-id: 67f5bcd344c0b5014bacbc876143231fca156713
2020-09-21 19:55:13 -07:00
f77ba0e48c Change typo 'momemtum' to 'momentum' (#45045)
Summary:
As the title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45045

Reviewed By: mruberry

Differential Revision: D23808563

Pulled By: mrshenli

fbshipit-source-id: ca818377f4c23d67b037c146fef667ab8731961e
2020-09-21 19:03:26 -07:00
20f52cdd76 [hpc]optimize the torch.cat cuda kernel (#44833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44833

The current cat CUDA kernel uses pinned memory to pass the tensor data. 1) This is much slower than passing the data through kernel arguments in constant memory; 2) the H2D copy sometimes overlaps with other H2D copies in training, which generates random delays and leads to desync issues.

For small N, we actually saw 2X improvements.

Test Plan:
benchmark
```
./buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/pt/cat_test.par --tag_filter all --device cuda
```
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1,1,1)_N2_dim0_cuda
# Input: sizes: (1, 1, 1), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 38.825

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 45.440

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(128,1024,2)_N2_dim1_cuda
# Input: sizes: (128, 1024, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 38.765

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim0_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 60.075

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1025,1023,2)_N2_dim1_cuda
# Input: sizes: (1025, 1023, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 65.203

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim2_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 83.941

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f0d50fc2440>,111,65]_N5_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f0d50fc2440>, 111, 65], N: 5, dim: 0, device: cuda
Forward Execution Time (us) : 51.059

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[96,<function<lambda>at0x7f0d50fc2b90>,64]_N5_dim1_cuda
# Input: sizes: [96, <function <lambda> at 0x7f0d50fc2b90>, 64], N: 5, dim: 1, device: cuda
Forward Execution Time (us) : 42.134

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[128,64,<function<lambda>at0x7f0b22b7e3b0>]_N5_dim2_cuda
# Input: sizes: [128, 64, <function <lambda> at 0x7f0b22b7e3b0>], N: 5, dim: 2, device: cuda
Forward Execution Time (us) : 78.333

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f0b22b7e5f0>,32,64]_N50_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f0b22b7e5f0>, 32, 64], N: 50, dim: 0, device: cuda
Forward Execution Time (us) : 77.065

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[32,<function<lambda>at0x7f0b22b7e680>,64]_N50_dim1_cuda
# Input: sizes: [32, <function <lambda> at 0x7f0b22b7e680>, 64], N: 50, dim: 1, device: cuda
Forward Execution Time (us) : 74.632

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[33,65,<function<lambda>at0x7f0b22b7e710>]_N50_dim2_cuda
# Input: sizes: [33, 65, <function <lambda> at 0x7f0b22b7e710>], N: 50, dim: 2, device: cuda
Forward Execution Time (us) : 81.846

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(64,32,4,16,32)_N2_dim2_cuda
# Input: sizes: (64, 32, 4, 16, 32), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 99.291

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(16,32,4,16,32)_N8_dim2_cuda
# Input: sizes: (16, 32, 4, 16, 32), N: 8, dim: 2, device: cuda
Forward Execution Time (us) : 114.060

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(9,31,5,15,33)_N17_dim4_cuda
# Input: sizes: (9, 31, 5, 15, 33), N: 17, dim: 4, device: cuda
Forward Execution Time (us) : 478.777

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f0b22b7e7a0>]_N100_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f0b22b7e7a0>], N: 100, dim: 0, device: cuda
Forward Execution Time (us) : 80.165

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f0b22b7e830>]_N1000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f0b22b7e830>], N: 1000, dim: 0, device: cuda
Forward Execution Time (us) : 491.983

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f0b22b7e8c0>]_N2000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f0b22b7e8c0>], N: 2000, dim: 0, device: cuda
Forward Execution Time (us) : 966.613

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f0b22b7e950>]_N3000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f0b22b7e950>], N: 3000, dim: 0, device: cuda
Forward Execution Time (us) : 1500.133
```

After optimization
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1,1,1)_N2_dim0_cuda
# Input: sizes: (1, 1, 1), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 22.168

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 33.430

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(128,1024,2)_N2_dim1_cuda
# Input: sizes: (128, 1024, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 19.884

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim0_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 48.082

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1025,1023,2)_N2_dim1_cuda
# Input: sizes: (1025, 1023, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 53.261

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim2_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 71.294

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f837a135200>,111,65]_N5_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f837a135200>, 111, 65], N: 5, dim: 0, device: cuda
Forward Execution Time (us) : 40.165

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[96,<function<lambda>at0x7f837a135950>,64]_N5_dim1_cuda
# Input: sizes: [96, <function <lambda> at 0x7f837a135950>, 64], N: 5, dim: 1, device: cuda
Forward Execution Time (us) : 32.666

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[128,64,<function<lambda>at0x7f82e50e2440>]_N5_dim2_cuda
# Input: sizes: [128, 64, <function <lambda> at 0x7f82e50e2440>], N: 5, dim: 2, device: cuda
Forward Execution Time (us) : 67.003

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f82e50e24d0>,32,64]_N50_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f82e50e24d0>, 32, 64], N: 50, dim: 0, device: cuda
Forward Execution Time (us) : 67.035

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[32,<function<lambda>at0x7f82e50e2560>,64]_N50_dim1_cuda
# Input: sizes: [32, <function <lambda> at 0x7f82e50e2560>, 64], N: 50, dim: 1, device: cuda
Forward Execution Time (us) : 63.803

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[33,65,<function<lambda>at0x7f82e50e25f0>]_N50_dim2_cuda
# Input: sizes: [33, 65, <function <lambda> at 0x7f82e50e25f0>], N: 50, dim: 2, device: cuda
Forward Execution Time (us) : 69.969

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(64,32,4,16,32)_N2_dim2_cuda
# Input: sizes: (64, 32, 4, 16, 32), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 98.327

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(16,32,4,16,32)_N8_dim2_cuda
# Input: sizes: (16, 32, 4, 16, 32), N: 8, dim: 2, device: cuda
Forward Execution Time (us) : 112.363

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(9,31,5,15,33)_N17_dim4_cuda
# Input: sizes: (9, 31, 5, 15, 33), N: 17, dim: 4, device: cuda
Forward Execution Time (us) : 478.224

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f82e50e2680>]_N100_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f82e50e2680>], N: 100, dim: 0, device: cuda
Forward Execution Time (us) : 63.269

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f82e50e2710>]_N1000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f82e50e2710>], N: 1000, dim: 0, device: cuda
Forward Execution Time (us) : 470.141

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f82e50e27a0>]_N2000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f82e50e27a0>], N: 2000, dim: 0, device: cuda
Forward Execution Time (us) : 966.668

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f82e50e2830>]_N3000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f82e50e2830>], N: 3000, dim: 0, device: cuda
Forward Execution Time (us) : 1485.309
```

Reviewed By: ngimel

Differential Revision: D23727275

fbshipit-source-id: 171275ac541c649f7aeab0a2f8f0fea9486d0180
2020-09-21 18:38:25 -07:00
81bb19c9f0 [JIT] Prohibit subscripted assignments for tuple types (#44929)
Summary:
This would force jit.script to raise an error if someone tries to mutate a tuple:
```
Tuple[int, int] does not support subscripted assignment:
  File "/home/nshulga/test/tupleassignment.py", line 9
@torch.jit.script
def foo(x: Tuple[int, int]) -> int:
    x[-1] = x[0] + 1
    ~~~~~ <--- HERE
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44929

Reviewed By: suo

Differential Revision: D23777668

Pulled By: malfet

fbshipit-source-id: 8efaa4167354ffb4930ccb3e702736a3209151b6
2020-09-21 16:35:44 -07:00
9a31eee107 [vulkan] Remove duplication of op registration and clean unused vars (#44932)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44932

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D23778203

Pulled By: IvanKobzarev

fbshipit-source-id: d1bc0a5c2cdd711d8a4cd983154a4f6774987674
2020-09-21 15:57:32 -07:00
dfb8f2d51f CUDA BFloat16 addmm, addmv (#44986)
Summary:
This PR was originally authored by slayton58. I took his implementation and added some tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44986

Reviewed By: mruberry

Differential Revision: D23806039

Pulled By: ngimel

fbshipit-source-id: 305d66029b426d8039fab3c3e011faf2bf87aead
2020-09-21 14:28:27 -07:00
581a364437 CUDA BFloat16 unary ops part 1 (#44813)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44813

Reviewed By: mruberry

Differential Revision: D23805816

Pulled By: ngimel

fbshipit-source-id: 28c645dc31f094c8b6c3d3803f0b4152f0475a64
2020-09-21 14:22:31 -07:00
1cab27d485 Add a torch.hub.load_local() function that can load models from any local directory with a hubconf.py (#44204)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43622

- Moves the model loading part of `torch.hub.load()` into a new `torch.hub.load_local()` function that takes in a path to a local directory that contains a `hubconf.py` instead of a repo name.
- Refactors `torch.hub.load()` so that it now calls `torch.hub.load_local()` after downloading and extracting the repo.
- Updates `torch.hub` docs to include the new function + minor fixes.
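
A usage sketch, assuming the new function mirrors `torch.hub.load()` with the GitHub repo string replaced by a local directory path (the path and entry-point name below are placeholders, not from the PR):

```python
import torch

# '/path/to/checkout' must contain a hubconf.py exposing a 'resnet18' entry point.
model = torch.hub.load_local('/path/to/checkout', 'resnet18', pretrained=True)
```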

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44204

Reviewed By: malfet

Differential Revision: D23817429

Pulled By: ailzhang

fbshipit-source-id: 788fd83c87a94f487b558715b2809d346ead02b2
2020-09-21 14:17:21 -07:00
c941dd3492 [FX] s/get_param/get_attr/ (#45000)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45000

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D23798016

Pulled By: jamesr66a

fbshipit-source-id: 1d2f3db1994a62b95d0ced03bf958e54d30c35dd
2020-09-21 14:09:32 -07:00
9dc2bcdc07 Introducing (Const)StridedRandomAccessor + CompositeRandomAccessor + migrate sort to ATen (CPU) (#39744)
Summary:
This PR introduces a (Const)StridedRandomAccessor, a [random access iterator](https://en.cppreference.com/w/cpp/named_req/RandomAccessIterator) over a strided array, and a CompositeRandomAccessor, a random access iterator over two random access iterators.

The main motivation is to be able to use a handful of operations from STL and thrust in numerous dim-apply types of algorithms and eliminate unnecessary buffer allocations. Plus more advanced algorithms are going to be available with C++17.

Porting `sort` provides a hands-on example of how these iterators could be used.

Fixes [https://github.com/pytorch/pytorch/issues/24770](https://github.com/pytorch/pytorch/issues/24770).

Some benchmarks:
```python
import torch
from IPython import get_ipython

torch.manual_seed(13)

ipython = get_ipython()

sizes = [
        [10000, 10000],
        [1000, 1000, 100]
        ]
for size in sizes:
    t = torch.randn(*size)
    dims = len(size)

    print(f"Tensor of size {size}")
    for dim in range(dims):
        print(f"sort for dim={dim}")
        print("float:")
        ipython.magic("timeit t.sort(dim)")
    print()

```
#### Master
```
Tensor of size [10000, 10000]
sort for dim=0
float:
10.7 s ± 201 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sort for dim=1
float:
6.27 s ± 50.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Tensor of size [1000, 1000, 100]
sort for dim=0
float:
7.21 s ± 23.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sort for dim=1
float:
6.1 s ± 21.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sort for dim=2
float:
3.58 s ± 27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
#### This PR
```
Tensor of size [10000, 10000]
sort for dim=0
float:
10.5 s ± 209 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sort for dim=1
float:
6.16 s ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Tensor of size [1000, 1000, 100]
sort for dim=0
float:
5.94 s ± 60.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sort for dim=1
float:
5.1 s ± 11.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sort for dim=2
float:
3.43 s ± 8.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
As you can see, the legacy sorting routine is actually quite efficient. The performance gain is likely due to the improved reduction with TensorIterator.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39744

Reviewed By: malfet

Differential Revision: D23796486

Pulled By: glaringlee

fbshipit-source-id: 7bddad10dfbc0a0e5cad7ced155d6c7964e8702c
2020-09-21 13:24:58 -07:00
7118d53711 add .cache to gitignore (#45017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45017

this is the default indexing folder for clangd 11.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23817619

Pulled By: suo

fbshipit-source-id: 6a60136e591b2fec3d432ac5343cb76ac0934502
2020-09-21 12:51:35 -07:00
1a580c1021 Adding test to quantized copy for 'from float' (#43681)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43681

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D23364507

Pulled By: z-a-f

fbshipit-source-id: ef1b00937b012b0647d9b9afa054437f2bce032a
2020-09-21 12:38:59 -07:00
7de512ced8 nightly robustness fixes for linking across devices (#43771)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43761

CC rgommers ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43771

Reviewed By: glaringlee

Differential Revision: D23819835

Pulled By: malfet

fbshipit-source-id: a3be2780c4b8bdbf347d456c4d14df863c2ff8c2
2020-09-21 12:32:32 -07:00
42af2c7923 [jit] gtest-ify test_alias_analysis.cpp (#45018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45018

Now that https://github.com/pytorch/pytorch/pull/44795 has landed, we
can convert the bulk of our cpp tests to use gtest APIs. Eventually
we'll want to get rid of our weird harness for cpp tests entirely in
favor of using regular gtest everywhere. This PR demonstrates some of
the benefits of this approach:
1. You don't need to register your test twice (once to define it, once
in tests.h).
2. Consequently, it's easier to have many individual test cases.
Failures can be reported independently (rather than having huge
functions to test entire modules).
3. Some nicer testing APIs, notably test fixtures.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23802297

Pulled By: suo

fbshipit-source-id: 774255da7716294ac573747dcd5e106e5fe3ac8f
2020-09-21 12:19:37 -07:00
92f8f75c59 Add alias dispatch key Math. (#44354)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44354

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23591481

Pulled By: ailzhang

fbshipit-source-id: 6e93c4ec99a07f3fc920ba2d09dc222e6ced5adf
2020-09-21 11:10:39 -07:00
acc2a1e5fa Update submodule gloo (#45025)
Summary:
Includes commits to fix the Windows CI failure in the "enable distributed training on Windows" PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45025

Reviewed By: beauby

Differential Revision: D23807995

Pulled By: mrshenli

fbshipit-source-id: a2f4c1684927ca66d7d3e9920ecb588fb4386f7c
2020-09-21 10:28:37 -07:00
a4aba1d465 fix compile error (#45052)
Summary:
Update the vulkanOptimizeForMobile function invocation in optimize_for_mobile.cc to align with the latest call contract from PR https://github.com/pytorch/pytorch/pull/44903.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45052

Reviewed By: malfet

Differential Revision: D23814953

Pulled By: mrshenli

fbshipit-source-id: 0fa844a8291e952715b9de35cdec0e411c42b7f9
2020-09-21 10:23:49 -07:00
ac8c7c4e9f Make Channel API accept buffer structs rather than raw pointers. (#45014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45014

Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/219

Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/212

+ Introduce buffer.h defining the buffer struct(s). The `CpuBuffer`
struct is always defined, while the `CudaBuffer` struct is defined
only when `TENSORPIPE_SUPPORTS_CUDA` is true.
+ Update all channels to take a `CpuBuffer` or `CudaBuffer` for
`send`/`recv` rather than a raw pointer and a length.
+ Make the base `Channel`/`Context` classes templated on `TBuffer`,
effectively creating two channel hierarchies (one for CPU channels,
one for CUDA channels).
+ Update the Pipe and the generic channel tests to use the new API. So
far, generic channel tests are CPU only, and tests for the CUDA IPC
channel are (temporarily) disabled. A subsequent PR will take care of
refactoring tests so that generic tests work for CUDA channels. An
other PR will add support for CUDA tensors in the Pipe.

Differential Revision: D23598033

Test Plan: Imported from OSS

Reviewed By: lw

Pulled By: beauby

fbshipit-source-id: 1d6c3f91e288420858835cd5e7962e8da051b44b
2020-09-21 10:18:45 -07:00
4bbb6adff5 [NNC] fix SyncThreads insertion and reenable CudaSharedMem test (#44909)
Summary:
A previous fix for masking Cuda dimensions (https://github.com/pytorch/pytorch/issues/44733) changed the behaviour of inserting thread synchronization barriers in the Cuda CodeGen, causing the CudaSharedMemReduce_1 to be flaky and ultimately disabled.

The issue is working out where these barriers must be inserted - solving this optimally is very hard, and I think not possible without dependency analysis we don't have, so I've changed our logic to be quite pessimistic. We'll insert barriers before and after any blocks that have thread dimensions masked (even between blocks that have no data dependencies). This should be correct, but it's an area where we could improve performance. To address this somewhat, I've added a simplifier pass that removes obviously unnecessary syncThreads.

To avoid this test being flaky again, I've added a check against the generated code to ensure there is a syncThread in the right place.

Also fixed a couple of non-functional clarity issues in the generated code: added the missing newline after Stores in the CudaPrinter, and prevented the PrioritizeLoad mutator from pulling out loads contained within simple Let statements (such as those produced by the Registerizer).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44909

Reviewed By: agolynski

Differential Revision: D23800565

Pulled By: nickgg

fbshipit-source-id: bddef1f40d8d461da965685f01d00b468d8a2c2f
2020-09-21 09:27:22 -07:00
e2f49c8437 skip im2col & vol2col in cpu/cuda convolution methods (#44600)
Summary:
This fixes https://github.com/pytorch/pytorch/issues/44482.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44600

Reviewed By: ngimel

Differential Revision: D23733483

Pulled By: walterddr

fbshipit-source-id: 90e188027ef6bb08588619b6629110b5f73d63e3
2020-09-21 09:20:23 -07:00
a6895d43b6 Turn on gradgrad check for BCELoss Criterion Tests. (#44894)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44894

Looks like we added double backwards support but only turned on the ModuleTests.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23762544

Pulled By: gchanan

fbshipit-source-id: b5cef579608dd71f3de245c4ba92e49216ce8a5e
2020-09-21 07:14:22 -07:00
4810365576 Enabled torch.testing._internal.jit_utils.* typechecking. (#44985)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44985

Reviewed By: malfet

Differential Revision: D23794444

Pulled By: kauterry

fbshipit-source-id: 9893cc91780338a8223904fb574efa77fa3ab2b9
2020-09-21 01:19:06 -07:00
9f67176b82 Complex gradcheck logic (#43208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43208

This PR adds gradcheck for complex. The logic used for complex gradcheck is described in Section 3.5.3 here: https://arxiv.org/pdf/1701.00392.pdf

More concretely, this PR introduces the following changes:
1. Updates get_numerical_jacobian to take as input a scalar value for vector (v). Adds gradcheck logic for C -> C, C-> R, R -> C. For R -> C functions, only the real value of gradient is propagated.
2. Adds backward definition for `torch.complex` and also adds a test to verify the definition added.
3. Updates backward for `mul`, `sin`, `cos`, `sinh`, `cosh`.
4. Adds tests for all `torch.real`, `torch.imag`, `torch.view_as_real`, `torch.view_as_complex`, `torch.conj`.
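
A minimal sketch of what the new support enables - checking a C -> C function with a complex128 input through the standard `gradcheck` API:

```python
import torch

x = torch.randn(4, dtype=torch.complex128, requires_grad=True)
# gradcheck raises on failure and returns True on success.
assert torch.autograd.gradcheck(lambda t: t * t, (x,))
```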

Follow up tasks:
1. Add more thorough tests for R -> C cases. Specifically, add R->C test variants for functions. for e.g., `torch.mul(complex_tensor, real_tensor)`
2. Add back commented test in `common_methods_invocation.py`.
3. Add more special case checking for complex gradcheck to make debugging easier.
4. Update complex autograd note.
5. disable complex autograd for operators not tested for complex.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23655088

Pulled By: anjali411

fbshipit-source-id: caa75e09864b5f6ead0f988f6368dce64cf15deb
2020-09-20 22:05:04 -07:00
da7863f46b Add one dimensional FFTs to torch.fft namespace (#43011)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43011

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23751850

Pulled By: mruberry

fbshipit-source-id: 8dc5fec75102d8809eeb85a3d347ba1b5de45b33
2020-09-19 23:32:22 -07:00
49db7b59e0 For logical tests, use the dtypes decorator (#42483)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42483

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23684424

Pulled By: mruberry

fbshipit-source-id: ba7ab5c3a6eaa0c16975728200f27d164ed4f852
2020-09-19 19:01:49 -07:00
60709ad1bf Adds multiply and divide aliases (#44463)
Summary:
These aliases are consistent with NumPy. Note that C++'s naming would be different (std::multiplies and std::divides), and that PyTorch's existing names (mul and div) are consistent with Python's dunders.

This also improves the instructions for adding an alias to clarify that dispatch keys should be removed when copying native_function.yaml entries to create the alias entries.
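
For illustration, the new aliases behave identically to the existing names:

```python
import torch

a, b = torch.tensor([6.0, 8.0]), torch.tensor([2.0, 4.0])
assert torch.equal(torch.multiply(a, b), torch.mul(a, b))
assert torch.equal(torch.divide(a, b), torch.div(a, b))
```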

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44463

Reviewed By: ngimel

Differential Revision: D23670782

Pulled By: mruberry

fbshipit-source-id: 9f1bdf8ff447abc624ff9e9be7ac600f98340ac4
2020-09-19 15:47:52 -07:00
faef89c89f CUDA BFloat Pooling (#44836)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44836

Reviewed By: mruberry

Differential Revision: D23800992

Pulled By: ngimel

fbshipit-source-id: 2945a27874345197cbd1d8a4fbd20816afc02c86
2020-09-19 15:43:36 -07:00
7ecfaef7ec CUDA BFloat16 layernorm (#45002)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45002

Reviewed By: mruberry

Differential Revision: D23800931

Pulled By: ngimel

fbshipit-source-id: cc213d02352907a3e945cd9fffd1de29e355a16c
2020-09-19 15:36:03 -07:00
2163d31016 histogram observer: ensure buffer shape consistency (#44956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44956

Makes the buffers of HistogramObserver have the same shapes
in the uninitialized and initialized states.

This is useful because the detectron2 checkpointer assumes
that these states will stay the same, so it removes the
need for manual hacks around the shapes changing.
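
A sketch of the invariant this enforces (assuming the standard `HistogramObserver` entry point; the specific buffer names are an implementation detail):

```python
import torch
from torch.quantization import HistogramObserver

obs = HistogramObserver()
before = {k: v.shape for k, v in obs.state_dict().items()}
obs(torch.randn(16))  # observe some data
after = {k: v.shape for k, v in obs.state_dict().items()}
assert before == after  # buffer shapes no longer change after observation
```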

Test Plan:
```
python test/test_quantization.py TestObserver.test_histogram_observer_consistent_buffer_shape
```

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23785382

fbshipit-source-id: 1a83fd4f39b244b00747c368d5d305a07d877c92
2020-09-19 09:29:39 -07:00
0714c003ee [pytorch][tensorexpr] Make gtest-style macros in tests match actual gtest signatures (#44861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44861

We were redefining things like ASSERT_EQ to take a `__VA_ARGS__` parameter, so compiling these files with gtest (instead of pytorch's custom python-based cpp test infra) fails.

Test Plan: buck build //caffe2/test/cpp/tensorexpr

Reviewed By: asuhan

Differential Revision: D23711293

fbshipit-source-id: 8af14fa7c1f1e8169d14bb64515771f7bc3089e5
2020-09-19 07:25:05 -07:00
9e5045e978 [pytorch] clean up normalized_dynamic_type() hack (#44889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44889

This HACK doesn't seem to be necessary any more - there is no 'real'
type in the generated Declarations.yaml file.
Verified by comparing generated code before/after.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23761624

Pulled By: ljk53

fbshipit-source-id: de996f04d77eebea3fb9297dd90a8ebeb07647bb
2020-09-18 23:49:46 -07:00
d75c402755 Add cusolver to build, rewrite MAGMA inverse with cusolver (#42403)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42265

This PR adds cusolver to the pytorch build, and enables the use of cusolver/cublas library functions for GPU `torch.inverse` on certain tensor shapes.

Specifically, when

* the tensor is two dimensional (single batch), or
* has >2 dimensions (multiple batches) and `batch_size <= 2`, or
* magma is not linked,

cusolver/cublas will be used. In other conditions, the current implementation of MAGMA will still be used.
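
In pseudocode, the dispatch rule reads roughly as follows (a sketch only; the real logic lives in the C++ sources linked below):

```python
import math

def use_cusolver(shape, magma_linked):
    # shape is the tensor's size; the last two dims are the matrix dims.
    ndim = len(shape)
    batch_size = math.prod(shape[:-2]) if ndim > 2 else 1
    return ndim == 2 or batch_size <= 2 or not magma_linked
```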

8c0949ae45/aten/src/ATen/native/cuda/BatchLinearAlgebra.cu (L742-L752)

The reason for this is that for tensors with large batch_size, `cublasXgetrfBatched` and `cublasXgetriBatched` don't perform very well. For `batch_size > 1`, we launch cusolver functions in multiple streams. This lets cusolver functions run in parallel, and can greatly increase the performance. When `batch_size > 2`, the parallel-launched cusolver functions are slightly slower than the current magma implementation, so we still use the current magma impl.

On CUDA 9.2, some numerical issues were detected, so the cusolver impl will not be used there. The cusolver impl will also not be used on platforms other than Nvidia CUDA.

060769feaf/aten/src/ATen/native/cuda/BatchLinearAlgebraLib.h (L10-L13)

Note that there is a new heuristic used before cusolver/cublas calls here:

8c0949ae45/aten/src/ATen/native/cuda/MiscUtils.h (L113-L121)

where `use_loop_launch = true` means launch single-batch cusolver functions in parallel, and `use_loop_launch = false` means use cublas_X_batched functions. When magma is enabled (only `batch_size <= 2` will be dispatched to cusolver/cublas), the heuristic will always return `true` and the cusolver calls are faster than small batch_size magma calls. When magma is disabled, this adds the functionality of `torch.inverse`, which was disabled before for all shapes (though large batch_size cublas performance may not be as good as magma's).

Checklist:
- [X] Add benchmark, cpu, gpu-before (magma), gpu-after (cusolver)
- [X] Rewrite single inverse (ndim == 2) with cusolver
- [X] Rewrite batched inverse (ndim > 2) with cublas
- [X] Add cusolver to build
- [x] Clean up functions related to `USE_MAGMA` define guard
- [x] Workaround for non-cuda platform
- [x] Workaround for cuda 9.2
- [x] Add zero size check
- [x] Add tests

Next step:

If cusolver doesn't cause any problems in the pytorch build, and there are no major performance regressions reported after this PR is merged, I will start porting other cusolver/cublas linear algebra functions to improve performance.

<details>
<summary> benchmark 73499c6 </summary>

benchmark code: https://github.com/xwang233/code-snippet/blob/master/torch.inverse/inverse-cusolver.ipynb

shape meaning:

* `[] 2 torch.float32 -> torch.randn(2, 2, dtype=torch.float32)`
* `[2] 4 torch.float32 -> torch.randn(2, 4, 4, dtype=torch.float32)`

| shape | cpu_time (ms) | gpu_time_before (magma) (ms) | gpu_time_after (ms) |
| --- | --- | --- | --- |
| [] 2 torch.float32 |  0.095 |  7.534 |  0.129  |
| [] 4 torch.float32 |  0.009 |  7.522 |  0.129  |
| [] 8 torch.float32 |  0.011 |  7.647 |  0.138  |
| [] 16 torch.float32 |  0.075 |  7.582 |  0.135  |
| [] 32 torch.float32 |  0.073 |  7.573 |  0.191  |
| [] 64 torch.float32 |  0.134 |  7.694 |  0.288  |
| [] 128 torch.float32 |  0.398 |  8.073 |  0.491  |
| [] 256 torch.float32 |  1.054 |  11.860 |  1.074  |
| [] 512 torch.float32 |  5.218 |  14.130 |  2.582  |
| [] 1024 torch.float32 |  19.010 |  18.780 |  6.936  |
| [1] 2 torch.float32 |  0.009 |  0.113 |  0.128 ***regressed |
| [1] 4 torch.float32 |  0.009 |  0.113 |  0.131 ***regressed |
| [1] 8 torch.float32 |  0.011 |  0.116 |  0.129 ***regressed |
| [1] 16 torch.float32 |  0.015 |  0.122 |  0.135 ***regressed |
| [1] 32 torch.float32 |  0.032 |  0.177 |  0.178 ***regressed |
| [1] 64 torch.float32 |  0.070 |  0.420 |  0.281  |
| [1] 128 torch.float32 |  0.328 |  0.816 |  0.490  |
| [1] 256 torch.float32 |  1.125 |  1.690 |  1.084  |
| [1] 512 torch.float32 |  4.344 |  4.305 |  2.576  |
| [1] 1024 torch.float32 |  16.510 |  16.340 |  6.928  |
| [2] 2 torch.float32 |  0.009 |  0.113 |  0.186 ***regressed |
| [2] 4 torch.float32 |  0.011 |  0.115 |  0.184 ***regressed |
| [2] 8 torch.float32 |  0.012 |  0.114 |  0.184 ***regressed |
| [2] 16 torch.float32 |  0.019 |  0.119 |  0.173 ***regressed |
| [2] 32 torch.float32 |  0.050 |  0.170 |  0.240 ***regressed |
| [2] 64 torch.float32 |  0.120 |  0.429 |  0.375  |
| [2] 128 torch.float32 |  0.576 |  0.830 |  0.675  |
| [2] 256 torch.float32 |  2.021 |  1.748 |  1.451  |
| [2] 512 torch.float32 |  9.070 |  4.749 |  3.539  |
| [2] 1024 torch.float32 |  33.655 |  18.240 |  12.220  |
| [4] 2 torch.float32 |  0.009 |  0.112 |  0.318 ***regressed |
| [4] 4 torch.float32 |  0.010 |  0.115 |  0.319 ***regressed |
| [4] 8 torch.float32 |  0.013 |  0.115 |  0.320 ***regressed |
| [4] 16 torch.float32 |  0.027 |  0.120 |  0.331 ***regressed |
| [4] 32 torch.float32 |  0.085 |  0.173 |  0.385 ***regressed |
| [4] 64 torch.float32 |  0.221 |  0.431 |  0.646 ***regressed |
| [4] 128 torch.float32 |  1.102 |  0.834 |  1.055 ***regressed |
| [4] 256 torch.float32 |  4.042 |  1.811 |  2.054 ***regressed |
| [4] 512 torch.float32 |  18.390 |  4.884 |  5.087 ***regressed |
| [4] 1024 torch.float32 |  69.025 |  19.840 |  20.000 ***regressed |

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42403

Reviewed By: ailzhang, mruberry

Differential Revision: D23717984

Pulled By: ngimel

fbshipit-source-id: 54cbd9ea72a97989cff4127089938e8a8e29a72b
2020-09-18 20:43:29 -07:00
620c999979 update gloo submodule (#45008)
Summary:
Revert accidental gloo submodule changes in https://github.com/pytorch/pytorch/issues/41977

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45008

Reviewed By: malfet

Differential Revision: D23799892

Pulled By: ngimel

fbshipit-source-id: e8dab244c6abad32ed60efe3c26cab40837e57c8
2020-09-18 19:02:36 -07:00
21a1b9c7cf skip more nccl tests that causes flaky timeouts on rocm build (#44996)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44996

Reviewed By: malfet

Differential Revision: D23797564

Pulled By: walterddr

fbshipit-source-id: 4d60f76bb8ae54bb04a9f4143a68623933461b2a
2020-09-18 18:53:47 -07:00
1c15452703 Update Windows builders to latest VS2019 (#44746)
Summary:
Restore https://github.com/pytorch/pytorch/issues/44706 (reverted by https://github.com/pytorch/pytorch/issues/41977), which should work around the VC compiler crash.
Update configs to use the ":stable" Windows images.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44746

Reviewed By: walterddr

Differential Revision: D23793682

Pulled By: malfet

fbshipit-source-id: bfdc36c35b920f58798a18c15642ec7efc68f00e
2020-09-18 18:46:44 -07:00
e9941a5dd4 [vulkan][py] torch.utils.optimize_for_vulkan (#44903)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44903

Test Plan: Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D23766039

Pulled By: IvanKobzarev

fbshipit-source-id: dbdf484ee7d3a7719aab105efba51b92ebc51568
2020-09-18 18:20:11 -07:00
572f7e069c Enable type check for torch.testing._internal.te_utils.* (#44927)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44927

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D23776842

Pulled By: sshawnwu

fbshipit-source-id: 65c028169a37e1f2f7d9fdce8a958234ee1caa26
2020-09-18 18:09:15 -07:00
043466f978 [FX] Pass module's qualname to is_leaf_module (#44966)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44966

Test Plan: Imported from OSS

Reviewed By: dzhulgakov

Differential Revision: D23790360

Pulled By: jamesr66a

fbshipit-source-id: 7ef569fd93646584b27af7a615fa69c8d8bbdd3b
2020-09-18 17:02:33 -07:00
40c09cfe14 [CircleCI] Fix CUDA test setup (#44982)
Summary:
CircleCI updated the windows-nvidia-2019:canary image to exclude VC++ 14.26.
Update the config to use 14.27.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44982

Reviewed By: seemethere

Differential Revision: D23794116

Pulled By: malfet

fbshipit-source-id: f3281f7d51acae4a4d06cecff01100fa77bd81ff
2020-09-18 16:20:24 -07:00
e255a4e1fd Enable bfloat16 random kernels on Windows (#44918)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33793

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44918

Reviewed By: pbelevich

Differential Revision: D23777548

Pulled By: ngimel

fbshipit-source-id: 9cf13166d7deba17bc72e402b82ed0afe347cb9b
2020-09-18 15:55:32 -07:00
06389406bb CUDA BFloat activations 1 (#44834)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44834

Reviewed By: mruberry

Differential Revision: D23752660

Pulled By: ngimel

fbshipit-source-id: 209a937e8a9afe12b7dd86ecfa493c9417fd22fb
2020-09-18 15:48:49 -07:00
76a109c930 [caffe2/aten] Fix clang build (#44934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44934

Fix build errors when using clang to build cuda sources:

```
In file included from aten/src/ATen/native/cuda/DistributionBernoulli.cu:4:
In file included from aten/src/ATen/cuda/CUDAApplyUtils.cuh:5:
caffe2/aten/src/THC/THCAtomics.cuh:321:1: error: control reaches end of non-void function [-Werror,-Wreturn-type]
}
^
1 error generated when compiling for sm_70.

In file included from aten/src/ATen/native/cuda/DistributionBernoulli.cu:4:
In file included from aten/src/ATen/cuda/CUDAApplyUtils.cuh:5:
caffe2/aten/src/THC/THCAtomics.cuh:321:1: error: control reaches end of non-void function [-Werror,-Wreturn-type]
}
^
1 error generated when compiling for sm_60.

In file included from aten/src/ATen/native/cuda/DistributionBernoulli.cu:4:
In file included from aten/src/ATen/cuda/CUDAApplyUtils.cuh:5:
caffe2/aten/src/THC/THCAtomics.cuh:321:1: error: control reaches end of non-void function [-Werror,-Wreturn-type]
}
^
1 error generated when compiling for sm_52.
```

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D23775266

fbshipit-source-id: 141e6624e2da870a8c50ff9f71fcf0717222fb17
2020-09-18 15:22:09 -07:00
fd4e21c91e Add optional string support to native_functions schema (#43010)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43010

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23751851

Pulled By: mruberry

fbshipit-source-id: 648f7430e1b7311eff28421f38e01f52d998fcbd
2020-09-18 14:57:24 -07:00
2d884f2263 Optimize Scale function (#44913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44913

Pull Request resolved: https://github.com/pytorch/pytorch/pull/18322

Optimize Scale function

i-am-not-moving-c2-to-c10

Test Plan: buck test mode/dbg caffe2/caffe2/python/operator_test:weighted_sum_test

Reviewed By: BIT-silence

Differential Revision: D14575780

fbshipit-source-id: db333a7964581dcaff6e432ff1d6b517ba1a075f
2020-09-18 14:31:33 -07:00
374e9373b5 [jit] Pull (most) tests out of libtorch_python (#44795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44795

Today, we build our cpp tests twice, once as a standalone gtest binary,
and once linked in `libtorch_python` so we can call them from
`test_jit.py`.

This is convenient (it means that `test_jit.py` is a single entry point
for all our tests), but has a few drawbacks:
1. We can't actually use the gtest APIs, since we don't link gtest into
`libtorch_python`. We're stuck with the subset that we want to write
polyfills for, and an awkward registration scheme (where you have to
write a test, then include it in `tests.h`).
2. More seriously, we register custom operators and classes in these
tests. In a world where we may be linking many `libtorch_python`s, this
has a tendency to cause errors with `libtorch`.

So now, only tests that explicitly require cooperation with Python are
built into `libtorch_python`. The rest are built into
`build/bin/test_jit`.

There are tests which require that we define custom classes and
operators. In these cases, I've built them into separate `.so`s that we
call `torch.ops.load_library()` on.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity, ZolotukhinM

Differential Revision: D23735520

Pulled By: suo

fbshipit-source-id: d146bf4e7eb908afa6f96b394e4d395d63ad72ff
2020-09-18 14:04:40 -07:00
af3fc9725d Extract rpc/tensorpipe_utils.{cpp,h} from rpc/utils.{cpp,h} (#44803)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44803

Test Plan: CI

Reviewed By: lw

Differential Revision: D23732022

fbshipit-source-id: 5b839c7997bbee162a14d03414ee32baabbc8ece
2020-09-18 13:51:43 -07:00
d22dd80128 Enable type check for torch.testing._internal.common_device_type. (#44911)
Summary:
This PR intends to fix the type exceptions in common_device_type.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44911

Reviewed By: walterddr

Differential Revision: D23768397

Pulled By: wuyangzhang

fbshipit-source-id: 053692583b4d6169b0eb5ffe0c3d30635c0db699
2020-09-18 13:42:11 -07:00
a47e3697ab Use iterator of DispatchKeySet. (#44682)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44682

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23698387

Pulled By: ailzhang

fbshipit-source-id: 4fa140db9254c2c9c342bf1c8dfd952469b0b779
2020-09-18 13:34:27 -07:00
6d312132e1 Beef up vmap docs and expose to master documentation (#44825)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44825

Test Plan: - build and view docs locally.

Reviewed By: ezyang

Differential Revision: D23742727

Pulled By: zou3519

fbshipit-source-id: f62b7a76b5505d3387b7816c514c086c01089de0
2020-09-18 13:26:25 -07:00
c2cf6efd96 Enable type check for torch.testing._internal.dist_utils.* (#44832)
Summary:
Addresses a sub-task of https://github.com/pytorch/pytorch/issues/44752.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44832

Reviewed By: malfet

Differential Revision: D23744260

Pulled By: samestep

fbshipit-source-id: 46aede57b4fa66a770d5df382b0aea2bd6772b9b
2020-09-18 12:50:48 -07:00
7bd8a6913d CUDA BFloat div, addcdiv, addcmul, mean, var (#44758)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44758

Reviewed By: mruberry

Differential Revision: D23752317

Pulled By: ngimel

fbshipit-source-id: 77992cf991f4e2b4b6839de73ea7e6ce2e1061c6
2020-09-18 11:51:11 -07:00
f175830558 [NNC] Fuse identical conditions in simplifier (#44886)
Summary:
Adds a pass to the IR Simplifier which fuses together the bodies of Cond statements which have identical conditions. e.g.

```
if (i < 10) {
  do_thing_1;
} else {
  do_thing_2;
}
if (i < 10) {
  do_thing_3;
}
```

is transformed into:

```
if (i < 10) {
  do_thing_1;
  do_thing_3;
} else {
  do_thing_2;
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44886

Reviewed By: glaringlee

Differential Revision: D23768565

Pulled By: nickgg

fbshipit-source-id: 3fe40d91e82bdfff8dcb8c56a02a4fd579c070df
2020-09-18 11:38:03 -07:00
09f2c6a94c Back out "Revert D23494065: Refactor CallbackManager as a friend class of RecordFunction." (#44699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44699

Original commit changeset: 3b1ec928e3db

Previous revert (D23698861) was on the wrong diff stack. Backing out the revert.

Test Plan: Passed unit tests and previously landed.

Reviewed By: mruberry

Differential Revision: D23702258

fbshipit-source-id: 5c3e197bca412f454db5a7e86251ec85faf621c1
2020-09-18 11:08:27 -07:00
174cbff00a Improve sugared value's error message (#42889)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **https://github.com/pytorch/pytorch/issues/42889 Improve sugared value's error message**

I think most (if not all) cases where this code path is reached can be attributed to closing over a global variable.
Improving the error message to make this clearer to users.

close https://github.com/pytorch/pytorch/issues/41288

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42889

Reviewed By: SplitInfinity

Differential Revision: D23779347

Pulled By: gmagogsfm

fbshipit-source-id: ced702a96234040f79eb16ad998d202e360d6654
2020-09-18 11:01:40 -07:00
0063512a4b [ONNX] Updates to diagnostic tool to find missing ops (#44124)
Summary:
Moved the description of the tool and changed the function name.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44124

Reviewed By: albanD

Differential Revision: D23674618

Pulled By: bzinodev

fbshipit-source-id: 5db0bb14fc106fc96358b1e0590f08e975388c6d
2020-09-18 10:32:30 -07:00
c68cc78299 Add a device parameter to RemoteModule (#44254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44254

Add a device parameter to RemoteModule, so it can be placed on any device
and not just CPU.

Original PR issue: RemoteModule enhancements #40550

Test Plan: buck test test/distributed/rpc:process_group_agent -- RemoteModule

Reviewed By: pritamdamania87

Differential Revision: D23483803

fbshipit-source-id: 4918583c15c6a38a255ccbf12c9168660ab7f6db
2020-09-18 10:31:03 -07:00
cff0e57c31 Remove Incorrect Comment in tools/build_libtorch and remove Python2 support in the module import (#44888)
Summary:
Fixes #44293 and removes Python 2 imports from the MNIST download module, as Python 2 is no longer supported.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44888

Reviewed By: agolynski

Differential Revision: D23785579

Pulled By: bugra

fbshipit-source-id: d9380502380876282008dd2d5feb92a446648982
2020-09-18 10:03:36 -07:00
07b7e44ed1 Stop using check_criterion_jacobian. (#44786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44786

This predates gradcheck and gradcheck does the same and more.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23731902

Pulled By: gchanan

fbshipit-source-id: 425fd30e943194f63a663708bada8960265b8f05
2020-09-18 07:04:57 -07:00
6d178f6b8e Stop ignoring errors in cuda nn module tests. (#44783)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44783

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23731778

Pulled By: gchanan

fbshipit-source-id: 32df903a9e36bbf3f66645ee2d77efa5ed6ee429
2020-09-18 07:03:41 -07:00
df39c40054 Cleanup tracer handling of optional arguments (#43009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43009

* **#43009 Cleanup tracer handling of optional arguments**

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23766621

Pulled By: mruberry

fbshipit-source-id: c1b46cd23b58b18ef4c03021b2514d7e692badb6
2020-09-18 06:54:09 -07:00
caea1adc35 Complex support for stft and istft (#43886)
Summary:
Ref https://github.com/pytorch/pytorch/issues/42175, fixes https://github.com/pytorch/pytorch/issues/34797

This adds complex support to `torch.stft` and `torch.istft`. Note that there are really two issues with complex here: complex signals, and returning complex tensors.

## Complex signals and windows
`stft` currently assumes all signals are real and uses `rfft` with `onesided=True` by default. Similarly, `istft` always takes a complex fourier series and uses `irfft` to return real signals.

For `stft`, I now allow complex inputs and windows by calling the full `fft` if either are complex. If the user gives `onesided=True` and the signal is complex, then this doesn't work and raises an error instead. For `istft`, there's no way to automatically know what to do when `onesided=False` because that could either be a redundant representation of a real signal or a complex signal. So there, the user needs to pass the argument `return_complex=True` in order to use `ifft` and get a complex result back.

## stft returning complex tensors
The other issue is that `stft` returns a complex result, represented as a `(... x 2)` real tensor. I think ideally we want this to return proper complex tensors, but to preserve BC I've had to add a `return_complex` argument to manage this transition. `return_complex` defaults to false for real inputs to preserve BC, but defaults to True for complex inputs where there is no BC to consider.

In order to make `return_complex` the default everywhere without a sudden BC-breaking change, a simple transition plan could be:
1. introduce `return_complex`, defaulted to false when BC is an issue but giving a warning. (this PR)
2. raise an error in cases where `return_complex` defaults to false, making it a required argument.
3. change `return_complex` default to true in all cases.
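
As a minimal sketch of the new argument (shapes and parameter values are illustrative):

```python
import torch

x = torch.randn(1024)              # a real input signal
window = torch.hann_window(400)

# Request a proper complex tensor instead of the legacy (..., 2) real layout.
spec = torch.stft(x, n_fft=400, window=window, return_complex=True)
print(spec.dtype)                  # torch.complex64

# Round-trip back to a real signal of the original length.
y = torch.istft(spec, n_fft=400, window=window, length=x.numel())
```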

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43886

Reviewed By: glaringlee

Differential Revision: D23760174

Pulled By: mruberry

fbshipit-source-id: 2fec4404f5d980ddd6bdd941a63852a555eb9147
2020-09-18 01:39:47 -07:00
e400150c3b Fixed for caffe2/opt/tvm_transformer.cc (#44249)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41706

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44249

Reviewed By: gmagogsfm

Differential Revision: D23752331

Pulled By: SplitInfinity

fbshipit-source-id: 1d7297e080bc1e065129259e406af7216f3f0665
2020-09-18 00:03:59 -07:00
f2b3480795 CUDA BFloat softmax (#44837)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44837

Reviewed By: glaringlee

Differential Revision: D23767981

Pulled By: ngimel

fbshipit-source-id: be92c25a1b66ed50a52e090db167079def6f6b39
2020-09-17 21:52:47 -07:00
1694fde7eb Fix a GroupNorm cuda bug when input does not require_grad (#44863)
Summary:
Fix https://discuss.pytorch.org/t/illegal-memory-access-when-i-use-groupnorm/95800

`dX` is a Tensor, comparing `dX` with `nullptr` was wrong.

cc BIT-silence who wrote the kernel.

The test couldn't pass with `rtol=0` and `x.requires_grad=True`, so I had to update it to `1e-5`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44863

Reviewed By: mruberry

Differential Revision: D23754101

Pulled By: BIT-silence

fbshipit-source-id: 2eb0134dd489480e5ae7113a7d7b84629104cd49
2020-09-17 19:01:28 -07:00
5dbcbea265 TorchScript with record_function (#44345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44345

As part of enhancing profiler support for RPC, when executing TorchScript functions over RPC, we would like to be able to support user-defined profiling scopes created by `with record_function(...)`.

Since after https://github.com/pytorch/pytorch/pull/34705, we support `with` statements in TorchScript, this PR adds support for `with torch.autograd.profiler.record_function` to be used within TorchScript.

This can be accomplished via the following without this PR:
```
torch.ops.profiler._record_function_enter(...)
# Script code, such as forward pass
torch.ops.profiler._record_function_exit(...)
```

This is a bit hacky and it would be much cleaner to use the context manager now that we support `with` statements. Also, `_record_function_` type operators are internal operators that are subject to change, this change will help avoid BC issues in the future.
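
With this change, a scripted function can use the context manager directly; a minimal sketch:

```python
import torch

@torch.jit.script
def forward_with_scope(x: torch.Tensor) -> torch.Tensor:
    # User-defined profiling scope, now supported inside TorchScript.
    with torch.autograd.profiler.record_function("my_scope"):
        return torch.relu(x + 1)

with torch.autograd.profiler.profile() as prof:
    forward_with_scope(torch.randn(4))
# "my_scope" then shows up in prof.key_averages().
```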

Tested with `python test/test_jit.py TestWith.test_with_record_function -v`
ghstack-source-id: 112320645

Test Plan:
Repro instructions:
1) Change `def script_add_ones_return_any(x) -> Any` to `def script_add_ones_return_any(x) -> Tensor` in `jit/rpc_test.py`
2) `buck test mode/dev-nosan //caffe2/test/distributed/rpc:process_group_agent -- test_record_function_on_caller_rpc_async --print-passing-details`
3) The function which ideally should accept `Future[Any]` is `def _call_end_callbacks_on_future` in `autograd/profiler.py`.

python test/test_jit.py TestWith.test_with_foo -v

Reviewed By: pritamdamania87

Differential Revision: D23332074

fbshipit-source-id: 61b0078578e8b23bfad5eeec3b0b146b6b35a870
2020-09-17 18:45:00 -07:00
4a9c80e82e [pytorch][bot] update mobile op deps (#44854)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44854

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D23751925

Pulled By: ljk53

fbshipit-source-id: 8e1905091bf3abaac20d97182eb88f96e905ffc2
2020-09-17 18:33:13 -07:00
9a007ba4cb [jit] stop parsing the block after seeing exit statements (#44870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44870

fix https://github.com/pytorch/pytorch/issues/44864

Test Plan: buck test mode/dev-nosan //caffe2/test:jit -- 'test_assert_is_script'

Reviewed By: eellison

Differential Revision: D23755094

fbshipit-source-id: ca3f8b27dc6f9dc9364a22a1bce0e2f588ed4308
2020-09-17 18:09:16 -07:00
60ae6c9c18 [FX] Fix GraphModule copy methods not regenerating forward (#44806)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44806

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D23738732

Pulled By: jamesr66a

fbshipit-source-id: 14e13551c6568c562f3f789b6274b6c86afefd0b
2020-09-17 17:14:38 -07:00
e14b2080be [reland] move rebuild buckets from end of first iteration to beginning of second iteration (#44798)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44798

[test all]

Update for relanding: in ddp.join(), moved _rebuild_buckets from end of backward to beginning of forward as well.

Part of relanding PR #41954, this refactoring is to move rebuild_buckets call from end of first iteration to beginning of second iteration
ghstack-source-id: 112279261

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D23735185

fbshipit-source-id: c26e0efeecb3511640120faa1122a2c856cd694e
2020-09-17 17:10:21 -07:00
2043fbdfb6 Enable torch.backends.cuda typechecking in CI (#44916)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44916

Reviewed By: walterddr

Differential Revision: D23769844

Pulled By: malfet

fbshipit-source-id: 3be3616fba9e2f9c6d89cc71d5f0d24ffcc45cf2
2020-09-17 15:31:38 -07:00
18b77d7d17 [TensorExpr] Add Mod support to the LLVM backend (#44823)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44823

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.LLVMElemwiseMod_LLVM

Reviewed By: glaringlee

Differential Revision: D23761996

Pulled By: asuhan

fbshipit-source-id: c3c5b2fe0d989dec04f0152ce47c5cae35ed19c9
2020-09-17 15:25:42 -07:00
e535fb3f7d [ONNX] Enable true_divide scripting export with ONNX shape inference (#43991)
Summary:
Fixes the `true_divide` symbolic to cast tensors correctly.
The logic depends on knowing input types at export time, which is a known gap for exporting scripted modules. On that end we are improving the exporter by enabling ONNX shape inference (https://github.com/pytorch/pytorch/issues/40628) and starting to increase coverage for scripting support.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43991

Reviewed By: mruberry

Differential Revision: D23674614

Pulled By: bzinodev

fbshipit-source-id: 1b1b85340eef641f664a14c4888781389c886a8b
2020-09-17 14:38:24 -07:00
1c996b7170 Enable typechecking for torch.testing._internal.common_quantized.* (#44805)
Summary:
Addresses a subproblem of [Issue 42969](https://github.com/pytorch/pytorch/issues/42969)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44805

Reviewed By: malfet

Differential Revision: D23742754

Pulled By: janeyx99

fbshipit-source-id: e916a6a0c049cac318549a485d47f19363087d15
2020-09-17 14:24:32 -07:00
f5b92332c1 [TensorExpr] Fix order comparisons for unsigned types (#44857)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44857

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.LLVMCompareSelectByte*_LLVM

Reviewed By: glaringlee

Differential Revision: D23762162

Pulled By: asuhan

fbshipit-source-id: 1553429bd2d5292ccda57910326b8c70e4e6ab88
2020-09-17 14:16:54 -07:00
a153eafab7 Let logspace support bfloat16 on both CPU and CUDA (#44675)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44675

Reviewed By: ngimel

Differential Revision: D23710801

Pulled By: mruberry

fbshipit-source-id: 12d8e56f41bb635b500e89aaaf5df86a1795eb72
2020-09-17 14:13:55 -07:00
40e44c5f0a Make nuclear and frobenius norm non-out depend on out variants (#44095)
Summary:
Part of https://github.com/pytorch/pytorch/issues/24802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44095

Reviewed By: ngimel

Differential Revision: D23735893

Pulled By: mruberry

fbshipit-source-id: bd1264b6a8e7f9220033982b0118aa962991ca88
2020-09-17 14:11:31 -07:00
086a2e7a4e [caffe2] add cost inference for FusedFakeQuantFC and FusedFakeQuantFCGradient (#44840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44840

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44762

Move CostInferenceForFCGradient to fc_inference.cc/h to be used in multiple .cc files.

Test Plan: CI

Reviewed By: qizzzh

Differential Revision: D23714877

fbshipit-source-id: d27f33e270a93b0e053f2af592dc4a24e35526cd
2020-09-17 14:07:17 -07:00
4066022146 Do not use PRId64 in torch/csrc (#44767)
Summary:
Instead use `fmt::format()` or `%lld` and cast argument to `(long long)`
Fix typos and add helper `PyErr_SetString()` method in torch/csrc/Exceptions.h

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44767

Reviewed By: ezyang

Differential Revision: D23723671

Pulled By: malfet

fbshipit-source-id: c0101aed222184aa436b1e8768480d1531dff232
2020-09-17 14:00:02 -07:00
5d57025206 [TensorExpr] Add log1p support to the LLVM backend (#44839)
Summary:
Also corrected the Sleef_log1p registrations; the float versions had a redundant `f`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44839

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.LLVMElemwiseLog1pFloat_LLVM

Reviewed By: glaringlee

Differential Revision: D23762113

Pulled By: asuhan

fbshipit-source-id: b5cf003b5c0c1ad549c7f04470352231929ac459
2020-09-17 13:38:35 -07:00
f5440a448a CUDA BFloat16 i0 support (#44750)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44750

Reviewed By: glaringlee

Differential Revision: D23764383

Pulled By: ngimel

fbshipit-source-id: d0e784d89241e8028f97766fdac51fe1ab4c188c
2020-09-17 13:30:10 -07:00
bee97d5be0 Document the default behavior for dist.new_group() when ranks=None (#44000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44000

This wasn't documented, so add a doc saying all ranks are used when
ranks=None
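
A minimal sketch of the documented default (assumes a process group has already been initialized):

```python
import torch.distributed as dist

# Assumes dist.init_process_group(...) has already run on every rank.
group_all = dist.new_group()              # ranks=None: all ranks participate
group_sub = dist.new_group(ranks=[0, 1])  # explicit subset of ranks
```
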
ghstack-source-id: 111206308

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D23465034

fbshipit-source-id: 4c51f37ffcba3d58ffa5a0adcd5457e0c5676a5d
2020-09-17 11:30:37 -07:00
2558e5769d Implement sort for list of tuples (#43448)
Summary:
* Implement tuple sort by traversing contained IValue types and generate a lambda function as comparator for sort.
* Tuple, class objects can now arbitrarily nest within each other and still be sortable

Fixes https://github.com/pytorch/pytorch/issues/43219

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43448

Reviewed By: eellison

Differential Revision: D23352273

Pulled By: gmagogsfm

fbshipit-source-id: b6efa8d00e112178de8256da3deebdba7d06c0e1
2020-09-17 11:20:56 -07:00
c189328e5d CUDA BFloat16 unary ops part 2 (#44824)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44824

Reviewed By: mruberry

Differential Revision: D23752360

Pulled By: ngimel

fbshipit-source-id: 3aadaf9db9d4e4937aa38671e8589ecbeece709d
2020-09-17 10:57:43 -07:00
c1fa42497b fix legacy GET_BLOCKS code from THCUNN/common.h (#44789)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44472

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44789

Reviewed By: malfet

Differential Revision: D23732762

Pulled By: walterddr

fbshipit-source-id: c3748e365e9a1d009b00140ab0ef892da905d09b
2020-09-17 10:49:53 -07:00
24df3b7373 torch.empty_like and torch.zeros_like raise error if any memory format is provided with sparse input (#43699) (#44058)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43699

- Changed the order of `TORCH_CHECK` and `if (options.layout() == kSparse && self.is_sparse())`
inside the `empty_like` method.

- [x] Added tests

EDIT:

More details on why we cannot take the zeros_like approach:
Python code :
```python
res = torch.zeros_like(input_coalesced, memory_format=torch.preserve_format)
```
is routed to
```c++
// TensorFactories.cpp
Tensor zeros_like(
    const Tensor& self,
    const TensorOptions& options,
    c10::optional<c10::MemoryFormat> optional_memory_format) {
  if (options.layout() == kSparse && self.is_sparse()) {
    auto res = at::empty({0}, options); // to be resized
    res.sparse_resize_and_clear_(
        self.sizes(), self.sparse_dim(), self.dense_dim());
    return res;
  }
  auto result = at::empty_like(self, options, optional_memory_format);
  return result.zero_();
}
```
and reaches the `if (options.layout() == kSparse && self.is_sparse())` branch before any memory format check.

When we call in Python
```python
res = torch.empty_like(input_coalesced, memory_format=torch.preserve_format)
```
it is routed to
```c++
Tensor empty_like(
    const Tensor& self,
    const TensorOptions& options_,
    c10::optional<c10::MemoryFormat> optional_memory_format) {
  TORCH_CHECK(
    !(options_.has_memory_format() && optional_memory_format.has_value()),
    "Cannot set memory_format both in TensorOptions and explicit argument; please delete "
    "the redundant setter.");
  TensorOptions options =
      self.options()
          .merge_in(options_)
          .merge_in(TensorOptions().memory_format(optional_memory_format));
  TORCH_CHECK(
      !(options.layout() != kStrided &&
          optional_memory_format.has_value()),
      "memory format option is only supported by strided tensors");
  if (options.layout() == kSparse && self.is_sparse()) {
    auto result = at::empty({0}, options); // to be resized
    result.sparse_resize_and_clear_(
        self.sizes(), self.sparse_dim(), self.dense_dim());
    return result;
  }
  // ... (rest of empty_like elided)
```

cc pearu

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44058

Reviewed By: albanD

Differential Revision: D23672494

Pulled By: mruberry

fbshipit-source-id: af232274dd2b516dd6e875fc986e3090fa285658
2020-09-17 10:25:31 -07:00
1fde54d531 [quant][qat] Ensure fake_quant and observer can be disabled on scriptmodule (#44773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44773

The model is created and prepared using fx APIs and then scripted for training.
In order to test QAT on scriptmodel we need to be able to disable/enable fake_quant
and observer modules on it.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23741354

fbshipit-source-id: 3fee7aa9b049d9901313b977710f4dc1c4501532
2020-09-17 10:21:52 -07:00
361b38da19 [quant][fx] Add node name as prefix to observer module name (#44765)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44765

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_save_observer_state_dict

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23741355

fbshipit-source-id: 7185ceae5b3b520ac0beebb627c44eab7ae7d231
2020-09-17 10:17:42 -07:00
74c3dcd1d2 Revert D23725053: [pytorch][PR] change self.generator to generator
Test Plan: revert-hammer

Differential Revision:
D23725053 (a011b86115)

Original commit changeset: 89706313013d

fbshipit-source-id: 035214f0d4298d29a52f8032d364b52dfd956fe8
2020-09-17 09:42:37 -07:00
d2b4534d4d refactor initialize bucket views (#44330)
Summary:
[test all]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44330

Part of relanding PR #41954, this refactor separates initialize_bucket_views and populate_bucket_views_out, as they do different things and are called from different callsites.
ghstack-source-id: 112257271

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D23583347

fbshipit-source-id: a5f2041b2c4f2c2b5faba1af834c7143eaade938
2020-09-17 09:20:23 -07:00
6006e45028 .circleci: Switch to dynamic MAX_JOBS (#44729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44729

Switches our MAX_JOBS from a hardcoded value to a more dynamic value so
that we can always utilize all of the cores that are available to us

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D23759643

Pulled By: seemethere

fbshipit-source-id: ad26480cb0359c988ae6f994e26a09f601b728e3
2020-09-17 09:16:36 -07:00
f605d7581e Implement better caching allocator for segmentation usecase. (#44618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44618

This diff refactors the caching allocator to allow overriding its behavior by
making it a virtual class.

Test Plan: https://www.internalfb.com/intern/fblearner/details/218419618?tab=Experiment%20Results

Reviewed By: dreiss

Differential Revision: D23672902

fbshipit-source-id: 976f02922178695fab1c87f453fcb59142c258ec
2020-09-17 08:56:14 -07:00
4affbbd9f8 minor style edits to torch/testing/_internal/common_quantized.py (#44807)
Summary:
style nits

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44807

Reviewed By: malfet

Differential Revision: D23742537

Pulled By: janeyx99

fbshipit-source-id: 446343822d61f8fd9ef6dfcb8e5da4feff6522b6
2020-09-17 08:02:43 -07:00
a40ef25e30 [te] Disable flaky test CudaSharedMemReduce_1 (#44862)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44862

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23753831

Pulled By: bertmaher

fbshipit-source-id: d7d524ac34e4ca208df022a5730c2d11b3068f12
2020-09-17 07:58:16 -07:00
503c74888f Always use NewModuleTest instead of ModuleTest. (#44745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44745

Much like CriterionTest, NewCriterionTest these are outdated formulations and we should just use the new one.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23717808

Pulled By: gchanan

fbshipit-source-id: eb91982eef23452456044381334bfc9a5bbd837e
2020-09-17 07:36:39 -07:00
28085cbd39 Fixed quantile nan propagation and implemented nanquantile (#44393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44393

torch.quantile now correctly propagates NaN, and torch.nanquantile has been implemented, similar to numpy.nanquantile.
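
A minimal sketch of the difference:

```python
import torch

t = torch.tensor([1.0, 2.0, float('nan'), 4.0])

torch.quantile(t, 0.5)     # tensor(nan) -- NaN now propagates
torch.nanquantile(t, 0.5)  # tensor(2.)  -- NaN ignored, like numpy.nanquantile
```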

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23649613

Pulled By: heitorschueroff

fbshipit-source-id: 5201d076745ae1237cedc7631c28cf446be99936
2020-09-17 05:53:25 -07:00
99093277c0 Support Python Slice class in TorchScript (#44335)
Summary:
Implements support for the [Python Slice class](https://docs.python.org/3/c-api/slice.html) (not slice expressions, which are already supported).

Slice object can be used in any place that supports slice expression, including multi-dim tensor slicing.
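
A minimal sketch of the new capability:

```python
import torch

@torch.jit.script
def take_every_other(x: torch.Tensor) -> torch.Tensor:
    s = slice(None, None, 2)  # a Python slice object, now scriptable
    return x[s]

take_every_other(torch.arange(6))  # tensor([0, 2, 4])
```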

Fixes https://github.com/pytorch/pytorch/issues/43511
Fixes https://github.com/pytorch/pytorch/issues/43125

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44335

Reviewed By: suo, jamesr66a

Differential Revision: D23682213

Pulled By: gmagogsfm

fbshipit-source-id: f74fe25370e89fbfd2b3727d95ce4e1c4ba8dec4
2020-09-17 00:41:53 -07:00
b6f4bb0a70 Revert D23236088: [pytorch][PR] [caffe2] adds Cancel to SafeDequeueBlobsOp and SafeEnqueueBlobsOp
Test Plan: revert-hammer

Differential Revision:
D23236088 (0ccc38b773)

Original commit changeset: daa90d9ee324

fbshipit-source-id: 933c7deab177250075683a9bea143ac37f16a598
2020-09-16 23:32:50 -07:00
e18a2219dd Implement scatter reductions (CUDA), remove divide/subtract (#41977)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33394 .

This PR does two things:
1. Implement CUDA scatter reductions with revamped GPU atomic operations.
2. Remove support for divide and subtract for CPU reduction, as was discussed with ngimel.

I've also updated the docs to reflect the existence of only multiply and add.
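
A minimal sketch of the two remaining reductions (shown on CPU; the same `reduce` argument is what this PR implements for CUDA tensors):

```python
import torch

index = torch.tensor([0, 0, 1, 1, 2])
src = torch.full((5,), 3.0)
out = torch.zeros(3)

out.scatter_(0, index, src, reduce='add')       # tensor([6., 6., 3.])

out.fill_(2.0)
out.scatter_(0, index, src, reduce='multiply')  # tensor([18., 18., 6.])
```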

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41977

Reviewed By: mruberry

Differential Revision: D23748888

Pulled By: ngimel

fbshipit-source-id: ea643c0da03c9058e433de96db02b503514c4e9c
2020-09-16 23:25:21 -07:00
fdeee74590 [pytorch][vulkan] Fix downcast warnings-errors, aten_vulkan buck target
Summary:
The buck build has -Wall for downcasts, so we need to add safe_downcast<int32_t> everywhere.

BUCK build changes for aten_vulkan to include vulkan_wrapper lib

Test Plan: The next diff with segmentation demo works fine

Reviewed By: dreiss

Differential Revision: D23739445

fbshipit-source-id: b22a30e1493c4174c35075a68586defb0fccd2af
2020-09-16 20:49:34 -07:00
b61d3d8be8 Implement torch.kaiser_window (#44271)
Summary:
Related to https://github.com/pytorch/pytorch/issues/38349
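
For reference, a minimal usage sketch (parameter values are illustrative; `beta` follows the NumPy/SciPy Kaiser convention, trading main-lobe width against side-lobe level):

```python
import torch

w = torch.kaiser_window(window_length=10, periodic=True, beta=12.0)
```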

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44271

Reviewed By: ngimel

Differential Revision: D23727972

Pulled By: mruberry

fbshipit-source-id: b4c931b2eb3a536231ad6d6c3cb66e52a13286ac
2020-09-16 20:41:31 -07:00
34331b0e0f CUDA BFloat16 and other improvements on abs (#44804)
Summary:
Not sure if ROCm supports `std::abs` today, let's see the CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44804

Reviewed By: mruberry

Differential Revision: D23748837

Pulled By: ngimel

fbshipit-source-id: ccf4e63279f3e5927a85d8d8f70ba4b8c334156b
2020-09-16 20:37:07 -07:00
ba6534ae2b enable type check common_distributed (#44821)
Summary:
Enabled type checking in common_distributed by using tensors of ints

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44821

Test Plan: Run python test/test_type_hints.py; errors are no longer ignored by mypy.ini

Reviewed By: walterddr

Differential Revision: D23747466

Pulled By: alanadakotashine

fbshipit-source-id: 820fd502d7ff715728470fbef0be90ae7f128dd6
2020-09-16 19:19:36 -07:00
e48201c5cf Mention TF32 on related docs (#44690)
Summary:
cc: ptrblck

![image](https://user-images.githubusercontent.com/1032377/93168022-cbbfcb80-f6d6-11ea-8f6e-f2c8a15c5bea.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44690

Reviewed By: ngimel

Differential Revision: D23727921

Pulled By: mruberry

fbshipit-source-id: db7cc8e74cde09c13d6a57683129fd839863b914
2020-09-16 19:18:30 -07:00
79108fc16c [JIT] Improve Future subtype checking (#44570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44570

**Summary**
This commit improves subtype checking for futures so that
`Future[T]` is considered to be a subtype of `Future[U]` if `T` is a
subtype of `U`.

**Test Plan**
This commit adds a test case to `test_async.py` that tests this.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23660588

Pulled By: SplitInfinity

fbshipit-source-id: b606137c91379debab91b9f41057f7b1605757c5
2020-09-16 18:54:51 -07:00
29664e6aa3 [FX] Further sanitize generated names (#44808)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44808

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D23739413

Pulled By: jamesr66a

fbshipit-source-id: b759c3ea613dfa717fb23977b72ff4773d9dcc99
2020-09-16 18:47:38 -07:00
204f985fc3 [NNC] Add simplification of Loop + Condition patterns. (#44764)
Summary:
Adds a new optimization to the IRSimplifier which changes this pattern:
```
for ...
  if ...
   do thing;
```
into:
```
if ...
  for ...
    do thing;
```

This should be almost strictly better.

There are many cases where this isn't safe to do (hence the tests), most obviously when the condition depends on something modified within the loop.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44764

Reviewed By: mruberry

Differential Revision: D23734463

Pulled By: nickgg

fbshipit-source-id: 51617e837de96b354fb702d0090ac65ddc523d36
2020-09-16 18:41:58 -07:00
8ec6bc7292 [pytorch][vulkan][jni] LiteModuleLoader load argument to use vulkan device
Summary:
### Java, CPP
Introducing additional parameter `device` to LiteModuleLoader to specify device on which the `forward` will work.

On the java side this is enum that contains CPU and VULKAN, passing as jint to jni side and storing it as a member field on the same level as module.

In pytorch_jni_lite.cpp: all input tensors are converted to Vulkan.

In pytorch_jni_common.cpp (which also goes to OSS): if the result tensor is not on CPU, it is moved to CPU. (At the moment the only non-CPU case is Vulkan.)

### BUCK
Introducing a `pytorch_jni_lite_with_vulkan` target that depends on `pytorch_jni_lite` and adds `aten_vulkan`.

In that case `pytorch_jni_lite_with_vulkan` can be used in place of `pytorch_jni_lite`.

Test Plan:
After the following diff with aidemo segmentation:
```
buck install -r aidemos-android
```
{F296224521}

Reviewed By: dreiss

Differential Revision: D23198335

fbshipit-source-id: 95328924e398901d76718c4d828f96e112dfa1b0
2020-09-16 18:35:22 -07:00
0ccc38b773 [caffe2] adds Cancel to SafeDequeueBlobsOp and SafeEnqueueBlobsOp (#44495)
Summary:
## Motivation

* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are blocking and thus non-cancellable. If an error
  occurs we need to be able to safely stop all net execution so we can throw
  the exception to the caller.

* When an error occurs in a net, or the net is cancelled, running ops will have
  their `Cancel` method called.

* This diff adds a `Cancel` method to `SafeEnqueueBlobsOp`
  and `SafeDequeueBlobsOp` that calls `queue->close()` to force all the
  blocking ops to return.
* Adds a unit test that verifies error propagation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44495

Test Plan:
## Unit Test added to verify that queue ops propagate errors
```
buck test caffe2/caffe2/python:hypothesis_test
```

Reviewed By: dzhulgakov

Differential Revision: D23236088

Pulled By: dahsh

fbshipit-source-id: daa90d9ee32483fb51195e269a52cf5987bb0a5a
2020-09-16 18:17:34 -07:00
3fa7f515a5 [pytorch][bot] update mobile op deps (#44700)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44700

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D23719486

Pulled By: ljk53

fbshipit-source-id: 39219ceeee51861f90b228fdfe2ab59ac8a9704d
2020-09-16 17:20:15 -07:00
6befc09465 Fix misuse of PyObject_IsSubclass (#44769)
Summary:
PyObject_IsSubclass may set the Python live-exception bit if the given object is not a class. `IsNamedTuple` is currently using it incorrectly, which may trip all subsequent Python operations in a debug-build Python. A normal release-build Python is not affected because `assert` is a no-op in release builds.

Fixes https://github.com/pytorch/pytorch/issues/43577

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44769

Reviewed By: jamesr66a

Differential Revision: D23725584

Pulled By: gmagogsfm

fbshipit-source-id: 2dabd4f8667a045d5bf75813500876c6fd81542b
2020-09-16 16:19:01 -07:00
43fe034514 [JIT] Disallow plain Optional type annotation without arg (#44586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44586

**Summary**
This commit disallows plain `Optional` type annotations without
any contained types both in type comments and in-line as
Python3-style type annotations.

**Test Plan**
This commit adds a unit test for these two situations.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23721517

Pulled By: SplitInfinity

fbshipit-source-id: ead411e94aa0ccce227af74eb0341e2a5331370a
2020-09-16 16:07:26 -07:00
574f9af160 [NCCL] Add option to run NCCL on high priority cuda stream (#43796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43796

This diff adds an option for the process group NCCL backend to pick high priority cuda streams.

Test Plan: waitforsandcastle

Reviewed By: jiayisuse

Differential Revision: D23404286

fbshipit-source-id: b79ae097b7cd945a26e8ba1dd13ad3147ac790eb
2020-09-16 16:00:41 -07:00
161490d441 Move torch/version.py generation to cmake (#44577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44577

I would like to to move this to cmake so that I can depend on it
happening from other parts of the build.

This PR pulls out the logic for determining the version string and
writing the version file into its own module. `setup.py` still receives
the version string and uses it as before, but now the code for writing
out `torch/version.py` lives in a custom command in torch/CMakeLists.txt

I noticed a small inconsistency in how version info is populated.
`TORCH_BUILD_VERSION` is populated from `setup.py` at configuration
time, while `torch/version.py` is written at build time. So if, e.g., you
configured cmake on a certain git rev, then built on another, the
two versions would be inconsistent.

This does not appear to matter, so I opted to preserve the existing
behavior.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23734781

Pulled By: suo

fbshipit-source-id: 4002c9ec8058503dc0550f8eece2256bc98c03a4
2020-09-16 15:49:22 -07:00
ffe127e4f1 [JIT] Disallow plain Tuple type annotation without arg (#44585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44585

**Summary**
This commit disallows plain `Tuple` type annotations without any
contained types both in type comments and in-line as Python3-style
type annotations.

**Test Plan**
This commit adds a unit test for these two situations.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23721515

Pulled By: SplitInfinity

fbshipit-source-id: e11c77a4fac0b81cd535c37a31b9f4129c276592
2020-09-16 15:49:19 -07:00
09a84071a3 enable mypy check for jit_metaprogramming_utils (#44752)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42969
Enables the mypy check for jit_metaprogramming_utils.py and fixes all errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44752

Reviewed By: walterddr

Differential Revision: D23741285

Pulled By: qxu-fb

fbshipit-source-id: 21e36ca5d25c8682fb93b806e416b9e1db76f71e
2020-09-16 15:44:37 -07:00
3f5bb2bade [quant] Support clone for per channel affine quantized tensor (#44573)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44573

fixes: https://github.com/pytorch/pytorch/issues/33309
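
A minimal sketch of the now-working operation (values are illustrative):

```python
import torch

x = torch.randn(2, 3)
scales = torch.tensor([0.1, 0.2, 0.3], dtype=torch.double)
zero_points = torch.zeros(3, dtype=torch.long)

q = torch.quantize_per_channel(x, scales, zero_points, axis=1, dtype=torch.qint8)
q2 = q.clone()  # previously lost the per-channel quantizer state
assert torch.equal(q2.q_per_channel_scales(), q.q_per_channel_scales())
```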

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D23663828

fbshipit-source-id: 9a021a22b6075b1e94b3f91c0c101fbb9246ec0e
2020-09-16 15:37:44 -07:00
7b3432caff [TensorExpr] Support boolean in simplifier (#44659)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44659

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.ConstantFoldCastToBool

Reviewed By: ngimel

Differential Revision: D23714675

Pulled By: asuhan

fbshipit-source-id: 4c18d972b628d5ad55bad58eddd5f6974e043d9c
2020-09-16 15:30:19 -07:00
ac0d13cc88 Vectorize complex copy. (#44722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44722

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D23731276

Pulled By: ezyang

fbshipit-source-id: 4902c4b79577ae3c70aca94828006b12914ab7f9
2020-09-16 15:15:12 -07:00
78b806ab4a [JIT] Disallow plain List type annotation without arg (#44584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44584

**Summary**
This commit extends the work done in #38130 and disallows plain
Python3-style `List` type annotations.

**Test Plan**
This commit extends `TestList.test_no_element_type_annotation` to the
Python3-style type annotation.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23721514

Pulled By: SplitInfinity

fbshipit-source-id: 48957868286f44ab6d5bf5e1bf97f0a4ebf955df
2020-09-16 15:08:04 -07:00
cb3b8a33f1 [JIT] Disallow plain Dict type annotation without arg (#44334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44334

**Summary**
This commit detects and prohibits the case in which `typing.Dict` is
used as an annotation without type arguments (i.e. `typing.Dict[K, V]`).
At present, `typing.Dict` is always assumed to have two arguments, and
when it is used without them, `typing.Dict.__args__` is nonempty and
contains some `typing.TypeVar` instances, which have no JIT type equivalent.
Consequently, trying to convert `typing.Dict` to a JIT type results in
a `c10::DictType` with `nullptr` for its key and value types, which can cause
a segmentation fault.

This is fixed by returning a `DictType` from
`jit.annotations.try_ann_to_type` only if the key and value types are converted
successfully to a JIT type and returning `None` otherwise.
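
For illustration, a sketch of what is accepted versus what is now rejected:

```python
import torch
from typing import Dict

@torch.jit.script
def good(x: Dict[str, int]) -> int:  # fully specified annotation: fine
    return len(x)

# A plain `Dict` annotation (no key/value arguments) is now rejected with
# a clear error at script time instead of risking a segfault:
#
#   @torch.jit.script
#   def bad(x: Dict) -> int: ...
```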

**Test Plan**
This commit adds a unit test to `TestDict` that tests the plain `Dict`
annotations throw an error.

**Fixes**
This commit closes #43530.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23610766

Pulled By: SplitInfinity

fbshipit-source-id: 036b10eff6e3206e0da3131cfb4997d8189c4fec
2020-09-16 14:38:28 -07:00
5027c161a9 Add TORCH_SELECTIVE_NAME to AMP definitions (#44711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44711

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23711425

Pulled By: ezyang

fbshipit-source-id: d4b0ef77893af80fe9b74791e66825e223ae221d
2020-09-16 14:25:17 -07:00
82ab167cce [NNC] Fix masking for all block and thread dimensions in CudaCodeGen (#44733)
Summary:
Unifies a number of partial solutions to the thread and block dimension extent masking, including the NoThreadIdxWriter and my last fix https://github.com/pytorch/pytorch/issues/44325. The NoThreadIdxWriter is gone in favour of tracking the current loop extents and masking any statements that have a lower rank than the launch parameters in any Block or Thread dimension, which handles both the "no" and "smaller" axis binding cases.

For example it will transform the following:
```
for i in 0..10 // blockIdx.x
  for j in 0..10 // threadIdx.x
    do thing(i, j);
  for k in 0..5 // threadIdx.x
    do other thing(i, k);
```

Into:
```
do thing(blockIdx.x, threadIdx.x);
if (threadIdx.x < 5) {
  do other thing(blockIdx.x, threadIdx.x);
}
```

And handle the case where statements are not bound by any axis, eg.
```
do outer thing;
for i in 0..10 // blockIdx.x
  for j in 0..10 // threadIdx.x
    do thing(i, j);
  do other thing(i);
```

will become:

```
if (blockIdx.x < 1) {
  if (threadIdx.x < 1) {
    do outer thing;
  }
}
syncthreads();
do thing(blockIdx.x, threadIdx.x);
syncthreads();
if (threadIdx.x < 1) {
  do other thing(blockIdx.x);
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44733

Reviewed By: mruberry

Differential Revision: D23736878

Pulled By: nickgg

fbshipit-source-id: 52d08626ae8043d53eb937843466874d479a6768
2020-09-16 14:23:47 -07:00
a3835179a1 [FakeLowP] Addressing FakeLowP OSS issues. (#44819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44819

[12:39 AM] Cherckez, Tal
please review the following patch.
should address these issues that our validation team found:
A) test_op_nnpi_fp16: hypothesis to trigger max_example*max_example.
B) batchnorm: batchNorm has derived from unit test which doesnt have setting required for hypothesis. hence default value as 100 getting set.

Test Plan:
buck test //caffe2/caffe2/contrib/fakelowp/test/...
https://our.intern.facebook.com/intern/testinfra/testrun/5910974543950859

Reviewed By: hyuen

Differential Revision: D23740970

fbshipit-source-id: 16fcc49f7bf84a5d7342786f671cd0b4e0fc87d3
2020-09-16 13:56:11 -07:00
07d9cc80a4 Fix error code checks for triangular_solve (CPU) (#44720)
Summary:
Added missing error checks for the CPU version of `triangular_solve`.
Fixes https://github.com/pytorch/pytorch/issues/43141.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44720

Reviewed By: mruberry

Differential Revision: D23733400

Pulled By: ngimel

fbshipit-source-id: 9837e01b04a6bfd9181e08d46bf96329f292cae0
2020-09-16 13:54:45 -07:00
f3bd984e44 Move the description comment of compute_bucket_assignment_by_size from cpp to the header file. (#44703)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44703

The description of this public function should be in the header file.

Also fix some typos.

Test Plan: N/A.

Reviewed By: pritamdamania87

Differential Revision: D23703661

fbshipit-source-id: 24ae63de9498e321b31dfb2efadb44183c6370df
2020-09-16 13:44:14 -07:00
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
6debe825be [vulkan] glsl shaders relaxed precision mode to cmake option (#43076)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43076

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D23143354

Pulled By: IvanKobzarev

fbshipit-source-id: 7b3ead1e63cf8acf6e8e547080a8ead7a2db994b
2020-09-16 12:51:34 -07:00
e9c6449b46 [FX][EZ] Allow constructing GraphModule with dict for root (#44679)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44679

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D23696766

Pulled By: jamesr66a

fbshipit-source-id: fe18b7b579c1728d00589bd5fd5e54c917cc61fe
2020-09-16 12:43:23 -07:00
1718b16d15 [Caffe2] gcs_cuda_only is trivial if CUDA not available (#44578)
Summary:
Make `gcs_cuda_only` and `gcs_gpu_only` return empty device lists if CUDA/GPU (CUDA or ROCm) is not available.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44578

Reviewed By: walterddr

Differential Revision: D23664227

Pulled By: malfet

fbshipit-source-id: 176b5d964c0b02b8379777cd9a38698c11818690
2020-09-16 12:24:08 -07:00
c44e4878ae Enable torch.backends.quantized typechecks (#44794)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44793

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44794

Reviewed By: walterddr

Differential Revision: D23734353

Pulled By: malfet

fbshipit-source-id: 491bd7c8f147759715eb296d7537a172685aa066
2020-09-16 12:21:20 -07:00
1cd5ba49c6 Add batching rule for "is_complex", "conj" (#44649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44649

To unblock #43208, which adds "is_complex" checks to backward formulas
that are being tested for batched gradient support with vmap.
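
A minimal sketch of what the new batching rules unblock, assuming the prototype `torch.vmap` entry point of the time:

```python
import torch

x = torch.randn(3, 4, dtype=torch.complex64)

# conj now has a batching rule, so it composes with vmap.
y = torch.vmap(torch.conj)(x)
assert torch.equal(y, torch.conj(x))
```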

Test Plan: - `pytest test/test_vmap.py -v`

Reviewed By: anjali411

Differential Revision: D23685356

Pulled By: zou3519

fbshipit-source-id: 29e41a9296336f6d1008e3040cade4c643bf5ebf
2020-09-16 12:19:46 -07:00
cce7680a23 Add bound method tests for async_execution with RRef helper (#44716)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44716

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23707326

Pulled By: mrshenli

fbshipit-source-id: a2f8db17447e9f82c9f6ed941ff1f8cb9090ad74
2020-09-16 12:01:07 -07:00
257c6d0fde Make async_execution compatible with RRef helpers (#44666)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44666

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23691989

Pulled By: mrshenli

fbshipit-source-id: b36f4b1c9d7782797a0220434a8272610a23e83e
2020-09-16 12:01:05 -07:00
924717bf51 Add _get_type() API to RRef (#44663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44663

The new API returns the type of the data object referenced by this
`RRef`. On the owner, this is same as `type(rref.local_value())`.
On a user, this will trigger an RPC to fetch the `type` object from
the owner. After this function is run once, the `type` object is
cached by the `RRef`, and subsequent invocations no longer trigger
RPC.
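
A minimal sketch, assuming RPC has been initialized and a peer named "worker1" exists:

```python
import torch
import torch.distributed.rpc as rpc

rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
print(rref._get_type())  # <class 'torch.Tensor'>; cached after the first call
```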

closes #33210

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23691990

Pulled By: mrshenli

fbshipit-source-id: a2d87cd601a691dd75164b6bcd7315245e9cf6bd
2020-09-16 11:59:22 -07:00
6954ae1278 Vec256 Test cases (#42685)
Summary:
[Tests for Vec256 classes https://github.com/pytorch/pytorch/issues/15676](https://github.com/pytorch/pytorch/issues/15676)

Testing
Current list:

- [x] Blends
- [x] Memory: UnAlignedLoadStore
- [x] Arithmetic: Plus, Minus, Multiplication, Division
- [x] Bitwise: BitAnd, BitOr, BitXor
- [x] Comparison: Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual
- [x] MinMax: Minimum, Maximum, ClampMin, ClampMax, Clamp
- [x] SignManipulation: Absolute, Negate
- [x] Interleave: Interleave, DeInterleave
- [x] Rounding: Round, Ceil, Floor, Trunc
- [x] Mask: ZeroMask
- [x] SqrtAndReciprocal: Sqrt, RSqrt, Reciprocal
- [x] Trigonometric: Sin, Cos, Tan
- [x] Hyperbolic: Tanh, Sinh, Cosh
- [x] InverseTrigonometric: Asin, ACos, ATan, ATan2
- [x] Logarithm: Log, Log2, Log10, Log1p
- [x] Exponents: Exp, Expm1
- [x] ErrorFunctions: Erf, Erfc, Erfinv
- [x] Pow: Pow
- [x] LGamma: LGamma
- [x] Quantization: quantize, dequantize, requantize_from_int
- [x] Quantization: widening_subtract, relu, relu6
Missing:
- [ ] Constructors, initializations
- [ ] Conversion , Cast
- [ ] Additional: imag, conj, angle (note: imag and conj only checked for float complex)

#### Notes on tests and testing framework
- some math functions are tested within a domain range
- mostly, the testing framework randomly tests against the std implementation within the domain, or within the implementation domain for some math functions.
- some functions are tested against the local version. ~~For example, std::round and the vector version of round differ, so it was tested against the local version~~
- round was tested against PyTorch's at::native::round_impl. ~~For double type on **VSX, vec_round failed for (even)+0.5 values**~~; it was solved by using vec_rint
- ~~**complex types are not tested**~~ **After enabling complex testing, due to precision and domain issues some of the complex functions failed for VSX and x86 AVX as well. I will either test them against the local implementation or check within the accepted domain**
- ~~quantizations are not tested~~ Added tests for the quantize, dequantize, requantize_from_int, relu, relu6, and widening_subtract functions
- the testing framework should be improved further
- ~~For now `-DBUILD_MOBILE_TEST=ON` will be used for Vec256Test too~~
Vec256 Test cases will be built for each CPU_CAPABILITY

Fixes: https://github.com/pytorch/pytorch/issues/15676

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42685

Reviewed By: malfet

Differential Revision: D23034406

Pulled By: glaringlee

fbshipit-source-id: d1bf03acdfa271c88744c5d0235eeb8b77288ef8
2020-09-16 11:48:02 -07:00
e6101f5507 fixes lda condition for blas functions, fixes bug with beta=0 in addmv slow path (#44681)
Summary:
Per the title: if `beta=0` and the slow path was taken, `nan` and `inf` in the result were not masked, as they are for other linear algebra functions. Similarly, since `mv` is implemented as `addmv` with `beta=0`, wrong results were sometimes produced on the `mv` slow path.
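
A minimal sketch of the fixed behavior:

```python
import torch

inp = torch.tensor([float('nan'), float('inf')])
mat = torch.randn(2, 3)
vec = torch.randn(3)

# With beta=0 the `inp` term must be ignored entirely, so its nan/inf
# values must not leak into the result (the slow-path bug fixed here).
out = torch.addmv(inp, mat, vec, beta=0)
assert torch.isfinite(out).all()

# torch.mv is implemented as addmv with beta=0, so it was affected too.
```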

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44681

Reviewed By: mruberry

Differential Revision: D23708653

Pulled By: ngimel

fbshipit-source-id: e2d5d3e6f69b194eb29b327e1c6f70035f3b231c
2020-09-16 11:47:56 -07:00
570102ce85 Remove many unused THC pointwise math operators (#44230)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44230

Reviewed By: albanD

Differential Revision: D23701185

Pulled By: ngimel

fbshipit-source-id: caf7b7a815b37d50232448d6965e591508546bd7
2020-09-16 11:47:51 -07:00
07d07e3c6c Remove EXPERIMENTAL_ENUM_SUPPORT feature guard (#44243)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41095

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44243

Reviewed By: ZolotukhinM

Differential Revision: D23605979

Pulled By: gmagogsfm

fbshipit-source-id: 098ae69049c4664ad5d1521c45b8a7dd22e72f6c
2020-09-16 11:45:59 -07:00
3e6bb5233f Reference amp tutorial (recipe) from core amp docs (#44725)
Summary:
https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html is live.  Core amp docs should reference it.

Also, I fixed some typos in the `zero_grad` docs that we ignored when git was behaving weirdly during ngimel's merge of https://github.com/pytorch/pytorch/pull/44423.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44725

Reviewed By: mruberry

Differential Revision: D23723807

Pulled By: ngimel

fbshipit-source-id: ca0b76365f8ca908bd978e3b38bf81857fa6c2a3
2020-09-16 11:37:58 -07:00
a011b86115 change self.generator to generator (#44461)
Summary:
bug fix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44461

Reviewed By: mruberry

Differential Revision: D23725053

Pulled By: ngimel

fbshipit-source-id: 89706313013d9eae96aaaf144924867457efd2c0
2020-09-16 11:32:17 -07:00
ee493e1a91 CUDA bfloat compare ops (#44748)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44748

Reviewed By: mruberry

Differential Revision: D23725997

Pulled By: ngimel

fbshipit-source-id: 4f89dce3a8b8f1295ced522011b59e60d756e749
2020-09-16 11:32:14 -07:00
eb75cfb9c0 Back out "Revert D23323486: DPP Async Tracing" plus windows build fix. (#44702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44702

Original commit changeset: c6bd6d277aca

This diff caused the Windows build to fail due to a compiler bug in VS2019 (lambda capture of a constant int value). This back-out works around the issue with an explicit capture of the const int value.

Test Plan: Tested and previously landed.

Reviewed By: mruberry

Differential Revision: D23703215

fbshipit-source-id: f9ef23be97540bc9cf78a855295fb8c69f360459
2020-09-16 11:32:11 -07:00
ced8727d88 Fix a broken link in CONTRIBUTING.md (#44701)
Summary:
as the title says :)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44701

Reviewed By: ngimel

Differential Revision: D23724919

Pulled By: mrshenli

fbshipit-source-id: 5ca5ea974ee6a94ed132dbe7892a9b4b9c3dd9be
2020-09-16 11:30:05 -07:00
5e717f0d5e delete the space for the docs rendering (#44740)
Summary:
see the docs rendering of `jacobian` and `hessian` at https://pytorch.org/docs/stable/autograd.html

![image](https://user-images.githubusercontent.com/20907377/93268949-f0618500-f762-11ea-9ec6-ddd062540c59.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44740

Reviewed By: ngimel

Differential Revision: D23724899

Pulled By: mrshenli

fbshipit-source-id: f7558ff53989e5dc7e678706207be2ac7ce22c66
2020-09-16 11:13:45 -07:00
a5cc151b8c Build EigenBlas as static library (#44747)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43709

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44747

Reviewed By: ezyang

Differential Revision: D23717927

Pulled By: malfet

fbshipit-source-id: c46fbcf5a55895cb984dd4c5301fbcb784fc17d5
2020-09-16 10:25:26 -07:00
b63b684394 Consolidate CODEOWNERS file for distributed package. (#44763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44763

The file had separate rules for RPC and DDP/c10d, consolidated all of
it together and placed all the distributed rules together.
ghstack-source-id: 112140871

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D23721162

fbshipit-source-id: d41c757eb1615376d442bd6b2802909624bd1d3f
2020-09-16 10:19:25 -07:00
dbf17a1d4c Fixing a few links in distributed CONTRIBUTING.md (#44753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44753

ghstack-source-id: 112132781

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D23719077

fbshipit-source-id: 3d943dfde100d175f417554fc7fca1fdb295129f
2020-09-16 10:14:19 -07:00
06036f76b6 CUDA BFloat16 pow (#44760)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44760

Reviewed By: ngimel

Differential Revision: D23727936

Pulled By: mruberry

fbshipit-source-id: 8aa89e989294347d7f593b1a63ce4a1dbfdf783e
2020-09-16 10:01:21 -07:00
63469da3bb Add a test to ensure DDP join works with RPC (#44439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44439

Adds a test to ddp_under_dist_autograd_test to ensure that the uneven-inputs
join() API works properly when DDP + RPC are combined. We test that when
running in outside-DDP mode (DDP applied to the whole hybrid module) we can
correctly process uneven inputs across different trainers.
ghstack-source-id: 112156980

Test Plan: CI

Reviewed By: albanD

Differential Revision: D23612409

fbshipit-source-id: f1e328c096822042daaba263aa8747a9c7e89de7
2020-09-16 09:51:43 -07:00
3f512b0de2 [quant][qat] Ensure observers and fq modules are scriptable (#44749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44749

Ensure fx module is scriptable after calling prepare_qat on it

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23718380

fbshipit-source-id: abf63ffb21e707f7def8f6c88246877f5aded58c
2020-09-16 09:30:07 -07:00
b85568a54a [CI] Add profiling-te benchmarks. (#44756)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44756

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23719728

Pulled By: ZolotukhinM

fbshipit-source-id: 739940e02a6697fbed2a43a13682a6e5268f710b
2020-09-15 21:33:03 -07:00
d66520ba08 [TensorExpr] Fuser: try merging adjacent fusion groups. (#43671)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43671

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23360796

Pulled By: ZolotukhinM

fbshipit-source-id: 60ec318fe77ae9f2c821d9c4d106281845266e0f
2020-09-15 21:31:02 -07:00
2efc618f19 lr_schedule.py redundant code (#44613)
Summary:
The subclass sets "self.last_epoch" when this is set in the parent class's init function. Why would we need to set last_epoch twice? I think calling "super" resets last_epoch anyway, so I am not sure why we would want to include this in the subclass. Am I missing something?

For the record, I am just a Pytorch enthusiast. I hope my question isn't totally silly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44613

Reviewed By: albanD

Differential Revision: D23691770

Pulled By: mrshenli

fbshipit-source-id: 080d9acda86e1a2bfaafe2c6fcb8fc1544f8cf8a
2020-09-15 20:28:39 -07:00
2c1b215b48 [fx] remove delegate, replace with tracer (#44566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44566

The Delegate objects were confusing. They were supposed to be a way to
configure how tracing works, but in some cases they appeared necessary
for constructing graphs, which was not true. This makes the organization
clearer by removing Delegate and moving its functionality into a Tracer class,
similar to how pickle has a Pickler class.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23683177

Pulled By: zdevito

fbshipit-source-id: 7605a34e65dfac9a487c0bada39a23ca1327ab00
2020-09-15 16:52:22 -07:00
993b4651fd Convert num_kernels to int64 before calling into CUDA GET_BLOCKS (#44688)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44688

this fixes https://github.com/pytorch/pytorch/issues/44472

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D23699819

Pulled By: soulitzer

fbshipit-source-id: 7ecfe78d09344178d1e6c7e1503417feb6beff6c
2020-09-15 15:10:55 -07:00
fb085d90e3 Revert D23583017: move rebuild buckets from end of first iteration to beginning of second iteration
Test Plan: revert-hammer

Differential Revision:
D23583017 (f5d231d593)

Original commit changeset: ef67f79437a8

fbshipit-source-id: fd914b7565aba6a5574a32b31403525abb80ff07
2020-09-15 15:10:52 -07:00
26a91a9f04 [WIP][JIT] Add benchmarking support of NV Fuser with FP16 dtype support (#44101)
Summary:
Modified files in `benchmarks/tensorexpr` to add support for NVIDIA's Fuser for the jit compiler.

This support has some modifications besides adding an option to support the NVIDIA fuser:

* Adds FP16 Datatype support
* Fixes SOL/Algo calculations to generally use the data type instead of being fixed to 4 bytes
* Adds IR printing and kernel printing knobs
* Adds a knob `input_iter` to create ranges of inputs currently only for reductions
* Adds further reduction support for Inner and Outer dimension reductions that are compatible with the `input_iter` knob.
* Added `simple_element`, `reduce2d_inner`, and `reduce2d_outer` to isolate performance on elementwise  and reduction operations in the most minimal fashion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44101

Reviewed By: ngimel

Differential Revision: D23713658

Pulled By: bertmaher

fbshipit-source-id: d6b83cfab559aefe107c23b3c0f2df9923b3adc1
2020-09-15 15:10:49 -07:00
2f4c31ce3a [jit] Speed up saving in case of many classes (#44589)
Summary:
There's an annoying O(N^2) in the module export logic that makes saving some models (those with many classes) take an eternity.

I'm not super familiar with this code, so rather than properly untangling the deps to make it a pure hash lookup, I just added a side lookup table for raw pointers. It's still quadratic, but it's O(num_classes^2) instead of O(num_classes * num_references), which already gives huge savings.
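
A hypothetical Python sketch of the side-table idea (the real change is in the C++ module export code; the names here are made up):

```python
class_positions = {}   # id(cls) -> index in serialization order
ordered_classes = []

def class_index(cls):
    # O(1) lookup keyed by raw pointer identity, instead of an O(N) scan
    # over everything serialized so far.
    key = id(cls)
    if key not in class_positions:
        class_positions[key] = len(ordered_classes)
        ordered_classes.append(cls)
    return class_positions[key]
```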

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44589

Test Plan:
Tested with one of the offending models - just loading and saving a TorchScript file:

```
Before:
load 1.9239683151245117
save 165.74712467193604

After:
load 1.9409027099609375
save 1.4711427688598633
```

Reviewed By: suo

Differential Revision: D23675278

Pulled By: dzhulgakov

fbshipit-source-id: 8f3fa7730941085ea20d9255b49a149ac1bf64fe
2020-09-15 15:10:45 -07:00
285ba0d068 Enable fp16 for UniformFill (#44540)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44540

Support output type to be fp16 for UniformFill

Reviewed By: jianyuh

Differential Revision: D23558030

fbshipit-source-id: 53a5b2c92cfe78cd11f55e6ee498e1bd682fe4a1
2020-09-15 15:09:18 -07:00
69839ea3f6 [NNC] make inlining immediate (take 3) (#44231)
Summary:
This is a reup of https://github.com/pytorch/pytorch/issues/43885 with an extra commit which should fix the bugs that caused it to be reverted. Read that for general context.

The issue here was that we were still using the side maps `tensor_to_stmt_` and `stmt_to_tensor_`, which get invalidated by any transform of the IR (not just by transforms other than computeInline). I added a comment about this but didn't actually address our usages of it.

I've removed these maps and changed the `getLoopBodyFor` and `getLoopStatementsFor` helpers to search the root stmt directly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44231

Reviewed By: albanD

Differential Revision: D23689688

Pulled By: nickgg

fbshipit-source-id: 1c6009a880f8c0cebf2300fd06b5cc9322bffbf9
2020-09-15 11:12:24 -07:00
8df0400a50 Fix fallback graph in specialize autogradzero (#44654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44654

Previously we weren't creating a fallback graph as intended in specialize autograd zero, so if a Tensor failed one of our undefinedness checks we would run the backward normally without reprofiling & optimizing.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23691764

Pulled By: eellison

fbshipit-source-id: 10c6fa79518c84a6f5ef2bfbd9ea10843af751eb
2020-09-15 11:12:20 -07:00
4ce6af35c4 Enable fp16 for CUDA SparseLengthsSum/Mean (#44089)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44089

Add support of fp16 as input type in SparseLengthSum/Mean caffe2 operator

Reviewed By: xianjiec

Differential Revision: D23436877

fbshipit-source-id: 02fbef2fde17d4b0abea9ca5d17a36aa989f98a0
2020-09-15 11:10:54 -07:00
07cba8b1fc Run vmap tests in CI (#44656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44656

All this time, test_vmap wasn't running in the CI. Fortunately all the
tests pass locally for me. h/t to anjali411 for pointing this out.

Test Plan: - Wait for CI

Reviewed By: anjali411

Differential Revision: D23689355

Pulled By: zou3519

fbshipit-source-id: 543c3e6aed0af77bfd6ea7a7549337f8230e3d32
2020-09-15 10:59:00 -07:00
d62994a94d ci: Add anaconda pruning to CI pipeline (#44651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44651

Adds pruning for our anaconda channels (pytorch-nightly, pytorch-test)
into our CI pipeline so that it gets run on a more consistent basis.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D23692851

Pulled By: seemethere

fbshipit-source-id: fa69b506b73805bf2ffbde75d221aef1ee3f753e
2020-09-15 10:51:05 -07:00
1d733d660d [docs] torch.min/max: remove incorrect warning from docs (#44615)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44195

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44615

Reviewed By: ngimel

Differential Revision: D23703525

Pulled By: mruberry

fbshipit-source-id: 471ebd764be667e29c03a30f3ef341440adc54d2
2020-09-15 10:42:08 -07:00
6bc77f4d35 Use amax/maximum instead of max in optimizers (#43797)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43797

Reviewed By: malfet

Differential Revision: D23406641

Pulled By: mruberry

fbshipit-source-id: 0cd075124aa6533b21375fe2c90c44a5d05ad6e6
2020-09-15 10:39:40 -07:00
9c364da9b9 Fix doc builds for bool kwargs (#44686)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43669

The bool will still link to https://docs.python.org/3/library/functions.html#bool.
Tested using bmm:
![image](https://user-images.githubusercontent.com/16063114/93156438-2ad11080-f6d6-11ea-9b81-96e02ee68d90.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44686

Reviewed By: ngimel

Differential Revision: D23703823

Pulled By: mruberry

fbshipit-source-id: 7286afad084f5ab24a1254ad84e5d01907781c85
2020-09-15 10:34:58 -07:00
f5d231d593 move rebuild buckets from end of first iteration to beginning of second iteration (#44326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44326

Part of relanding PR #41954, this refactoring moves the rebuild_buckets call from the end of the first iteration to the beginning of the second iteration
ghstack-source-id: 112011490

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D23583017

fbshipit-source-id: ef67f79437a820d9b5699b651803622418499a83
2020-09-15 09:51:33 -07:00
5f692a67db qat conv_fused.py: one more patch for forward compatibility (#44671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44671

See comments inline - the FC between
https://github.com/pytorch/pytorch/pull/38478 and
https://github.com/pytorch/pytorch/pull/38820 was broken,
patching it.

Test Plan: Verified with customer hitting the issue that this fixes their issue.

Reviewed By: jerryzh168

Differential Revision: D23694029

fbshipit-source-id: a5e1733334e22305a111df750b190776889705d0
2020-09-15 09:43:29 -07:00
72b5665c4f Upgrade oneDNN (mkl-dnn) to v1.6 (#44706)
Summary:
- Bump oneDNN (mkl-dnn) to 1.6 for bug fixes
    - Fixes https://github.com/pytorch/pytorch/issues/42446. RuntimeError: label is redefined for convolutions with large filter size on Intel AVX512
    - Implemented workaround for internal compiler error when building oneDNN with Microsoft Visual Studio 2019 (https://github.com/pytorch/pytorch/pull/43169)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44706

Reviewed By: ngimel

Differential Revision: D23705967

Pulled By: albanD

fbshipit-source-id: 65e8fecc52a76c9f3324403a8b60ffa8a8948bc6
2020-09-15 09:30:01 -07:00
7036e91abd Revert D23323486: DPP Async Tracing
Test Plan: revert-hammer

Differential Revision:
D23323486 (71673b31f9)

Original commit changeset: 4b6ca6c0e320

fbshipit-source-id: c6bd6d277aca070bef2de3522c2a60e23b4395ad
2020-09-15 01:19:23 -07:00
2435d941b1 Fix FP16 fastAtomicAdd for one case where tensor start address is not 32 bit aligned (#44642)
Summary:
For https://github.com/pytorch/pytorch/issues/44206 and https://github.com/pytorch/pytorch/issues/42218, I'd like to update trilinear interpolate backward and grid_sample backward to use `fastAtomicAdd`.

As a prelude, I spotted a UB risk in `fastAtomicAdd`. I think the existing code incurs a misaligned `__half2` atomicAdd when `index` is odd and `tensor` is not 32-bit aligned (`index % 2 == 1` and `reinterpret_cast<std::uintptr_t>(tensor) % sizeof(__half2) == 1`). In this case we think we're `!low_bit` and go down the `!low_bit` code path, but in fact we are `low_bit`. It appears the original [fastAtomicAdd PR](https://github.com/pytorch/pytorch/pull/21879#discussion_r295040377)'s discussion did not consider that case explicitly.

I wanted to push my tentative fix for discussion ASAP, cc'ing jjsjann123 and mkolod as the original authors of `fastAtomicAdd`. (I'm also curious why we need to `reinterpret_cast<std::uintptr_t>(tensor...` for the address modding, but that's minor.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44642

Reviewed By: mruberry

Differential Revision: D23699820

Pulled By: ngimel

fbshipit-source-id: 0db57150715ebb45e6a1fb36897e46f00d61defd
2020-09-14 22:07:29 -07:00
2fd142a2ef Small clarification to amp gradient penalty example (#44667)
Summary:
requested by https://discuss.pytorch.org/t/what-is-the-correct-way-of-computing-a-grad-penalty-using-amp/95827/3

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44667

Reviewed By: mruberry

Differential Revision: D23692768

Pulled By: ngimel

fbshipit-source-id: 83c61b94e79ef9f86abed2cc066f188dce0c8456
2020-09-14 21:56:09 -07:00
aedce773ed Deleted docker images for rocm 3.3 and rocm 3.5 (#44672)
Summary:
jeffdaily

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44672

Reviewed By: malfet

Differential Revision: D23694924

Pulled By: xw285cornell

fbshipit-source-id: 0066dc4b36c366588e1f309c82e7e1dc2ce8eec1
2020-09-14 21:50:41 -07:00
c71ce10cfc add dilation to transposeconv's _output_padding method (#43793)
Summary:
This PR adds dilation to the _ConvTransposeNd._output_padding method and tests it using a bunch of different-sized inputs.

Fixes https://github.com/pytorch/pytorch/issues/14272
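
For illustration, a small example of the path this touches (shapes follow the standard transposed-conv size formula):

```python
import torch

m = torch.nn.ConvTranspose2d(8, 8, kernel_size=3, stride=2, dilation=2)
x = torch.randn(1, 8, 10, 10)

# Default output: (10 - 1)*2 + 2*(3 - 1) + 1 = 23
print(m(x).shape)                        # torch.Size([1, 8, 23, 23])

# Passing output_size goes through _output_padding, which now accounts
# for dilation when validating the requested size.
print(m(x, output_size=(24, 24)).shape)  # torch.Size([1, 8, 24, 24])
```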

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43793

Reviewed By: zou3519

Differential Revision: D23493313

Pulled By: ezyang

fbshipit-source-id: bca605c428cbf3a97d3d24316d8d7fde4bddb307
2020-09-14 21:28:27 -07:00
ed862d3682 Split CUDA_NVCC_FLAGS by space (#44603)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44599

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44603

Reviewed By: albanD

Differential Revision: D23692320

Pulled By: ezyang

fbshipit-source-id: 6a63d94ab8b88e7a82f9d65f03523d6ef639c754
2020-09-14 20:25:37 -07:00
2c4b4aa81b Revert D23494065: Refactor CallbackManager as a nested class of RecordFunction.
Test Plan: revert-hammer

Differential Revision:
D23494065 (63105fd5b1)

Original commit changeset: 416d5bf6c942

fbshipit-source-id: 3b1ec928e3db0cc203bb63ec4db3da1584b9b884
2020-09-14 19:43:50 -07:00
e7d782e724 [JIT] Add property support for ScriptModules (#42390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42390

**Summary**
This commit extends support for properties to include
ScriptModules.
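
A minimal sketch of the kind of module this enables (illustrative names):

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = 2.0

    @property
    def doubled_scale(self) -> float:
        return 2 * self.scale

    def forward(self, x):
        return x * self.doubled_scale   # the property is compiled with the module

m = torch.jit.script(M())
print(m(torch.ones(3)))   # tensor([4., 4., 4.])
```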

**Test Plan**
This commit adds a unit test that has a ScriptModule with
a user-defined property.

`python test/test_jit_py3.py TestScriptPy3.test_module_properties`

Test Plan: Imported from OSS

Reviewed By: eellison, mannatsingh

Differential Revision: D22880298

Pulled By: SplitInfinity

fbshipit-source-id: 74f6cb80f716084339e2151ca25092b6341a1560
2020-09-14 18:49:21 -07:00
63105fd5b1 Refactor CallbackManager as a nested class of RecordFunction. (#44645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44645

Moved CallbackManager as a nested class of RecordFunction to allow private access to the call handles and context without exposing them publicly. It still hides the singleton instance of the CallbackManager inside record_function.cpp.

Test Plan: Unit tests.

Reviewed By: ilia-cher

Differential Revision: D23494065

fbshipit-source-id: 416d5bf6c9426e112877fbd233a6f4dff7bef455
2020-09-14 18:44:40 -07:00
71673b31f9 DPP Async Tracing (#44252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44252

Add tracing to the DPP client. Because DPP requests are async, we need to be able to start a trace event in one thread and potentially end it in a different thread. RecordFunction and LibgpumonObserver previously assumed each trace event starts and finishes in the same thread, so they used a thread-local context to track enter and exit callbacks. Async events break this assumption. This change attaches the event context to the RecordFunction object so we do not need to use thread-local context.

Test Plan:
Tested with dpp perf test and able to collect trace.

{F307824044}

Reviewed By: ilia-cher

Differential Revision: D23323486

fbshipit-source-id: 4b6ca6c0e32028fb38a476cd1f44c17a001fc03b
2020-09-14 18:43:14 -07:00
e107ef5ca2 Add type annotations for torch.nn.utils.* (#43080)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43013

Redo of gh-42954

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43080

Reviewed By: albanD

Differential Revision: D23681334

Pulled By: malfet

fbshipit-source-id: 20ec78aa3bfecb7acffc12eb89d3ad833024394c
2020-09-14 17:52:37 -07:00
551494b01d [JIT] Fix torch.tensor for empty multidimensional-typed lists (#44652)
Summary:
We were hitting an assert error when an empty `List[List[int]]` was passed in - this fixes that error by not recursing into 0-element tensors.
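
A minimal repro of the fixed case (a sketch; the exact output shape depends on torch.tensor's empty-list handling):

```python
import torch
from typing import List

@torch.jit.script
def make_empty():
    xs: List[List[int]] = []
    return torch.tensor(xs)   # previously tripped an internal assert

print(make_empty())
```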

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44652

Reviewed By: ZolotukhinM

Differential Revision: D23688247

Pulled By: eellison

fbshipit-source-id: d48ea24893044fae96bc39f76c0f1f9726eaf4c7
2020-09-14 17:28:23 -07:00
2254e5d976 Add note comments to enforce nondeterministic alert documentation (#44140)
Summary:
This PR fulfills Ed's request (https://github.com/pytorch/pytorch/pull/41692#discussion_r473122076) for a strategy to keep the functions that have nondeterministic alerts fully documented.

Part of https://github.com/pytorch/pytorch/issues/15359

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44140

Reviewed By: colesbury

Differential Revision: D23644469

Pulled By: ezyang

fbshipit-source-id: 60936ccced13f071c620f7d25ef6dcbca338de7f
2020-09-14 16:48:22 -07:00
a91c2be2a9 Automated submodule update: FBGEMM (#44647)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 1d710393d5

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44647

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D23684528

fbshipit-source-id: 316ff2e448707a6e5a83248c9b22e58118bc8741
2020-09-14 16:43:59 -07:00
686e281bcf Updates div to perform true division (#42907)
Summary:
This PR:

- updates div to perform true division
- makes torch.true_divide an alias of torch.div

This follows on work in previous PyTorch releases that first deprecated div performing "integer" or "floor" division, then prevented it by throwing a runtime error.
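
After this change, both of the calls below perform true division, even on integer inputs:

```python
import torch

a = torch.tensor([5, 3])
b = torch.tensor([2, 2])

print(torch.div(a, b))          # tensor([2.5000, 1.5000])
print(torch.true_divide(a, b))  # same result: true_divide is now an alias of div
```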

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42907

Reviewed By: ngimel

Differential Revision: D23622114

Pulled By: mruberry

fbshipit-source-id: 414c7e3c1a662a6c3c731ad99cc942507d843927
2020-09-14 15:50:38 -07:00
e594c30bc2 [quant][graphmode][fx] Support fp16 dynamic quantization for linear (#44582)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44582

Test Plan:
test_quantize_fx.py

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23665974

fbshipit-source-id: 19ba6c61a9c77ef570b00614016506e9a2729f7c
2020-09-14 15:43:08 -07:00
43406e218a [ONNX] Update ONNX shape inference (#43929)
Summary:
* Support sequence type (de)serialization, enables onnx shape inference on sequence nodes.
* Fix shape inference with block input/output: e.g. Loop and If nodes.
* Fix bugs in symbolic discovered by coverage of onnx shape inference.
* Improve debuggability: added more jit logs. For simplicity, the default log level, when jit logging is enabled, will not dump IR graphs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43929

Reviewed By: albanD

Differential Revision: D23674604

Pulled By: bzinodev

fbshipit-source-id: ab6aacb16d0e3b9a4708845bce27c6d65e567ba7
2020-09-14 15:36:19 -07:00
89aed1a933 [vulkan][op] avg_pool2d (#42675)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42675

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D22978765

Pulled By: IvanKobzarev

fbshipit-source-id: 64938d8965aeeb408dd5c40d688eca13fb7ebb8a
2020-09-14 15:07:34 -07:00
8f327cd6c5 [vulkan][op] add.Scalar, mul.Scalar (#42674)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42674

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D22978763

Pulled By: IvanKobzarev

fbshipit-source-id: 9fd97d394205e3fa51992ee99d5bfafc33f75efa
2020-09-14 15:03:22 -07:00
f7cfbac89b [ONNX] Update len symbolic (#43824)
Summary:
Update len symbolic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43824

Reviewed By: izdeby

Differential Revision: D23575765

Pulled By: bzinodev

fbshipit-source-id: 0e5c8c8d4a5297f65e2dc43168993350f784c776
2020-09-14 15:00:44 -07:00
da11d932bc [ONNX] Update arange op to support out argument (#43777)
Summary:
Update arange op to support out argument

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43777

Reviewed By: albanD

Differential Revision: D23674583

Pulled By: bzinodev

fbshipit-source-id: 6fb65e048c6b1a551569d4d2a33223522d2a960c
2020-09-14 14:56:17 -07:00
62ebad4ff9 [ONNX] Export new_empty and new_zeros (#43506)
Summary:
Adding symbolic to export new_empty and new_zeros

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43506

Reviewed By: houseroad

Differential Revision: D23674574

Pulled By: bzinodev

fbshipit-source-id: ecfcdbd4845fd3a3c6618a060129fbeee4df5dd7
2020-09-14 14:48:34 -07:00
d0a56cab07 [quant] Fixing the output shape for the linear (#44513)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44513

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23637508

Pulled By: z-a-f

fbshipit-source-id: d19d4c1b234b05e8d9813e864863d937b6c35bf5
2020-09-14 14:31:00 -07:00
742654d1b6 [quant] ConvTranspose1d / ConvTranspose2d (#40371)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40371

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22158981

Pulled By: z-a-f

fbshipit-source-id: defbf6fbe730a58d5b155dcb2460dd969797215c
2020-09-14 14:25:06 -07:00
84949672bf Fix exception chaining in test/ (#44193)
Summary:
## Motivation
This PR fixes https://github.com/pytorch/pytorch/issues/43770 and is the continuation of https://github.com/pytorch/pytorch/issues/43836.

## Description of the change
This PR fixes exception chaining only in files under `test/` where appropriate.
To fix exception chaining, I used either of the following (a short sketch of both styles follows this list):
1. `raise new_exception from old_exception` where `new_exception` itself seems not descriptive enough to debug or `old_exception` delivers valuable information.
2. `raise new_exception from None` where raising both of `new_exception` and `old_exception` seems a bit noisy and redundant.
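
A sketch of both styles (the module name and messages are made up):

```python
# Style 1: keep the original exception as the cause.
try:
    import some_test_dependency  # hypothetical module
except ImportError as e:
    raise RuntimeError("this test requires some_test_dependency") from e

# Style 2: suppress a noisy/redundant cause.
try:
    value = int("not-a-number")
except ValueError:
    raise KeyError("bad config entry") from None
```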

## List of lines containing `raise` in `except` clause:
I wrote [this simple script](https://gist.github.com/akihironitta/4223c1b32404b36c1b349d70c4c93b4d) using [ast](https://docs.python.org/3.8/library/ast.html#module-ast) to list lines where `raise`ing in `except` clause.

- [x] f8f35fddd4/test/test_cpp_extensions_aot.py (L16)
- [x] f8f35fddd4/test/test_jit.py (L2503)
- [x] f8f35fddd4/test/onnx/model_defs/word_language_model.py (L22)
- [x] f8f35fddd4/test/onnx/verify.py (L73)
- [x] f8f35fddd4/test/onnx/verify.py (L110)
- [x] f8f35fddd4/test/onnx/test_verify.py (L31)
- [x] f8f35fddd4/test/distributed/test_c10d.py (L255)
- [x] f8f35fddd4/test/distributed/test_c10d.py (L2992)
- [x] f8f35fddd4/test/distributed/test_c10d.py (L3025)
- [x] f8f35fddd4/test/distributed/test_c10d.py (L3712)
- [x] f8f35fddd4/test/distributed/test_distributed.py (L3180)
- [x] f8f35fddd4/test/distributed/test_distributed.py (L3198)
- [x] f8f35fddd4/test/distributed/test_data_parallel.py (L752)
- [x] f8f35fddd4/test/distributed/test_data_parallel.py (L776)
- [x] f8f35fddd4/test/test_type_hints.py (L151)
- [x] f8f35fddd4/test/test_jit_fuser.py (L771)
- [x] f8f35fddd4/test/test_jit_fuser.py (L773)
- [x] f8f35fddd4/test/test_dispatch.py (L105)
- [x] f8f35fddd4/test/test_distributions.py (L4738)
- [x] f8f35fddd4/test/test_nn.py (L9824)
- [x] f8f35fddd4/test/test_namedtensor.py (L843)
- [x] f8f35fddd4/test/test_jit_fuser_te.py (L875)
- [x] f8f35fddd4/test/test_jit_fuser_te.py (L877)
- [x] f8f35fddd4/test/test_dataloader.py (L31)
- [x] f8f35fddd4/test/test_dataloader.py (L43)
- [x] f8f35fddd4/test/test_dataloader.py (L365)
- [x] f8f35fddd4/test/test_dataloader.py (L391)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44193

Reviewed By: albanD

Differential Revision: D23681529

Pulled By: malfet

fbshipit-source-id: 7c2256ff17334625081137b35baeb816c1e53e0b
2020-09-14 14:20:16 -07:00
a188dbdf3f Check for index-rank consistency in FunctionInliner (#44561)
Summary:
When caller / callee pairs are inserted into the mapping, verify that
the arity of the buffer access is consistent with its declared rank.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44561

Test Plan: CI, test_tensorexpr --gtest_filter=TensorExprTest.DetectInlineRankMismatch

Reviewed By: albanD

Differential Revision: D23684342

Pulled By: asuhan

fbshipit-source-id: dd3a0cdd4c2492853fa68381468e0ec037136cab
2020-09-14 14:07:22 -07:00
b5dd6e3e61 split torch.testing._internal.* and add type checking for torch.testing._internal.common_cuda (#44575)
Summary:
First step to fix https://github.com/pytorch/pytorch/issues/42969.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44575

Reviewed By: malfet

Differential Revision: D23668740

Pulled By: walterddr

fbshipit-source-id: eeb3650b1780aaa5727b525b4e6182e1bc47a83f
2020-09-14 14:04:02 -07:00
cfba33bde3 Fix the ELU formula in the docs (#43764)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43389.

This PR replaces the old ELU formula in the docs, which yields wrong results for negative alphas, with a new one that fixes the issue and relies on cases notation, making the formula more straightforward.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43764

Reviewed By: ailzhang

Differential Revision: D23425532

Pulled By: albanD

fbshipit-source-id: d0931996e5667897d926ba4fc7a8cc66e8a66837
2020-09-14 14:01:56 -07:00
9d4943daaf [quant] conv_transpose1d / conv_transpose2d (#40370)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40370

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22158979

Pulled By: z-a-f

fbshipit-source-id: f5cb812c9953efa7608f06cf0188de447f73f358
2020-09-14 13:45:28 -07:00
ecac8294a6 enable type checking for torch._classes (#44576)
Summary:
Fix https://github.com/pytorch/pytorch/issues/42980

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44576

Reviewed By: malfet

Differential Revision: D23668741

Pulled By: walterddr

fbshipit-source-id: 4201ea3187a40051ebff53d28c8e571ea1a61126
2020-09-14 13:26:46 -07:00
ad7a2eb1c9 Simplify nested Min and Max patterns. (#44142)
Summary:
Improve simplification of nested Min and Max patterns.

Specifically, handles the following pattern simplications:
  * `Max(A, Max(A, Const)) => Max(A, Const)`
  * `Max(Min(A, B), Min(A, C)) => Min(A, Max(B, C))`
  * `Max(Const, Max(A, OtherConst)) => Max(A, Max(Const, OtherConst))`
     - This case can have an arbitrarily long chain of Max ops. For example: `Max(5, Max(x, Max(y, Max(z, 8)))) => Max(Max(Max(x, 8), y), z)`

Similarly, for the case of Min as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44142

Reviewed By: albanD

Differential Revision: D23644486

Pulled By: navahgar

fbshipit-source-id: 42bd241e6c2af820566744c8494e5dee172107f4
2020-09-14 13:24:46 -07:00
199435af90 Update median doc to note return value of even-sized input (#44562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44562

Add a note that torch.median returns the smaller of the two middle elements for even-sized input, and refer the user to torch.quantile for the mean of the middle values.

fixes https://github.com/pytorch/pytorch/issues/39520
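
For example:

```python
import torch

x = torch.tensor([1., 2., 3., 4.])
print(torch.median(x))         # tensor(2.): the smaller of the two middle values
print(torch.quantile(x, 0.5))  # tensor(2.5000): the mean of the two middle values
```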

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23657208

Pulled By: heitorschueroff

fbshipit-source-id: 2747aa652d1e7f10229d9299b089295aeae092c2
2020-09-14 13:18:33 -07:00
a475613d1d [static runtime] Swap to out-variant compatible nodes (#44127)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44127

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D23604306

Pulled By: bwasti

fbshipit-source-id: 18ccfb9b466b822e28130be3d5c4fae36c76820b
2020-09-14 12:38:25 -07:00
856510c96d [JIT] Dont optimize shape info in batch_mm (#44565)
Summary:
We run remove-profile-nodes and specialize types before batch_mm, so we cannot run peepholes on the type information of tensors: these properties have not been guarded, so they are not guaranteed to be correct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44565

Reviewed By: albanD

Differential Revision: D23661538

Pulled By: eellison

fbshipit-source-id: 0dd23a65714f047f49b4db4ec582b21870925fe1
2020-09-14 12:34:20 -07:00
e261e0953e Fix centos8 gcc (#44644)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44198 properly this time

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44644

Reviewed By: albanD

Differential Revision: D23684909

Pulled By: malfet

fbshipit-source-id: cea6f6e2ae28138f6b93a6513d1abd36d14ae573
2020-09-14 12:28:09 -07:00
ace81b6794 Remove an extra empty line in the warning comments. (#44622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44622

Remove an extra empty line in the warning comments.

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D23674070

fbshipit-source-id: 4ee570590c66a72fb808e9ee034fb773b833efcd
2020-09-14 11:15:35 -07:00
21a09ba94d Fix lerp.cu bug when given discontiguous out tensor (#44559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44559

Please refer to the discussion at the bottom of https://github.com/pytorch/pytorch/pull/43541 about the bug.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23655403

Pulled By: heitorschueroff

fbshipit-source-id: 10e4ce5c2fe7bf6e95bcfac4033202430292b03f
2020-09-14 11:03:02 -07:00
95a69a7d09 adds list_gpu_processes function (#44616)
Summary:
per title, to make it easier to track the creation of stray contexts:
```
python -c "import torch; a=torch.randn(1, device='cuda'); print(torch.cuda.memory.list_gpu_processes(0)); print(torch.cuda.memory.list_gpu_processes(1))"
GPU:0
process      79749 uses      601.000 MB GPU memory
GPU:1
no processes are running
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44616

Reviewed By: mruberry

Differential Revision: D23675739

Pulled By: ngimel

fbshipit-source-id: ffa14cad9d7144e883de13b1c2c6817bd432f53a
2020-09-14 09:54:32 -07:00
105132b891 Move ONNX circle ci build to torch and remove all caffe2 CI job/workflows (#44595)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44595

Reviewed By: seemethere

Differential Revision: D23670280

Pulled By: walterddr

fbshipit-source-id: b32633912f6c8b4606be36b90f901e636567b355
2020-09-14 09:50:13 -07:00
bd257a17a1 Add HIP/ROCm version to collect_env.py (#44106)
Summary:
This adds HIP version info to the `collect_env.py` output.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44106

Reviewed By: VitalyFedyunin

Differential Revision: D23652341

Pulled By: zou3519

fbshipit-source-id: a1f5bce8da7ad27a1277a95885934293d0fd43c5
2020-09-14 09:19:18 -07:00
7040a070e3 [torch] Minor: Avoid ostreamstring in Operator's canonicalSchemaString() (#44442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44442

I noticed lock contention on startup as lookupByLiteral() was
calling registerPendingOperators() - some calls were holding the
lock for 10+ ms, as operators were being registered.

canonicalSchemaString() was using ostringstream, which isn't typically
particularly fast (partly because of C++ spec locale requirements).
If we replace it with regular C++ string appends, it's somewhat faster
(which isn't hard when comparing with stringstream), albeit with a bit
more codegen.

This cuts out about 1.4 seconds spent under the OperatorRegistry lock
(as part of registerPendingOperators) in the first couple of minutes of
run time (mostly front-loaded) when running sync SGD.

As an example, before:
   registerPendingOperators 12688 usec for 2449 operators
After:
   registerPendingOperators 6853 usec for 2449 operators
ghstack-source-id: 111862971

Test Plan: buck test mode/dev-nosan caffe2/test/cpp/...

Reviewed By: ailzhang

Differential Revision: D23614515

fbshipit-source-id: e712f9dac5bca0b1876e11fb8f0850402f03873a
2020-09-14 08:24:16 -07:00
c68a99bd61 [numpy] Add torch.exp2 (#44184)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

TODO
* [x] Add tests
* [x] Add docs
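
For reference, the new op computes elementwise 2**x, mirroring numpy.exp2:

```python
import torch

x = torch.tensor([0., 1., 3., 10.])
print(torch.exp2(x))   # tensor([   1.,    2.,    8., 1024.])
```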

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44184

Reviewed By: ngimel

Differential Revision: D23674237

Pulled By: mruberry

fbshipit-source-id: 7f4fb1900fad3051cd7fc9d3d7f6d985c5fb093c
2020-09-14 04:05:37 -07:00
870f647040 Automated submodule update: FBGEMM (#44581)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 0725301da5

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44581

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia, VitalyFedyunin

Differential Revision: D23665173

fbshipit-source-id: 03cee22335eef0517e561827795bbe2036942ea0
2020-09-13 21:26:56 -07:00
68a5c361ae Adding Adapative Autorange to benchmark utils. (#44607)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44219

Rebasing https://github.com/pytorch/pytorch/pull/44288 and fixing the git history.

This allows users to benchmark code without having to specify how long to run the benchmark. It runs the benchmark until the variance (IQR / median) is low enough that we can be confident in the measurement.
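
A sketch of the resulting API (assuming the `adaptive_autorange` name introduced by this PR):

```python
import torch
from torch.utils.benchmark import Timer

t = Timer(
    stmt="torch.mm(a, b)",
    setup="a = torch.rand(64, 64); b = torch.rand(64, 64)",
)
# Keeps collecting measurements until IQR / median is small enough,
# rather than requiring the caller to pick a run count up front.
print(t.adaptive_autorange())
```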

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44607

Test Plan: There are unit tests, and we manually tested using Examples posted in git.

Reviewed By: robieta

Differential Revision: D23671208

Pulled By: bitfort

fbshipit-source-id: d63184290b88b26fb81c2452e1ae701c7d513d12
2020-09-13 20:55:40 -07:00
8daaa3bc7e Fix latex error in heaviside docs (#44481)
Summary:
This fixes a `katex` error I was getting trying to build the docs:
```
ParseError: KaTeX parse error: Undefined control sequence: \0 at position 55: …gin{cases}
```

This failure was introduced in https://github.com/pytorch/pytorch/issues/42523.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44481

Reviewed By: colesbury

Differential Revision: D23627700

Pulled By: mruberry

fbshipit-source-id: 9cc09c687a7d9349da79a0ac87d6c962c9cfbe2d
2020-09-13 16:42:19 -07:00
fe26102a0e Enable TE in test_jit.py (#44200)
Summary:
Enable TE in test_jit.py and adjust/fix tests accordingly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44200

Reviewed By: SplitInfinity

Differential Revision: D23673624

Pulled By: Krovatkin

fbshipit-source-id: 5999725c7aacc6ee77885eb855a41ddfb4d9a8d8
2020-09-13 15:58:20 -07:00
7862827269 [pytorch] Add variadic run_method for lite intepreter (#44337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44337

Add a new run_method to mobile Module which is variadic (takes any number of arguments) to match full jit.
ghstack-source-id: 111909068

Test Plan: Added new unit test to test_jit test suite

Reviewed By: linbinyu, ann-ss

Differential Revision: D23585763

fbshipit-source-id: 007cf852290f03615b78c35aa6f7a21287ccff9e
2020-09-13 13:26:30 -07:00
bcf97b8986 [JIT] Cleanup some places where we log graphs in executors. (#44588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44588

1) SOURCE_DUMP crashes when invoked on a backward graph since
   `prim::GradOf` nodes can't be printed as sources (they don't have
   schema).
2) Dumping graph each time we execute an optimized plan produces lots of
   output in tests where we run the graph multiple times (e.g.
   benchmarks). Outputting that at the lowest verbosity level seems
   like overkill.
3) Duplicated log statement is removed.

Differential Revision: D23666812

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: b9a30e34fd39c85f3e13c3f1e3594e157e1c130f
2020-09-13 11:31:02 -07:00
82da6b3702 [JIT] Fix jit-log verbosity selection logic. (#44587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44587

Currently it's skewed by one.

The following test demonstrates it:
```
$ cat test.py

import torch
def foo(a,b):
    return a*a*b
torch._C._jit_set_profiling_executor(True)
torch._C._jit_set_profiling_mode(True)
torch._C._jit_override_can_fuse_on_cpu(True)
torch._C._jit_set_texpr_fuser_enabled(True)
f = torch.jit.script(foo)
for _ in range(10):
    f(torch.rand(10), torch.rand(10))

$ cat test_logging_levels.sh

PYTORCH_JIT_LOG_LEVEL="tensorexpr_fuser"    python test.py 2>&1 | grep DUMP   >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL="tensorexpr_fuser"    python test.py 2>&1 | grep UPDATE >& /dev/null && echo FAIL || echo OK
PYTORCH_JIT_LOG_LEVEL="tensorexpr_fuser"    python test.py 2>&1 | grep DEBUG  >& /dev/null && echo FAIL || echo OK

PYTORCH_JIT_LOG_LEVEL=">tensorexpr_fuser"   python test.py 2>&1 | grep DUMP   >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL=">tensorexpr_fuser"   python test.py 2>&1 | grep UPDATE >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL=">tensorexpr_fuser"   python test.py 2>&1 | grep DEBUG  >& /dev/null && echo FAIL || echo OK

PYTORCH_JIT_LOG_LEVEL=">>tensorexpr_fuser"  python test.py 2>&1 | grep DUMP   >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL=">>tensorexpr_fuser"  python test.py 2>&1 | grep UPDATE >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL=">>tensorexpr_fuser"  python test.py 2>&1 | grep DEBUG  >& /dev/null && echo OK || echo FAIL
```

Before this change:
```
OK
FAIL
OK
OK
OK
FAIL
OK
OK
OK
```

With this change everything passes.

Differential Revision: D23666813

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: 4adaa5a3d06deadf54eae014a0d76588cdc5e20a
2020-09-13 11:29:25 -07:00
6d4a605ce9 Fix bug simplifying if-then-else when it can be removed (#44462)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44462

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23671157

Pulled By: bertmaher

fbshipit-source-id: b9b92ad0de1a7bd9bc1fcac390b542d885d0ca58
2020-09-13 10:29:28 -07:00
7e91728f68 Deprecates calling linspace and logspace without setting steps explicitly (#43860)
Summary:
**BC-breaking note**

This change is BC-breaking for C++ callers of linspace and logspace if they were providing a steps argument that could not be converted to an optional.

**PR note**

This PR deprecates calling linspace and logspace without setting steps explicitly by:

- updating the documentation to warn that not setting steps is deprecated
- warning (once) when linspace and logspace are called without steps being specified

A test for this behavior is added to test_tensor_creation_ops. The warning only appears once per process, however, so the test would pass even if no warning were thrown. Ideally there would be a mechanism to force all warnings, including those from TORCH_WARN_ONCE, to trigger.
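
For example:

```python
import torch

torch.linspace(0, 1, steps=5)  # explicit steps: no warning
torch.linspace(0, 1)           # deprecated: warns (once) that steps should be set explicitly
```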

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43860

Reviewed By: izdeby

Differential Revision: D23498980

Pulled By: mruberry

fbshipit-source-id: c48d7a58896714d184cb6ff2a48e964243fafc90
2020-09-13 06:09:19 -07:00
e703c17967 Revert D23584071: [dper3] Create dper LearningRate low-level module
Test Plan: revert-hammer

Differential Revision:
D23584071 (a309355be3)

Original commit changeset: f6656531b1ca

fbshipit-source-id: b0a93f4286053fb8576a70278edca3a7d89c722b
2020-09-12 20:45:30 -07:00
a309355be3 [dper3] Create dper LearningRate low-level module
Summary: As title; this will unblock migration of several modules that need learning rate functionality.

Test Plan:
```
buck test //dper3/dper3/modules/low_level_modules/tests:learning_rate_test
```

WIP: need to add more learning rate tests for the different policies

Reviewed By: yf225

Differential Revision: D23584071

fbshipit-source-id: f6656531b1caba38c3e3a7d6e16d9591563391e2
2020-09-12 15:33:29 -07:00
0743d013a6 fuse layernorm + quantize (#44232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44232

Enhance layernorm to optionally quantize its output.
Add fusion code to replace instances of layernorm + quantization.

Test Plan:
tested layernorm
net_runner

P141557987

Reviewed By: venkatacrc

Differential Revision: D23510893

fbshipit-source-id: 32f57ba2090d35d86dcc951e0f3f6a8901ab3153
2020-09-12 13:32:33 -07:00
6f2c3c39d2 Add SNPE deps for caffe2 benchmark android binary
Summary:
Adding SNPE dependencies to caffe2_benchmark so that it can benchmark SNPE models on portal devices.

Also need to change ndk_libcxx to gnustl until SNPE is updated to work with the NDK.

Test Plan: Tested on top of the stack.

Reviewed By: linbinyu

Differential Revision: D23569397

fbshipit-source-id: a6281832804ed4fbb5a8406f436caeae1ff4fd2b
2020-09-12 12:34:56 -07:00
05c1f1d974 [ROCm] remove thrust workaround in ScanKernels (#44553)
Summary:
Remove ROCm workaround added in https://github.com/pytorch/pytorch/issues/39180.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44553

Reviewed By: mruberry

Differential Revision: D23663988

Pulled By: ngimel

fbshipit-source-id: 71b2fd7db006d9d3459b908a996c4d96838ba742
2020-09-11 21:12:43 -07:00
d191caa3e7 Cleanup workarounds for compiler bug of ROCm (#44579)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44579

Reviewed By: mruberry

Differential Revision: D23664481

Pulled By: ngimel

fbshipit-source-id: ef698f26455e5827c5b5c0e5d42a1c95bcac8af4
2020-09-11 21:10:33 -07:00
8641b55158 fix dangling ptr in embedding_bag (#44571)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44571

Test Plan: Imported from OSS

Reviewed By: malfet, ngimel

Differential Revision: D23661007

Pulled By: glaringlee

fbshipit-source-id: e4a54acd0de55f275828c1d1289a1f069de07291
2020-09-11 20:40:44 -07:00
82b4477948 Pass the input tensor vector by const reference. (#44340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44340

Changed the constructor of GradBucket to pass the input by const
reference, avoiding unnecessary explicit move semantics. Since
previously the declaration and definition were separated, passing the input
tensor vector by value looked quite bizarre.

Test Plan: buck test caffe2/torch/lib/c10d:ProcessGroupGlooTest

Reviewed By: pritamdamania87

Differential Revision: D23569939

fbshipit-source-id: db761d42e76bf938089a0b38e98e76a05bcf4162
2020-09-11 18:03:56 -07:00
ab5fee2784 Move the inline implementations of GradBucket class to the header. (#44339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44339

Moved the inline implementations of GradBucket class to the header for
succinctness and readability. This coding style is also consistent with
reducer.h under the same directory.

Test Plan: buck test caffe2/torch/lib/c10d:ProcessGroupGlooTest

Reviewed By: pritamdamania87

Differential Revision: D23569701

fbshipit-source-id: 237d9e2c5f63a6bcac829d0fcb4a5ba3bede75e5
2020-09-11 18:01:37 -07:00
1f0dcf39fc [JIT] dont optimize device dtype on inline (#43363)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/36404

Adding prim::device and prim::dtype to the list of skipped peepholes when we run inlining. In the long term, another fix may be to not encode shape/dtype info on the traced graph, because it is not guaranteed to be correct. This is blocked by ONNX currently.

Partial fix for https://github.com/pytorch/pytorch/issues/43134

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43363

Reviewed By: glaringlee

Differential Revision: D23383987

Pulled By: eellison

fbshipit-source-id: 2e9c5160d39d690046bd9904be979d58af8d3a20
2020-09-11 17:29:54 -07:00
d729e2965e [TensorExpr] Do not inline autodiff graphs if they contain prim::TypeCheck nodes. (#44564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44564

Before this change we sometimes inlined autodiff subgraphs containing
fusion groups. This happened because we didn't look for 'unsupported'
nodes recursively (maybe we should), and the fusion groups were inside
if-nodes.

The problem was detected by bertmaher in 'LearningToPaint' benchmark
investigation where this bug caused us to keep constantly hitting
fallback paths of the graph.

Test Plan: Imported from OSS

Reviewed By: bwasti

Differential Revision: D23657049

Pulled By: ZolotukhinM

fbshipit-source-id: 7c853424f6dce4b5c344d6cd9c467ee04a8f167e
2020-09-11 17:28:53 -07:00
64b4307d47 [NNC] Cuda Codegen - mask loops bound to block/thread dimensions (#44325)
Summary:
Fix an issue where loops of different sizes are bound to the same CUDA dimension / metavar.

More info and tests coming soon...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44325

Reviewed By: colesbury

Differential Revision: D23628859

Pulled By: nickgg

fbshipit-source-id: 3621850a4cc38a790b62ad168d32e7a0e2462fad
2020-09-11 16:48:16 -07:00
2ae74c0632 Compile less legacy code when BUILD_CAFFE2 is set to False (take 2) (#44453)
Summary:
2nd attempt to land https://github.com/pytorch/pytorch/pull/44079

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44453

Reviewed By: walterddr, seemethere

Differential Revision: D23619528

Pulled By: malfet

fbshipit-source-id: c7c206ebd327dcf3994789bd47008b05ff862fe7
2020-09-11 16:27:47 -07:00
566b8d0650 handle missing NEON vst1_*_x2 intrinsics (#44198) (#44199)
Summary:
CentOS 8 on AArch64 has the vld1_* intrinsics but lacks the vst1q_f32_x2 one.

This patch checks for it and handles it separately from the vld1_* ones.

Fixes https://github.com/pytorch/pytorch/issues/44198

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44199

Reviewed By: seemethere

Differential Revision: D23641273

Pulled By: malfet

fbshipit-source-id: c2053c8e0427705eaeeeb82ec030925bff22623a
2020-09-11 16:02:44 -07:00
db24c5c582 Change code coverage option name (#43999)
Summary:
According to the [documentation](https://github.com/pytorch/pytorch/blob/master/tools/setup_helpers/cmake.py#L265), only options starting with `BUILD_` / `USE_` / `CMAKE_` in `CMakeLists.txt` can be imported from environment variables.

 ---
This diff was originally intended to enable `c++` source coverage with `CircleCI` and `codecov.io`, but we will finish that in the future. You can find the related information in the diff history. The following was the original procedure:

Based on [this pull request](1bda5e480c), life becomes much easier this time.
1. In `build.sh`:
- Enable the coverage build option for c++
- `apt-get install lcov`

2. In `test.sh`:
- run `lcov`

3. In `pytorch-job-specs.yml`:
- copy coverage.info to the `test/` folder and upload it to codecov.io

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43999

Test Plan: Test on github

Reviewed By: malfet

Differential Revision: D23464656

Pulled By: scintiller

fbshipit-source-id: b2365691f04681d25ba5c00293fbcafe8e8e0745
2020-09-11 15:55:05 -07:00
b6f0ea0c71 [quant][graphmode][fx][fix] Remove qconfig in convert (#44526)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44526

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23641960

fbshipit-source-id: 546da1c16694d1e1dfb72629085acaae2165e759
2020-09-11 15:51:47 -07:00
42f9f2f38f [fix] ReduceOps throw error if dim is repeated (#44281)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44273

TODO

* [x] Add test
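
A minimal repro of the new behavior (the exact error message is paraphrased):

```python
import torch

x = torch.ones(2, 3)
try:
    x.sum(dim=(0, 0))   # a repeated dim now raises instead of silently misbehaving
except RuntimeError as e:
    print(e)            # e.g. "dim 0 appears multiple times in the list of dims"
```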

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44281

Reviewed By: zhangguanheng66

Differential Revision: D23569004

Pulled By: ezyang

fbshipit-source-id: 1ca6523fef168c8ce252aeb7ca418be346b297bf
2020-09-11 15:34:06 -07:00
f3a79b881f add lcov to oss for beautiful html report (#44568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44568

With `lcov`, we can generate beautiful HTML. It's better than the current file report and line report. Therefore, in the OSS gcc flow, remove the `export` code and the `file/line level report` code and only use the HTML report.

But in clang, since such a tool is not available, we will still use the file report and line report that we generate ourselves.

Test Plan:
Test in docker ubuntu machine.
## Measurement
1. After running `atest`, it takes about 15 mins to collect code coverage and generate the report.
```
# gcc code coverage
python oss_coverage.py --run-only=atest
```

## Presentation
**The html result looks like:**

*Top Level:*

{F328330856}

*File Level:*

{F328336709}

Reviewed By: malfet

Differential Revision: D23550784

fbshipit-source-id: 1fff050e7f7d1cc8e86a6a200fd8db04b47f5f3e
2020-09-11 15:29:24 -07:00
c2b40b056a Filter default tests for clang coverage in oss
Summary: Some tests like `test_dataloader.py` are not able to run under `clang` in OSS, because they generate intermediate files that are too large (~40 GB) to be merged by `llvm`. Skip them when the user doesn't specify the `--run-only` option.

Test Plan: Tested locally. Still, running `clang` coverage in default mode is not recommended, because it takes too much space.

Reviewed By: malfet

Differential Revision: D23549829

fbshipit-source-id: 0737e6e9dcbe3f38de00580ee6007906e743e52f
2020-09-11 15:28:15 -07:00
a82ea6a91f [quant][graphmode][fx][fix] Support None qconfig in convert (#44524)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44524

None qconfig is not handled previously
closes: https://github.com/pytorch/pytorch/issues/44438

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23640269

fbshipit-source-id: 8bfa88c8c78d4530338d9d7fa9669876c386d91f
2020-09-11 15:22:25 -07:00
1fb5883072 removing conv filters from conv pattern matching (#44512)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44512

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23637409

Pulled By: z-a-f

fbshipit-source-id: ad5be0fa6accfbcceaae9171bf529772d87b4098
2020-09-11 15:16:29 -07:00
dd4bbe1a79 Add iterator like functionality for DispatchKeySet (#44066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44066

Add STL Input iterator to DispatchKeySet:
* The iterator is able to iterate from the first non-undefined DispatchKey
to NumDispatchKeys.
* The iterator is invalidated once the underlying DispatchKeySet is invalidated.

Note see http://www.cplusplus.com/reference/iterator/ for comparisons of
different iterators.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23611405

Pulled By: linux-jedi

fbshipit-source-id: 131b287d60226a1d67a6ee0f88571f8c4d29f9c3
2020-09-11 15:08:15 -07:00
e2bb34e860 Batched grad support for: slice, select, diagonal (#44505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44505

Added batching rules for slice_backward, select_backward, and
diagonal_backward.

Test Plan: - new tests: `pytest test/test_vmap.y -v -k "BatchedGrad"`

Reviewed By: agolynski, anjali411

Differential Revision: D23650409

Pulled By: zou3519

fbshipit-source-id: e317609d068c88ee7bc07fab88b2b3acb8fad7e1
2020-09-11 14:59:58 -07:00
7632484000 Add some batched gradient tests (#44494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44494

These tests check (most) operations that are useful for Bayesian logistic
regression (BLR) models. Said operators are basically those found in the
log_prob functions of Distributions objects. This PR is not a general,
structured solution for testing batched gradients (see "Alternative
solution" for that), but I wanted to test a small subset of operations
to confirm that the BLR use case works.

There will be follow-up PRs implementing support for some missing
operations for the BLR use case.

Alternative solution
=====================

Ideally, and in the future, I want to autogenerate tests from
common_method_invocations and delete all of the manual tests
introduced by this PR. However, if we were to do this now,
we would need to store the following additional metadata somewhere:
- operator name, supports_batched_grad, allow_vmap_fallback_usage

We could store that metadata as a separate table from
common_method_invocations, or add two columns to
common_method_invocations. Either way that seems like a lot of work and
the situation will get better once vmap supports batched gradients for
all operators (on the fallback path).

I am neutral between performing the alternative approach now vs. just
manually writing out some tests for these operations, so I picked the
easier approach. Please let me know if you think it would be better to
pursue the alternative approach now.

Test Plan: - `pytest test/test_vmap.py -v -k "BatchedGrad"`

Reviewed By: anjali411

Differential Revision: D23650408

Pulled By: zou3519

fbshipit-source-id: 2f26c7ad4655318a020bdaab5c767cd3956ea5eb
2020-09-11 14:59:54 -07:00
ab6126b50e [rpc][jit] support remote call in TorchScript (#43046)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43046

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23621108

Pulled By: wanchaol

fbshipit-source-id: e8152c6cdd3831f32d72d46ac86ce22f3f13c651
2020-09-11 14:59:51 -07:00
3e5df5f216 [rpc][jit] support rpc_sync in TorchScript (#43043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43043

This add the support for rpc_sync in TorchScript in a way similar to
rpc_async

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23252039

Pulled By: wanchaol

fbshipit-source-id: 8a05329cb8a24079b2863178b73087d47273914c
2020-09-11 14:59:47 -07:00
8bec7cfa91 [rpc] rename some functions (#43042)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43042

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23228894

Pulled By: wanchaol

fbshipit-source-id: 3702b7826ecb455073fabb9dc5dca804c0e092b2
2020-09-11 14:58:39 -07:00
70dfeb44bd MinMax based observers: respect device affinity for state_dict (#44537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44537

Originally, the `min_val`, `max_val`, `min_vals`, `max_vals`
attributes of observers were Tensors but not buffers.  They had custom
state_dict save/load code to ensure their state was saved.

At some point, these attributes became buffers, and the custom
save/load code remained. This introduced a subtle bug:
* create model A, move it to a device (cpu/cuda) and save its state_dict
* create model B, load its state dict.
* `min_val|min_vals|max_val|max_vals` would always be loaded to model A's device, even if the rest of model B was on a different device
* the above is inconsistent with how save/load on different devices is expected to work (see https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-across-devices)

In practice, the case people would sometimes hit is (see the sketch after this list):
* model A is on CPU, state dict is saved
* model B is created and moved to GPU, state_dict from model A is loaded
* assertions throw when operations are attempted across different devices
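
A hedged sketch of that scenario using the eager-mode quantization API (the helper is illustrative):

```python
import torch
import torch.quantization as tq

def make_prepared():
    m = torch.nn.Sequential(torch.nn.Linear(4, 4))
    m.qconfig = tq.get_default_qconfig("fbgemm")
    return tq.prepare(m)

model_a = make_prepared()         # observers (min_val/max_val) live on CPU
state = model_a.state_dict()

model_b = make_prepared().cuda()  # every other buffer/param is on GPU
model_b.load_state_dict(state)    # with this fix, min_val/max_val follow model_b's device
```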

This PR fixes the behavior by removing the custom save/load where
possible and letting the default `nn.Module` save/load code handle
device assignment.  We special case `PerChannelMinMaxObserver` and its
children to allow for loading buffers of different size, which is
normal.

There are some followups to also enable this for HistogramObserver
and FakeQuantize, which can be done in separate PRs due to higher
complexity.

Test Plan:
```
python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity
```

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23644493

fbshipit-source-id: 0dbb6aa309ad569a91a663b9ee7e44644080032e
2020-09-11 14:48:56 -07:00
192c4111a3 Simplify target handling in nn gradcheck. (#44507)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44507

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23635799

Pulled By: gchanan

fbshipit-source-id: 75090d6a48771e5c92e737a0829fbfa949f7c8a7
2020-09-11 13:25:59 -07:00
8a574c7104 [Cmake] Drop quotation marks around $ENV{MAX_JOBS} (#44557)
Summary:
Solves `the '-j' option requires a positive integer argument` error on some systems when MAX_JOBS is not defined

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44557

Reviewed By: vkuzo

Differential Revision: D23653511

Pulled By: malfet

fbshipit-source-id: 7d86fb7fb6c946c34afdc81bf2c3168a74d00a1f
2020-09-11 12:57:11 -07:00
2b8f0b2023 [caffe2] adds Cancel to OperatorBase and NetBase (#44145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44145

## Motivation

* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are blocking and thus non-cancellable. If an error
  occurs, we need to be able to safely stop all net execution so we can throw
  the exception to the caller.

## Summary
*  Adds `NetBase::Cancel()` to NetBase, which iterates over the entire list of
   operators and calls Cancel on each.
* Cancel on all ops was added to Net since there's nothing async-specific about it.
* `AsyncSchedulingNet` calls the parent Cancel.
* To preserve backwards compatibility, `AsyncSchedulingNet`'s Cancel still calls
   `CancelAndFinishAsyncTasks`.
* Adds `Cancel()` to `OperatorBase`.

Reviewed By: dzhulgakov

Differential Revision: D23279202

fbshipit-source-id: e1bb0ff04a4e1393f935dbcac7c78c0baf728550
2020-09-11 12:50:26 -07:00
5579b53a7f Fix SmoothL1Loss when target.requires_grad is True. (#44486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44486

SmoothL1Loss had a completely different (and incorrect, see #43228) path when target.requires_grad was True.

This PR does the following (a short example follows the list):

1) adds derivative support for target via the normal derivatives.yaml route
2) kills the different (and incorrect) path for when target.requires_grad was True
3) modifies the SmoothL1Loss CriterionTests to verify that the target derivative is checked.
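
For example:

```python
import torch
import torch.nn.functional as F

inp = torch.randn(3, requires_grad=True)
target = torch.randn(3, requires_grad=True)

F.smooth_l1_loss(inp, target).backward()
print(inp.grad, target.grad)   # target.grad now comes from the standard derivative path
```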

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23630699

Pulled By: gchanan

fbshipit-source-id: 0f94d1a928002122d6b6875182867618e713a917
2020-09-11 12:13:36 -07:00
b7ef4eec46 [NNC] Add loop slicing transforms (#43854)
Summary:
Add new transforms `sliceHead` and `sliceTail` to `LoopNest`, for example:

Before transformation:
```
for x in 0..10:
  A[x] = x*2
```

After `sliceHead(x, 4)`:

```
for x in 0..4:
  A[x] = x*2
for x in 4..10:
  A[x] = x*2
```

After `sliceTail(x, 1)`:
```
for x in 0..4:
  A[x] = x*2
for x in 4..9:
  A[x] = x*2
for x in 9..10:
  A[x] = x*2
```

`sliceHead(x, 10)` and `sliceTail(x, 10)` are no-ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43854

Test Plan: Tests are added in `test_loopnest.cpp`; they cover the basic transformations and also test the combination with other transformations such as `splitWithTail`.

Reviewed By: nickgg

Differential Revision: D23417366

Pulled By: cheng-chang

fbshipit-source-id: 06c6348285f2bafb4be3286d1642bfbe1ea499bf
2020-09-11 12:09:12 -07:00
39bb455e36 Update fallback kernel for Autograd keys. (#44349)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44349

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23589807

Pulled By: ailzhang

fbshipit-source-id: 0e4b0bf3e07bb4e35cbf1bda22f7b03193eb3dc4
2020-09-11 12:04:52 -07:00
11fb51d093 [quant][graphmode][fx][fix] Support dictionary output (#44508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44508

Bug fix for dictionary output

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23636182

fbshipit-source-id: 0c00cd6b9747fa3f8702d7f7a0d5edb31265f466
2020-09-11 11:29:20 -07:00
442957d8b6 [pytorch] Remove mobile nonvariadic run_method (#44235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44235

Removes nonvariadic run_method() from mobile Module entirely (to be later replaced by a variadic version). All use cases should have been migrated to use get_method() and Method::operator() in D23436351
ghstack-source-id: 111848220

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D23484577

fbshipit-source-id: 602fcde61e13047a34915b509da048b9550103b1
2020-09-11 10:23:08 -07:00
a61318a535 [pytorch] Replace mobile run_method with get_method and operator() (#44202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44202

In preparation for changing mobile run_method() to be variadic, this diff:

* Implements get_method() for mobile Module, which is similar to find_method but expects the method to exist.
* Replaces calls to the current nonvariadic implementation of run_method() by calling get_method() and then invoking the operator() overload on Method objects.
ghstack-source-id: 111848222

Test Plan: CI, and all the unit tests which currently contain run_method that are being changed.

Reviewed By: iseeyuan

Differential Revision: D23436351

fbshipit-source-id: 4655ed7182d8b6f111645d69798465879b67a577
2020-09-11 10:23:06 -07:00
cdf5e2ae86 add typing annotations for a few torch.utils.* modules (#43806)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43431. Depends on [gh-43862](https://github.com/pytorch/pytorch/pull/43862) (EDIT: now merged)

Modules:
- torch.utils.mkldnn
- torch.utils.mobile_optimizer
- torch.utils.bundled_inputs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43806

Reviewed By: gmagogsfm

Differential Revision: D23635151

Pulled By: SplitInfinity

fbshipit-source-id: a85b75a7927dde6cc55bcb361f8ff601ffb0b2a1
2020-09-11 10:20:55 -07:00
7d78a6fcdd Update interpolate to use new upsample overloads (#43025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43025

- Use new overloads that better reflect the arguments to interpolate.
- More uniform interface for upsample ops allows simplifying the Python code.
- Also reorder overloads in native_functions.yaml to give them priority.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37177

ghstack-source-id: 106938111

Test Plan:
test_nn has pretty good coverage.

Relying on CI for ONNX, etc.

Didn't test FC because this change is *not* forward compatible.

To ensure backwards compatibility, I ran this code before this change

```python
def test_func(arg):
    interp = torch.nn.functional.interpolate
    with_size = interp(arg, size=(16,16))
    with_scale = interp(arg, scale_factor=[2.1, 2.2], recompute_scale_factor=False)
    with_compute = interp(arg, scale_factor=[2.1, 2.2])
    return (with_size, with_scale, with_compute)

traced_func = torch.jit.trace(test_func, torch.randn(1,1,1,1))

sample = torch.randn(1, 3, 7, 7)
output = traced_func(sample)

assert not torch.allclose(output[1], output[2])

torch.jit.save(traced_func, "model.pt")
torch.save((sample, output), "data.pt")
```

then this code after this change

```python
model = torch.jit.load("model.pt")
sample, golden = torch.load("data.pt")
result = model(sample)
for r, g in zip(result, golden):
    assert torch.allclose(r, g)
```

Reviewed By: AshkanAliabadi

Differential Revision: D21209991

fbshipit-source-id: 5b2ebb7c3ed76947361fe532d1dbdd6faa3544c8
2020-09-11 09:59:14 -07:00
df6ea62526 Add nondeterministic check to new upsample overloads
Summary: I think these were missed due to a code landing race condition.

Test Plan: Fixes CUDA tests with PR 43025 applied.

Reviewed By: iseeyuan, AshkanAliabadi

Differential Revision: D23639566

fbshipit-source-id: 1322d7708e246b075a66588e7e54f4e12092477f
2020-09-11 09:58:07 -07:00
3de2c0b42f Fix L1Loss when target.requires_grad is True. (#44471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44471

L1Loss had a completely different (and incorrect, see #43228) path when target.requires_grad was True.

This PR does the following:

1) adds derivative support for target via the normal derivatives.yaml route
2) kills the different (and incorrect) path for when target.requires_grad was True
3) modifies the L1Loss CriterionTests to verify that the target derivative is checked.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23626008

Pulled By: gchanan

fbshipit-source-id: 2828be16b56b8dabe114962223d71b0e9a85f0f5
2020-09-11 09:51:16 -07:00
ea55820606 [dper3] Export PackSegments and UnpackSegments to Pytorch
Summary: As title.

Test Plan:
```
buck test //caffe2/caffe2/python/operator_test/:torch_integration_test -- test_pack_segments
```

Reviewed By: yf225

Differential Revision: D23610495

fbshipit-source-id: bd8cb61f2284a08a54091a4f982f01fcf681f215
2020-09-11 09:29:24 -07:00
b73b44f976 [PyTorch Mobile] Move some string ops to register_prim_ops.cpp and make them selective (#44500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44500

Some user models are using those operators. Unblock them while keeping the ops selective.

Test Plan: CI

Reviewed By: linbinyu

Differential Revision: D23634769

fbshipit-source-id: 55841d1b07136b6a27b6a39342f321638dc508cd
2020-09-11 09:24:35 -07:00
567c51cce9 In common_distributed, fix TEST_SKIPS multiprocessing manager (#44525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44525

Since `TEST_SKIPS` is a global `multiprocessing.Manager` dict, this was causing
issues when one test would fail and make the rest of the tests fail during
setup due to networking errors.

See the failed CI job: https://app.circleci.com/pipelines/github/pytorch/pytorch/212491/workflows/0450151d-ca09-4cf6-863d-272de6ed917f/jobs/7389065 for an example, where `test_ddp_backward` failed but then caused the rest of the tests to fail at the line `test_skips.update(TEST_SKIPS)`.

To fix this issue, at the end of every test we revert `TEST_SKIPS` back to a regular dict, and redo the conversion to a `multiprocessing.Manager` in the next test, which prevents these errors.
ghstack-source-id: 111844724

Test Plan: CI

Reviewed By: malfet

Differential Revision: D23641618

fbshipit-source-id: 27ce823968ece9804bb4dda898ffac43ef732b89
2020-09-11 09:16:33 -07:00
d07d25a8c5 Fix MSELoss when target.requires_grad is True. (#44437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44437

MSELoss had a completely different (and incorrect, see https://github.com/pytorch/pytorch/issues/43228) path when target.requires_grad was True.

This PR does the following:
1) adds derivative support for target via the normal derivatives.yaml route
2) kills the different (and incorrect) path for when target.requires_grad was True
3) modifies the MSELoss CriterionTests to verify that the target derivative is checked.

TODO:
1) do we still need check_criterion_jacobian when we run grad/gradgrad checks?
2) ensure the Module tests check when target.requires_grad
3) do we actually test when reduction='none' and reduction='mean'?

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23612166

Pulled By: gchanan

fbshipit-source-id: 4f74d38d8a81063c74e002e07fbb7837b2172a10
2020-09-11 08:51:28 -07:00
9a3b83cbf2 Update submodule gloo to have latest commits to enable it can work on Windows (#44529)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44529

Reviewed By: rohan-varma

Differential Revision: D23650123

Pulled By: mrshenli

fbshipit-source-id: b5b891cbcec51a14379d6604af63c714c32d93e7
2020-09-11 08:47:02 -07:00
b6b1c01adf torch.view_as_complex fails with segfault for a zero dimensional tensor (#44175)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44061
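
A minimal repro sketch; after the fix the call below is expected to raise a RuntimeError (view_as_complex needs a last dimension of size 2) instead of segfaulting:

```python
import torch

x = torch.tensor(1.0)        # zero-dimensional tensor
torch.view_as_complex(x)     # previously a segfault; now a RuntimeError
```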

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44175

Reviewed By: colesbury

Differential Revision: D23628103

Pulled By: anjali411

fbshipit-source-id: 6f70b5824150121a1617c0757499832923ae02b5
2020-09-11 08:35:49 -07:00
a9754fb860 Use TP Tensor.metadata to carry device info (#44396)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44396

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D23602576

Pulled By: mrshenli

fbshipit-source-id: c639789979b2b71fc165efbcf70f37b4c39469df
2020-09-11 08:33:22 -07:00
f44de7cdc3 Add missing rpc.shutdown() (#44417)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44417

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D23626208

Pulled By: mrshenli

fbshipit-source-id: 4ff8cad0e1193f99518804c21c9dd26ae718f4eb
2020-09-11 08:32:15 -07:00
77cc7d1ecd C++ APIs Transformer NN Module Top Layer (#44333)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44333

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23584010

Pulled By: glaringlee

fbshipit-source-id: 990026e3f1b5ae276776e344ea981386cb7528fe
2020-09-11 08:25:27 -07:00
09892de815 Clarify track_running_stats docs; Make SyncBatchNorm track_running_stats behavior consistent (#44445)
Summary:
context: https://github.com/pytorch/pytorch/pull/38084

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44445

Reviewed By: colesbury

Differential Revision: D23634216

Pulled By: mrshenli

fbshipit-source-id: d1242c694dec0e7794651f8031327625eb9989ee
2020-09-11 08:20:34 -07:00
30fccc53a9 [NNC] Don't attempt to refactor conditional scalars (#44223)
Summary:
Fixes a bug in the NNC registerizer for Cuda where it would hoist reads out of a conditional context when trying to cache them. As a quick fix, prevent scalar replacement if a usage is within a condition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44223

Reviewed By: gchanan

Differential Revision: D23551247

Pulled By: nickgg

fbshipit-source-id: 17a7bf2be4c8c3dd8a9ab7997dce9aea200c3685
2020-09-11 04:22:16 -07:00
c967e7724e [quant] conv_transpose1d_prepack / conv_transpose1d_unpack (#40360)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40360

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22158982

Pulled By: z-a-f

fbshipit-source-id: 844d02806554aaa68b521283703e630cc544d419
2020-09-11 04:12:28 -07:00
8b8986662f [JIT] Remove profiling nodes in autodiff forward graph (#44420)
Summary:
Previously we were not removing profiling nodes in graphs that required grad and contained diff graphs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44420

Reviewed By: bertmaher

Differential Revision: D23607482

Pulled By: eellison

fbshipit-source-id: af095f3ed8bb3c5d09610f38cc7d1481cbbd2613
2020-09-11 02:59:39 -07:00
c6febc6480 [JIT] Add a python hook for a function to interpret JIT graphs. (#44493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44493

This function allows executing a graph exactly as it is, without going
through a graph executor, which would run passes on the graph before
interpreting it. I found this feature extremely helpful when I worked on
a stress-testing script to shake out bugs from the TE fuser: I needed to
run a very specific set of passes on a graph and nothing else, and
then execute exactly that graph.
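
The message doesn't name the Python binding, so the sketch below is purely hypothetical: the `_jit_interpret_graph` name and its calling convention are assumptions for illustration, not the confirmed API.

```python
import torch

@torch.jit.script
def f(x):
    return x * 2 + 1

# Hypothetical binding: run the graph as-is, with no executor passes applied.
out = torch._C._jit_interpret_graph(f.graph, (torch.ones(3),))
```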

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23632505

Pulled By: ZolotukhinM

fbshipit-source-id: ea81fc838933743e2057312d3156b77284d832ef
2020-09-11 02:55:26 -07:00
51ed31269e Replace FutureMessage with c10::ivalue::Future in DistEngine. (#44239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44239

As part of https://github.com/pytorch/pytorch/issues/41574, use
c10::ivalue::Future everywhere in DistEngine.
ghstack-source-id: 111645070

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D23553507

fbshipit-source-id: 1b51ba13d1ebfa6c5c70b12028e9e96ce8ba51ff
2020-09-11 01:03:42 -07:00
b5d75dddd9 Enable lerp on half type; fix output memory format (#43541)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43541

Reviewed By: zou3519

Differential Revision: D23499592

Pulled By: ezyang

fbshipit-source-id: 9efdd6cbf0a334ec035ddd467667ba874b892549
2020-09-10 21:50:35 -07:00
0c58a017bd [quant][eagermode][refactor] Add set/get method for quantization and fusion mappings (#43990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43990

Allow user to register custom quantization and fusion patterns

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23485344

fbshipit-source-id: 4f0174ee6d8000d83de0f73cb370e9a1941d54aa
2020-09-10 21:29:39 -07:00
f7278473d3 [NCCL] Fix NCCL_BLOCKING_WAIT functionality with Async Error Handling (#44411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44411

This basically aborts errored NCCL communicators if either blocking
wait or async error handling is enabled. Otherwise we might abort NCCL
communicators when neither is enabled, and this could result in subsequent GPU
operations using corrupted data.
ghstack-source-id: 111839264

Test Plan: Succesful Flow run: f217591683

Reviewed By: jiayisuse

Differential Revision: D23605382

fbshipit-source-id: 6c16f9626362be3b0ce2feaf0979b2dff97ce61b
2020-09-10 20:57:55 -07:00
6ee41974e3 Speedup Linux nightly builds (#44532)
Summary:
`stdbuf` affects not only the process it launches, but all of its subprocesses, which has a very negative effect on the IPC communication between nvcc and the C++ preprocessor and results in a 2x slowdown, for example:

```
$ time /usr/local/cuda/bin/nvcc /pytorch/aten/src/THC/generated/THCTensorMathPointwiseByte.cu -c ...
real	0m34.623s
user	0m31.736s
sys	0m2.825s
```
but
```
time stdbuf -i0 -o0 -e0 /usr/local/cuda/bin/nvcc /pytorch/aten/src/THC/generated/THCTensorMathPointwiseByte.cu -c ...
real	1m14.113s
user	0m37.989s
sys	0m36.104s
```
because the OS spends lots of time transferring the preprocessed source back to nvcc byte by byte, as requested via the stdbuf call

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44532

Reviewed By: ngimel

Differential Revision: D23643411

Pulled By: malfet

fbshipit-source-id: 9fdaf8b8a49574e6b281f68a5dd9ba9d33464dff
2020-09-10 20:32:08 -07:00
69f6d94caa Register diag_backward, diagonal_backward, infinitely...gelu_backward as operators (#44422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44422

See #44052 for context.

Test Plan:
- `pytest test/test_autograd.py -v`
- `pytest test/test_nn.py -v`

Reviewed By: mrshenli

Differential Revision: D23607691

Pulled By: zou3519

fbshipit-source-id: 09fbcd66b877af4fa85fd9b2f851ed3912ce84d6
2020-09-10 18:43:18 -07:00
7ff7e6cfc8 Register cummaxmin_backward, cumprod_backward as operators (#44410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44410

See #44052 for context. One of the cumprod_backward overloads was unused
so I just deleted it.

Test Plan: - `pytest test/test_autograd.py -v`

Reviewed By: mrshenli

Differential Revision: D23605503

Pulled By: zou3519

fbshipit-source-id: f9c5b595e62d2d6e71f26580ba96df15cc9de4f7
2020-09-10 18:43:15 -07:00
08b431f54c Add trace_backward, masked_select_backward, and take_backward as ops (#44408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44408

See #44052 for context.

Test Plan: - `pytest test/test_autograd.py -v`

Reviewed By: mrshenli

Differential Revision: D23605504

Pulled By: zou3519

fbshipit-source-id: b9b1646d13caa6e536d08669c29bfc2ad8ff89a3
2020-09-10 18:41:07 -07:00
41f62b17e7 Fix DDP join() API in the case of model.no_sync() (#44427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44427

Closes https://github.com/pytorch/pytorch/issues/44425

DDP join API currently does not work properly with `model.no_sync()`, see https://github.com/pytorch/pytorch/issues/44425 for details. This PR fixes the problem via the approach mentioned in the issue, namely scheduling an allreduce that tells joined ranks whether to sync in the backwards pass or not. Tests are added for skipping gradient synchronization for various `sync_interval`s.
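
A minimal sketch of the interaction being exercised (gloo on localhost; port, sync interval, and batch counts are arbitrary; ranks intentionally see uneven numbers of batches):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank: int, world_size: int):
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DDP(torch.nn.Linear(4, 4))
    sync_interval = 2
    num_batches = 3 + rank                    # uneven inputs across ranks

    with model.join():                        # joined ranks shadow collectives
        for i in range(num_batches):
            x = torch.randn(2, 4)
            if (i + 1) % sync_interval != 0:
                with model.no_sync():         # skip gradient sync this step
                    model(x).sum().backward()
            else:
                model(x).sum().backward()     # sync accumulated gradients

    dist.destroy_process_group()

if __name__ == "__main__":
    torch.multiprocessing.spawn(run, args=(2,), nprocs=2)
```
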
ghstack-source-id: 111786479

Reviewed By: pritamdamania87

Differential Revision: D23609070

fbshipit-source-id: e8716b7881f8eee95e3e3499283e716bd3d7fe76
2020-09-10 18:31:40 -07:00
129d52aef2 Fix uniqueness check in movedim (#44307)
Summary:
Noticed this bug in `torch.movedim` (https://github.com/pytorch/pytorch/issues/41480). [`std::unique`](https://en.cppreference.com/w/cpp/algorithm/unique) only guarantees uniqueness for _sorted_ inputs. The current check lets through non-unique values when they aren't adjacent to each other in the list, e.g. `(0, 1, 0)` wouldn't raise an exception and instead the algorithm fails later with an internal assert.
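
A minimal sketch of the case being fixed:

```python
import torch

t = torch.zeros(2, 3, 4)
# Non-adjacent repeats like (0, 1, 0) previously slipped past the
# std::unique-based check (valid only for sorted input) and tripped an
# internal assert later; with the fix they raise a RuntimeError up front.
torch.movedim(t, (0, 1, 0), (0, 1, 2))
```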

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44307

Reviewed By: mrshenli

Differential Revision: D23598311

Pulled By: zou3519

fbshipit-source-id: fd6cc43877c42bb243cfa85341c564b6c758a1bf
2020-09-10 17:41:07 -07:00
c48f511c7e Moves some of TestTorchMathOps to OpInfos (#44277)
Summary:
This PR fixes three OpInfo-related bugs and moves some functions from TestTorchMathOps to be tested using the OpInfo pattern. The bugs are:

- A skip test path in test_ops.py incorrectly formatted its string argument
- Decorating the tests in common_device_type.py was incorrectly always applying decorators to the original test, not the op-specific variant of the test. This could cause the same decorator to be applied multiple times, overriding past applications.
- make_tensor was incorrectly constructing tensors in some cases

The functions moved are:

- asin
- asinh
- sinh
- acosh
- tan
- atan
- atanh
- tanh
- log
- log10
- log1p
- log2

In a follow-up PR most or all of the remaining functions in TestTorchMathOps will be refactored as OpInfo-based tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44277

Reviewed By: mrshenli, ngimel

Differential Revision: D23617361

Pulled By: mruberry

fbshipit-source-id: edb292947769967de9383f6a84eb327f027509e0
2020-09-10 17:31:50 -07:00
2e744b1820 Support work.result() to get result tensors for allreduce for Gloo, NCCL backends (#43970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43970

It is a resubmission of #43386

Original commit changeset: 27fbeb161706
ghstack-source-id: 111775070
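
A minimal single-process sketch of the API (gloo backend; address and port are arbitrary):

```python
import os
import torch
import torch.distributed as dist

os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29501"
dist.init_process_group("gloo", rank=0, world_size=1)

t = torch.ones(2)
work = dist.all_reduce(t, async_op=True)
work.wait()
print(work.result())          # list of result tensors of the allreduce
dist.destroy_process_group()
```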

Test Plan:
Added checks to the existing unit test and ran it on a GPU devserver.
Verified that the test that was failing in the original diff also passes: https://app.circleci.com/pipelines/github/pytorch/pytorch/210229/workflows/86bde47b-f2da-48e3-a618-566ae2713102/jobs/7253683

Reviewed By: pritamdamania87

Differential Revision: D23455047

fbshipit-source-id: b8dc4a30b95570d68a482c19131674fff2a3bc7c
2020-09-10 17:13:37 -07:00
91b16bff1e Disable PyTorch iOS ARM64 builds until cert problem is fixed (#44499)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44499

Reviewed By: seemethere, xta0

Differential Revision: D23634961

Pulled By: malfet

fbshipit-source-id: e32ae29c42c351bcb4f48bc52d4082ae56545e5b
2020-09-10 16:24:11 -07:00
1dd3fae3d2 [pytorch] Add logging to mobile Method run (#44234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44234

Changes mobile Method to point to a mobile Module directly instead of the Module ivalue in order to access metadata for logging/debugging, and then adds said logging.
ghstack-source-id: 111775806

Test Plan:
CI/existing unit tests to test BC
Testing fb4a logging:
Built fb4a on D23436351 (because usage of run_method isn't replaced yet in this diff), and then checked the Scuba logs to see that the appropriate ad clicks were logged (one ad for Buzzfeed shopping and another about Netflix from Bustle)

{F328510687}
{F328511201}
[Scuba sample of QPL metrics](https://www.internalfb.com/intern/scuba/query/?dataset=qpl_metrics%2Fpytorch_employee&pool=uber&view=samples_client&drillstate=%7B%22sampleCols%22%3A[%22device_model%22%2C%22instance_id_sampled%22%2C%22method%22%2C%22ios_device_class%22%2C%22points_path%22%2C%22userid_sampled%22%2C%22client_sample_rate%22%2C%22browser_name%22%2C%22ios_device_name%22%2C%22points%22%2C%22is_employee%22%2C%22is_test_user%22%2C%22network_only_queries%22%2C%22annotations%22%2C%22oncall_shortname%22%2C%22environment_tags%22%2C%22revoked_queries%22%2C%22annotations_bool%22%2C%22points_data%22%2C%22annotations_double_array%22%2C%22annotations_string_array%22%2C%22revoked_steps%22%2C%22points_set%22%2C%22device_os_version%22%2C%22ota_version_rollout%22%2C%22steps%22%2C%22vadar_calculation_result%22%2C%22app_name%22%2C%22client_push_phase%22%2C%22vadar%22%2C%22release_channel%22%2C%22interaction_class%22%2C%22exposures%22%2C%22annotations_double%22%2C%22deviceid_sampled%22%2C%22is_logged_in%22%2C%22device_os%22%2C%22time%22%2C%22major_os_ver%22%2C%22annotations_int_array%22%2C%22duration_ns%22%2C%22app_build%22%2C%22bucket_id%22%2C%22cache_and_network_queries%22%2C%22value%22%2C%22vadar_v2%22%2C%22quicklog_event%22%2C%22unixname%22%2C%22vadar_calculation_result_v2%22%2C%22trace_tags%22%2C%22annotations_int%22%2C%22quicklog_module%22%2C%22push_phase%22%2C%22year_class%22%2C%22country%22%2C%22capped_duration%22%2C%22ram_class%22%2C%22weight%22%2C%22carrier%22%2C%22app_id%22%2C%22app_version%22%2C%22react_bundle_version%22%2C%22logging_source%22%2C%22is_unsampled_for_scuba%22%2C%22instrumentation_errors%22%2C%22android_cpu_abi_list%22%2C%22days_after_release%22%2C%22cpu_cores%22%2C%22user_bucket%22%2C%22quicklog_action%22%2C%22server_scuba_sample_rate%22%2C%22points_vector%22%2C%22annotations_bool_array%22%2C%22android_device_class%22%2C%22browser_full_version%22%2C%22major_app_ver%22]%2C%22derivedCols%22%3A[]%2C%22mappedCols%22%3A[]%2C%22enumCols%22%3A[]%2C%22hideEmptyColumns%22%3Afalse%2C%22focused_event%22%3A%22%22%2C%22show_metadata%22%3A%22false%22%2C%22start%22%3A%222020-09-08%2011%3A27%3A00%22%2C%22end%22%3A%22start%20%2B%201%20minute%22%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22samplingRatio%22%3A%221%22%2C%22num_samples%22%3A%22100%22%2C%22aggregateList%22%3A[]%2C%22param_dimensions%22%3A[]%2C%22modifiers%22%3A[]%2C%22order%22%3A%22none%22%2C%22order_desc%22%3Atrue%2C%22filterMode%22%3A%22DEFAULT%22%2C%22constraints%22%3A[[%7B%22column%22%3A%22quicklog_event%22%2C%22op%22%3A%22eq%22%2C%22value%22%3A[%22[%5C%22MOBILE_MODULE_STATS%5C%22]%22]%7D%2C%7B%22column%22%3A%22userid_sampled%22%2C%22op%22%3A%22eq%22%2C%22value%22%3A[%22[%5C%22100013484978975%5C%22]%22]%7D]]%2C%22c_constraints%22%3A[[]]%2C%22b_constraints%22%3A[[]]%2C%22metrik_view_params%22%3A%7B%22should_use_legacy_colors%22%3Afalse%2C%22columns_skip_formatting%22%3A[]%2C%22view%22%3A%22samples_client%22%2C%22width%22%3A%221358%22%2C%22height%22%3A%22912%22%2C%22tableID%22%3A%22qpl_metrics%2Fpytorch_employee%22%2C%22fitToContent%22%3Afalse%2C%22format_tooltip_in_percent%22%3Afalse%2C%22use_y_axis_hints_as_limits%22%3Atrue%2C%22has_dynamic_context_menu%22%3Atrue%2C%22has_context_menu%22%3Afalse%2C%22legend_mode%22%3A%22nongrid%22%2C%22connect_nulls%22%3Atrue%2C%22timezone_offset%22%3A420%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22y_min_hint%22%3A0%2C%22should_render_plugins_menu%22%3Afalse%7D%7D&normalized=1599581160)
[Scuba sample showing ad source; just the bottom two results](https://www.internalfb.com/intern/scuba/query/?dataset=business_integrity_webpage_semantic&pool=uber&drillstate=%7B%22sampleCols%22%3A[%22from_custom_sampling%22%2C%22data_version%22%2C%22scribe_category_type%22%2C%22page_id%22%2C%22name%22%2C%22source_url%22%2C%22time%22%2C%22title_semantic%22%2C%22major_version%22%2C%22server_protocol%22%2C%22custom_sampling_enabled%22%2C%22ad_id%22%2C%22appversion%22%2C%22clienttime%22%2C%22isemployee%22%2C%22title%22%2C%22images%22%2C%22weight%22%2C%22carrier%22%2C%22is_ad%22%2C%22locale%22%2C%22appid%22%2C%22ip_country%22%2C%22iab_models%22]%2C%22derivedCols%22%3A[]%2C%22mappedCols%22%3A[]%2C%22enumCols%22%3A[]%2C%22return_remainder%22%3Afalse%2C%22should_pivot%22%3Afalse%2C%22is_timeseries%22%3Afalse%2C%22hideEmptyColumns%22%3Afalse%2C%22main_dimension%22%3A%22time%22%2C%22start%22%3A%22-5%20minutes%22%2C%22samplingRatio%22%3A%221%22%2C%22compare%22%3A%22none%22%2C%22axes%22%3A%22linked%22%2C%22overlay_types%22%3A[]%2C%22minBucketSamples%22%3A%22%22%2C%22dimensions%22%3A[]%2C%22scale_type%22%3A%22absolute%22%2C%22num_samples%22%3A%22100%22%2C%22metric%22%3A%22avg%22%2C%22fill_missing_buckets%22%3A%22connect%22%2C%22smoothing_bucket%22%3A%221%22%2C%22top%22%3A%227%22%2C%22markers%22%3A%22%22%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22end%22%3A%22now%22%2C%22show_p95_ci%22%3Afalse%2C%22time_bucket%22%3A%22auto%22%2C%22compare_mode%22%3A%22normal%22%2C%22aggregateList%22%3A[]%2C%22param_dimensions%22%3A[]%2C%22modifiers%22%3A[]%2C%22order%22%3A%22none%22%2C%22order_desc%22%3Atrue%2C%22filterMode%22%3A%22DEFAULT%22%2C%22constraints%22%3A[[%7B%22column%22%3A%22major_version%22%2C%22op%22%3A%22eq%22%2C%22value%22%3A[%22[%5C%22288%5C%22]%22]%7D]]%2C%22c_constraints%22%3A[[]]%2C%22b_constraints%22%3A[[]]%2C%22metrik_view_params%22%3A%7B%22should_use_legacy_colors%22%3Afalse%2C%22columns_skip_formatting%22%3A[]%2C%22view%22%3A%22time_view%22%2C%22width%22%3A%221358%22%2C%22height%22%3A%22912%22%2C%22tableID%22%3A%22business_integrity_webpage_semantic%22%2C%22fitToContent%22%3Afalse%2C%22format_tooltip_in_percent%22%3Afalse%2C%22use_y_axis_hints_as_limits%22%3Atrue%2C%22has_dynamic_context_menu%22%3Atrue%2C%22has_context_menu%22%3Afalse%2C%22legend_mode%22%3A%22nongrid%22%2C%22connect_nulls%22%3Atrue%2C%22timezone_offset%22%3A420%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22y_min_hint%22%3A0%2C%22should_render_plugins_menu%22%3Afalse%7D%7D&view=samples_client&normalized=1599587280)

Reviewed By: iseeyuan

Differential Revision: D23548687

fbshipit-source-id: 3e63085663f5fd8de90a4c7dbad0a17947aee973
2020-09-10 15:26:33 -07:00
a2a81e1335 Add a CONTRIBUTING.md for the distributed package. (#44224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44224

The purpose of this file is to help developers on PT distributed get
up to speed on the code structure and layout for PT Distributed.
ghstack-source-id: 111644842

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D23548377

fbshipit-source-id: 561d5b8e257642de172def8fdcc1311fae20690b
2020-09-10 14:58:00 -07:00
4bead6438a Enable torch.autograd typechecks (#44451)
Summary:
To help with further typing, move dynamically added native contributions from `torch.autograd` to `torch._C._autograd`.
Fix an invalid error handling pattern in
89ac30afb8/torch/csrc/autograd/init.cpp (L13-L15):
`PyImport_ImportModule` already raises a Python exception, and nullptr should be returned to properly propagate the error to the Python runtime.

All native methods/types are available in `torch/autograd/__init__.py` after `torch._C._init_autograd()` has been called.
Use f-strings instead of `.format` in test_type_hints.py.
Fixes https://github.com/pytorch/pytorch/issues/44450

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44451

Reviewed By: ezyang

Differential Revision: D23618261

Pulled By: malfet

fbshipit-source-id: fa5f739d7cff8410641128b55b810318c5f636ae
2020-09-10 13:37:29 -07:00
cc5a1cf616 [JIT] Erase shapes before fallback graph (#44434)
Summary:
Previously the specialized types were copied over to the fallback function, even though the tensors passed to the fallback were not of those specialized types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44434

Reviewed By: SplitInfinity

Differential Revision: D23611943

Pulled By: eellison

fbshipit-source-id: 2ea88a97529409f6c5c4c1f59a14b623524933de
2020-09-10 12:07:31 -07:00
b3f0297a94 ConvPackedParams: remove legacy format (#43651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43651

This is a forward compatibility follow-up to
https://github.com/pytorch/pytorch/pull/43086/. We switch the
conv serialization to output the v2 format instead of the v1 format.

The plan is to land this 1 - 2 weeks after the base PR.

Test Plan:
```
python test/test_quantization.py TestSerialization.test_conv2d_graph_v2
python test/test_quantization.py TestSerialization.test_conv2d_nobias_graph_v2
```

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23355480

fbshipit-source-id: 4cb04ed8b90a0e3e452297a411d641a15f6e625f
2020-09-10 11:47:34 -07:00
d232fec1f1 Partly fix cuda builds of dper broken by caffe2 c++
Summary:
cuda builds using clang error out when building caffe2 due to an incorrect std::move

This does not fix all known errors, but it's a step in the right direction.

Differential Revision: D23626667

fbshipit-source-id: 7d9df886129f671ec430a166dd22e4af470afe1e
2020-09-10 11:37:49 -07:00
38c10b4f30 [NCCL] Fix the initialization of futureNCCLCallbackStreams (#44347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44347

Cloned from Pull Request resolved: https://github.com/pytorch/pytorch/pull/44097, because the original author Sinan has completed the internship and now is unable to submit this diff.

As johnsonpaul mentioned in D23277575 (7d517cf96f), it looks like all processes were allocating memory on GPU-ID=0.

I was able to reproduce it by running the `test_ddp_comm_hook_allreduce_with_then_hook_nccl` unit test of `test_c10d.py` and running `nvidia-smi` while the test was running. The issue was reproduced as:
```
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0   3132563      C   python                                       777MiB |
|    0   3132564      C   python                                       775MiB |
|    4   3132564      C   python                                       473MiB |
+-----------------------------------------------------------------------------+
```
I realized that, as we initialize ProcessGroupNCCL, both processes were initially allocating memory on GPU 0.

We later also realized that I had forgotten the `isHighPriority` input of `getStreamFromPool`, so `futureNCCLCallbackStreams_.push_back(std::make_shared<at::cuda::CUDAStream>(at::cuda::getStreamFromPool(device_index)));` was just creating a vector of GPU 0 streams. After I changed `at::cuda::getStreamFromPool(device_index)` to `at::cuda::getStreamFromPool(false, device_index)`, `nvidia-smi` looked like:
```
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    673925      C   python                                       771MiB |
|    0    673926      C   python                                       771MiB |
|    1    673925      C   python                                       771MiB |
|    1    673926      C   python                                       771MiB |
|    2    673925      C   python                                       771MiB |
|    2    673926      C   python                                       771MiB |
|    3    673925      C   python                                       771MiB |
|    3    673926      C   python                                       771MiB |
|    4    673925      C   python                                       771MiB |
|    4    673926      C   python                                       771MiB |
|    5    673925      C   python                                       771MiB |
|    5    673926      C   python                                       771MiB |
|    6    673925      C   python                                       771MiB |
|    6    673926      C   python                                       771MiB |
|    7    673925      C   python                                       707MiB |
|    7    673926      C   python                                       623MiB |
+-----------------------------------------------------------------------------+
```
This confirms that we were just getting GPU 0 streams for the callback. I think this does not explain the `fp16_compress` stability issue, because we were able to reproduce that even without any `then` callback, just calling a copy from fp32 to fp16 before allreduce. However, this can explain other issues where `allreduce` was not on par with `no_hook`. I'll run some additional simulations with this diff.

I tried to replace `getStreamFromPool` with `getDefaultCUDAStream(deviceIndex)` and it wasn't causing additional memory usage. In this diff, I temporarily solved the issue by just initializing null pointers for each device in the constructor and setting the callback stream for the corresponding devices inside `ProcessGroupNCCL::getNCCLComm`. After the fix it looks like the memory issue was resolved:
```
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0   2513142      C   python                                       745MiB |
|    4   2513144      C   python                                       747MiB |
+-----------------------------------------------------------------------------+
```
I could use a dictionary instead of a vector for `futureNCCLCallbackStreams_`, but since the number of devices is fixed, I think it isn't necessary. Please let me know what you think in the comments.
ghstack-source-id: 111485483

Test Plan:
`test_c10d.py` and some perf tests. Also check `nvidia-smi` while running tests to validate memory looks okay.

This diff also fixes the regression in HPC tests as we register a hook:

{F322730175}

See https://fb.quip.com/IGuaAbD8bnvy (474fdd7e2d) for details.

Reviewed By: pritamdamania87

Differential Revision: D23495436

fbshipit-source-id: ad08e1d94343252224595d7c8a279fe75e244822
2020-09-10 11:25:38 -07:00
cb90fef770 Fix return value of PyErr_WarnEx ignored (SystemError) (#44371)
Summary:
This PR fixes unexpected `SystemError` when warnings are emitted and warning filters are set.

## Current behavior

```
$ python -Werror
>>> import torch
>>> torch.range(1, 3)
UserWarning: torch.range is deprecated in favor of torch.arange and will be removed in 0.5. Note that arange generates values in [start; end), not [start; end].

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: <built-in method range of type object at 0x7f38c7703a60> returned a result with an error set
```

## Expected behavior

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UserWarning: torch.range is deprecated and will be removed in a future release because its behavior is inconsistent with Python's range builtin. Instead, use torch.arange, which produces values in [start, end).
```

## Note

Python exception must be raised if `PyErr_WarnEx` returns `-1` ([python docs](https://docs.python.org/3/c-api/exceptions.html#issuing-warnings)). This PR fixes warnings raised in the following code:
```py
import torch

torch.range(1, 3)
torch.autograd.Variable().volatile
torch.autograd.Variable().volatile = True
torch.tensor(torch.tensor([]))
torch.tensor([]).new_tensor(torch.tensor([]))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44371

Reviewed By: mrshenli

Differential Revision: D23598410

Pulled By: albanD

fbshipit-source-id: 2fbcb13fe4025dbebaf1fd837d4c8e0944e05010
2020-09-10 10:15:21 -07:00
f9a0d0c21e Allow Tensor-likes in torch.autograd.gradcheck (#43877)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42942

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43877

Reviewed By: zou3519

Differential Revision: D23493257

Pulled By: ezyang

fbshipit-source-id: 6cdaabe17157b484e9491189706ccc15420ac239
2020-09-10 09:02:17 -07:00
c8914afdfa Merge criterion_tests and new_criterion_tests. (#44398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44398

These end up executing the same tests, so no reason to have them separate.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23600855

Pulled By: gchanan

fbshipit-source-id: 0952492771498bf813f1bf8e1d7c8dce574ec965
2020-09-10 08:29:59 -07:00
fa158c4ca6 Combine criterion and new criterion tests in test_jit. (#43958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43958

There is not any difference between these tests (I'm merging them), so let's merge them in the JIT as well.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23452337

Pulled By: gchanan

fbshipit-source-id: e6d13cdb164205eec3dbb7cdcd0052b02c961778
2020-09-10 08:28:14 -07:00
af9cad761a Stop ignoring NotImplementedErrors in cuda CriterionTests. (#44381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44381

Perhaps this was necessary when the test was originally introduced, but it's difficult to figure out what is actually tested.  And I don't think we actually use NotImplementedErrors.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23598646

Pulled By: gchanan

fbshipit-source-id: aa18154bfc4969cca22323e61683a301198823be
2020-09-10 08:18:33 -07:00
208ad45b4b fix scripts (#44464)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44464

Reviewed By: agolynski

Differential Revision: D23624921

Pulled By: colesbury

fbshipit-source-id: 72bed69edcf467a99eda9a3b97e894015c992dce
2020-09-10 08:13:48 -07:00
356aa54694 [Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D23621463

fbshipit-source-id: 1cd7e94e480c7073c9a0aad55aeba98de4b96164
2020-09-10 04:24:43 -07:00
6c98d904c0 handle the case of -0.0 on tanh quantization (#44406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44406

this fix makes fakelowp identical to hw

- mask out the floating point number with 0x7fff so we are always dealing with positive numbers
- the dsp implementation is correct; ice-ref suffers from this same problem
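
A sketch of the masking trick on fp16 bit patterns (numpy used for illustration):

```python
import numpy as np

bits = np.array(-0.0, dtype=np.float16).view(np.uint16)  # 0x8000: sign bit set
masked = bits & np.uint16(0x7FFF)                        # 0x0000: same as +0.0

print(hex(int(bits)), hex(int(masked)))                  # 0x8000 0x0
```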

Test Plan: - tested with test_fusions.py, can't enable the test until the fix in ice-ref appears

Reviewed By: venkatacrc

Differential Revision: D23603878

fbshipit-source-id: a72d93a4bc811f98d1b5e82ddb204be028addfeb
2020-09-10 01:18:45 -07:00
28a23fce4c Deprecate torch.norm and torch.functional.norm (#44321)
Summary:
Part of https://github.com/pytorch/pytorch/issues/24802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44321

Reviewed By: mrshenli

Differential Revision: D23617273

Pulled By: mruberry

fbshipit-source-id: 6f88b5cb097fd0acb9cf0e415172c5a86f94e9f2
2020-09-10 01:16:41 -07:00
7b547f086f To fix extra memory allocation when using circular padding (#39273)
Summary:
For fixing https://github.com/pytorch/pytorch/issues/39256

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39273

Reviewed By: anjali411

Differential Revision: D23471811

Pulled By: mruberry

fbshipit-source-id: fb324b51baea765311715cdf14642b334f335733
2020-09-10 00:15:31 -07:00
65d4a6b7c0 [ROCm] fix cub hipify mappings (#44431)
Summary:
Fixes ROCm-specific workarounds introduced by https://github.com/pytorch/pytorch/issues/44259.  This adds new hipify mappings that properly handle cub outside of caffe2 sources.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44431

Reviewed By: mrshenli

Differential Revision: D23617417

Pulled By: ngimel

fbshipit-source-id: 5d16afb6b8e6ec5ed049c51571866b0878d534ca
2020-09-09 23:39:25 -07:00
28bd4929bd [NNC] Make it able to normalize loop with variable start (#44133)
Summary:
Loops with variable start can also be normalized.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44133

Test Plan: updated testNormalizeStartVariable.

Reviewed By: navahgar

Differential Revision: D23507097

Pulled By: cheng-chang

fbshipit-source-id: 4e9aad1cd4f4a839f59a00bf8ddf97637a1a6648
2020-09-09 23:05:57 -07:00
c515881137 Add reset_grad() function (#44423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44423

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42754

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23010859

Pulled By: ngimel

fbshipit-source-id: 56eec43eba88b98cbf714841813977c68f983564
2020-09-09 22:05:45 -07:00
6324ef4ced [caffe2] Speed up compilation of aten-op.cc (#44440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44440

`aten-op.cc` takes a long time to compile due to the large generated constructor. For each case, the `std::function` constructor and the initialization functions are inlined, producing a huge amount of intermediate code that takes a long time to optimize, given that many compiler optimization passes are superlinear in the function size.

This diff moves each case to a separate function, so that each one is cheap to optimize, and the constructor is just a large jump table, which is easy to optimize.

Reviewed By: dzhulgakov

Differential Revision: D23593741

fbshipit-source-id: 1ce7a31cda10d9b0c9d799716ea312a291dc0d36
2020-09-09 21:21:48 -07:00
89ac30afb8 [JIT] Propagate type sharing setting to submodule compilation (#44226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44226

**Summary**
At present, the `share_types` argument to `create_script_module` is used
to decide whether to reuse a previously created type for a top-level
module that has not yet been compiled. However, that setting does not apply
to the compilation of submodules of the top-level module; types are
still reused if possible.

This commit modifies `create_script_module` so that the `share_types`
flag is honoured during submodule compilation as well.

**Test Plan**
This commit adds a unit test to `TestTypeSharing` that checks that
submodule types are not shared or reused when `share_types` is set to
`False`.

**Fixes**
This commit fixes #43605.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23602371

Pulled By: SplitInfinity

fbshipit-source-id: b909b8b6abbe3b4cb9be8319ac263ade90e83bd3
2020-09-09 20:06:35 -07:00
d3b6d5caf1 [JIT] Add support for del to TS classes (#44352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44352

**Summary**
This commit adds support for `del` with class instances. If a class
implements `__delitem__`, then `del class_instance[key]` is syntactic
sugar for `class_instance.__delitem__(key)`.
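
A minimal sketch of the new sugar (class and key names are illustrative):

```python
import torch
from typing import Dict

@torch.jit.script
class Bag(object):
    def __init__(self):
        self.items: Dict[str, int] = {"a": 1, "b": 2}

    def __delitem__(self, key: str):
        del self.items[key]

@torch.jit.script
def drop(key: str) -> int:
    bag = Bag()
    del bag[key]              # desugars to bag.__delitem__(key)
    return len(bag.items)

print(drop("a"))              # 1
```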

**Test Plan**
This commit adds a unit test to TestClassTypes to test this feature.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23603102

Pulled By: SplitInfinity

fbshipit-source-id: 28ad26ddc9a693a58a6c48a0e853a1c7cf5c9fd6
2020-09-09 19:52:35 -07:00
058d7228ec Expose the interface of nesterov of SGD Optimizer from caffe2 to dper
Summary:
Expose the interface of `nesterov` of SGD Optimizer from caffe2 to dper.

The dper sgd optimizer (https://fburl.com/diffusion/chpobg0h) already refers to the NAG SgdOptimizer in caffe2: https://fburl.com/diffusion/uat2lnan, so we just need to add the parameter 'nesterov' to the dper sgd optimizer.

Analysis of run results: N345540.

- train_ne increases as momentum (m) decreases.
- for m=0.95, 0.9: eval_ne is lower with NAG than production (no NAG, m = 0.95).
- for m=0.99: eval_ne with or without NAG is higher than production. It indicates larger variance in validation and overfit in training (lower train_ne).

Test Plan:
1. unit tests:
`buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_sgd_without_nesterov`
`buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_sgd_with_nesterov`
.
1. build dper front end package: `flow-cli canary   ads.dper3.workflows.sparse_nn.train --mode opt --entitlement      ads_global --run-as-secure-group      team_ads_ml_ranking`. The build result (refreshed) is here https://www.internalfb.com/intern/buck/build/2a368b55-d94b-45c1-8617-2753fbce994b. Flow package version is ads_dper3.canary:856b545cc6b249c0bd328f845adeb0d2.
.
2. To build dper back end package: `flow-cli canary  dper.workflows.dper3.train --mode opt --entitlement      ads_global --run-as-secure-group      team_ads_ml_ranking`. The build result (refreshed) is here: https://www.internalfb.com/intern/buck/build/70fa91cd-bf6e-4a08-8a4d-41e41a77fb52. Flow package version is aml.dper2.canary:84123a34be914dfe86b1ffd9925869de.
.
3. Compare prod with NAG-enabled runs:
a) refreshed prod run (m=0.95): f213877098
NAG enabled run (m=0.95): f213887113
.
b) prod run (m=0.9): f214065288
NAG enabled run (m=0.9): f214066319
.
c) prod run (m=0.99): f214065804
NAG enabled run (m=0.99): f214066725
.
d) changed the data type of nesterov to `bool` and launched a validation run
NAG enabled (m=0.95): f214500597

Reviewed By: ustctf

Differential Revision: D23152229

fbshipit-source-id: 61703ef6b4e72277f4c73171640fb8afc6d31f3c
2020-09-09 19:37:00 -07:00
5ee31308e6 [caffe2] exposes Net cancellation through pybind state (#44043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44043

To invoke `cancel` from the net instance in Python, we expose it through pybind state.

Reviewed By: dzhulgakov

Differential Revision: D23249660

fbshipit-source-id: 45a1e9062dca811746fcf2e5e42199da8f76bb54
2020-09-09 18:13:13 -07:00
e028ad0762 Fix HashStoreTests and move to Gtest (#43384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43384

Much like the FileStoreTests, the HashStoreTests were also run in a single blob and threw exceptions upon failure. This modularizes the test by separating each function into separate gtest test cases.
ghstack-source-id: 111690834

Test Plan: Confirmed that the tests pass on devvm.

Reviewed By: jiayisuse

Differential Revision: D23257579

fbshipit-source-id: 7e821f0e9ee74c8b815f06facddfdb7dc2724294
2020-09-09 17:56:33 -07:00
69a3ff005d Modularize FileStoreTest and move to Gtest (#43383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43383

FileStore Test currently has a large blob of tests that throw
exceptions upon failure. This PR modularizes each test so they can run
independently, and migrates the framework to gtest.
ghstack-source-id: 111690831

Test Plan: Confirmed tests pass on devvm

Reviewed By: jiayisuse

Differential Revision: D22879473

fbshipit-source-id: 6fa5468e594a53c9a6b972757068dfc41645703e
2020-09-09 17:56:30 -07:00
a7fba7de22 Convert StoreTestUtils to Gtest (#43382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43382

StoreTestCommon defines standard helper functions that are used by all of our Store tests. These helpers currently throw exceptions upon failure, this PR changes them to use gtest assertions instead.
ghstack-source-id: 111690833

Test Plan: Tested the 2 PR's above this on devvm

Reviewed By: jiayisuse

Differential Revision: D22828156

fbshipit-source-id: 9e116cf2904e05ac0342a441e483501e00aad3dd
2020-09-09 17:55:25 -07:00
b69c28d02c Improving ModuleList indexing error msg (#43361)
Summary:
Follow-up to https://github.com/pytorch/pytorch/pull/41946/, suggesting enumeration of the module as an alternative if a user tries indexing into a ModuleList/Sequential with an index that is not an integer literal.
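
The suggested pattern, as a minimal sketch:

```python
import torch
import torch.nn as nn

class Stack(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(4, 4) for _ in range(3))

    def forward(self, x):
        # TorchScript cannot index a ModuleList with a non-constant value;
        # iterating (or enumerating) the list works instead.
        for layer in self.layers:
            x = layer(x)
        return x

scripted = torch.jit.script(Stack())
print(scripted(torch.randn(2, 4)).shape)
```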

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43361

Reviewed By: mrshenli

Differential Revision: D23602388

Pulled By: eellison

fbshipit-source-id: 51fa28d5bc45720529b3d45e92d367ee6c9e3316
2020-09-09 16:22:57 -07:00
c010ef7f0c use non-overflowing divide in cuda kernel util GET_BLOCKS (#44391)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43476.
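
A worked sketch of the overflow, using Python ints to emulate a 32-bit signed add (the exact formula used in GET_BLOCKS may differ; this just shows why dividing first is safer):

```python
INT32_MAX = 2**31 - 1
N, threads = INT32_MAX, 1024

naive = N + threads - 1                        # exceeds INT32_MAX...
wrapped = (naive + 2**31) % 2**32 - 2**31      # ...so a 32-bit add wraps
print(wrapped // threads)                      # negative, nonsense block count

safe = N // threads + (N % threads != 0)       # divide first: no overflow
print(safe)                                    # 2097152
```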

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44391

Reviewed By: mrshenli

Differential Revision: D23602424

Pulled By: walterddr

fbshipit-source-id: 40ed81547f933194ce5bf4a5bcebdb3434298bc1
2020-09-09 16:20:41 -07:00
ba6ddaf04c [pyper] export caffe2 bucketize GPU operator to pytorch
Summary: Exporting the Bucketize operator on CUDA. Also adding unit test.

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/sparsenn:gpu_test -- test_bucketize

Differential Revision: D23581321

fbshipit-source-id: 7f21862984c04d840410b8718db93006f526938a
2020-09-09 16:08:53 -07:00
e0c65abd38 Revert D23568330: [pytorch][PR] Moves some of TestTorchMathOps to OpInfos
Test Plan: revert-hammer

Differential Revision:
D23568330 (a953a825cc)

Original commit changeset: 03e69fccdbfd

fbshipit-source-id: 04ec6843c5eb3c84ddf226dad0088172d9bed84d
2020-09-09 15:48:56 -07:00
fc51047af5 Small fixes in Dependency.cmake and run_test.py (#44414)
Summary:
Do not add gencode flags to NVCC_FLAGS twice: they are first added in `cmake/public/cuda.cmake`, so there is no need to do it again in `cmake/Dependencies.cmake`.
Copy `additional_unittest_args` before appending local options to it in the `run_test()` method.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44414

Reviewed By: seemethere

Differential Revision: D23605733

Pulled By: malfet

fbshipit-source-id: 782a0da61650356a978a892fb03c66cb1a1ea26b
2020-09-09 15:09:33 -07:00
b0bcdbb1ab [JIT] Support partially specified sizes/strides in IRParser (#44113)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44113

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23508149

Pulled By: Lilyjjo

fbshipit-source-id: b6b2d32109fae599bc5347dae742b67a2e4a0a49
2020-09-09 14:45:51 -07:00
3674264947 [quant] quantized path for ConstantPadNd (#43304)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43304

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23231946

Pulled By: z-a-f

fbshipit-source-id: 8c77f9a81f5a36c268467a190b5b954df0a8f5a4
2020-09-09 14:04:41 -07:00
032480d365 fix typo in embedding_bag_non_contiguous_weight test (#44382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44382

This is to fix a typo that was introduced in #44032.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23601316

Pulled By: glaringlee

fbshipit-source-id: 17d6de5900443ea46c7a6ee9c7614fe6f2d92890
2020-09-09 13:30:36 -07:00
a00d36b0e7 [PyTorch][Mobile] Insert the module name as name() to metadata dict if metadata doesn't contain "model_name" (#44400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44400

This diff does the identical thing as D23549149 (398409f072) does. A fix included for OSS CI: pytorch_windows_vs2019_py36_cuda10.1_test1
ghstack-source-id: 111679745

Test Plan:
- CI
- OSS CI

Reviewed By: xcheng16

Differential Revision: D23601050

fbshipit-source-id: 8ebdcd8fdc5865078889b54b0baeb397a90ddc40
2020-09-09 13:01:17 -07:00
24efd29d19 Check commutativity for computed dispatch table and add a test to check entries. (#44088)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44088

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23492793

Pulled By: ailzhang

fbshipit-source-id: 37502f2a8a4d755219b400fcbb029e49d6cdb6e9
2020-09-09 12:48:34 -07:00
48c47db8fe [NCCL] Add Environment Variable to guard Async Error Handling feature (#44163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44163

In this PR, we introduce a new environment variable
(NCCL_ASYNC_ERROR_HANDLING), which guards the asynchronous error handling
feature. We intend to eventually turn this feature on by default for all users,
but this is a temporary measure so that the change in behavior from hanging to
crashing does not become the default for users all of a sudden.
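
Opting in, as a minimal sketch (the variable is presumably read when the NCCL process group is constructed, so it should be set before `init_process_group`):

```python
import os

os.environ["NCCL_ASYNC_ERROR_HANDLING"] = "1"   # opt in to async error handling

import torch.distributed as dist
# dist.init_process_group("nccl", ...)          # errored/timed-out collectives
                                                # now raise instead of hanging
```
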
ghstack-source-id: 111637788

Test Plan:
CI/Sandcastle. We will turn on this env var by default in
torchelastic and HPC trainer soon.

Reviewed By: jiayisuse

Differential Revision: D23517895

fbshipit-source-id: e7cd244b2ddf2dc0800ff7df33c73a6f00b63dcc
2020-09-09 12:26:25 -07:00
211ece7267 [NCCL] ProcessGroupNCCL Destructor Blocks on WorkNCCL Completion (#41054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41054

**This Commit:**
ProcessGroupNCCL destructor now blocks until all WorkNCCL objects have either been aborted or completed and removed from the work vector.

**This Stack:**
The purpose of this stack is to fix the hanging behavior observed when using PyTorch DDP training with NCCL. In various situations (desynchronization, high GPU utilization, etc.), NCCL collectives may hang due to waiting on an unresponsive worker. This stack detects such hanging behavior and aborts timed-out collectives by throwing a user-visible exception, all with minimal perf regression. Training can then be restarted from a previous checkpoint with something like torchelastic.

ghstack-source-id: 111614314

Test Plan:
1. **DDP Sanity Check**: First we have a sanity check based on the PyTorch DDP benchmark. This verifies that the baseline DDP training with NCCL for  standard CU workloads works well (esp. with standard models like Resnet50 and BERT). Here is a sample Flow: f213293473

1. **HPC Performance Benchmarks**: This stack has undergone thorough testing and profiling on the Training Cluster with varying number of nodes. This introduces 1-1.5% QPS regression only (~200-400 QPS regression for 8-64 GPUs).

1. **HPC Accuracy Benchmarks**: We've confirmed NE parity with the existing NCCL/DDP stack without this change.

1. **Kernel-Specific Benchmarks**: We have profiled other approaches for this system (such as cudaStreamAddCallback) and performed microbenchmarks to confirm the current solution is optimal.

1. **Sandcastle/CI**: Apart from the recently fixed ProcessGroupNCCL tests, we will also introduce a new test for desynchronization scenarios.

Reviewed By: jiayisuse

Differential Revision: D22054298

fbshipit-source-id: 2b95a4430a4c9e9348611fd9cbcb476096183c06
2020-09-09 12:26:22 -07:00
afbf2f140b [NCCL] WorkNCCL Helper Functions (#41053)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41053

**This Commit:**
Some minor refactoring - added helper to check if `WorkNCCL` objects have timed out. Adding a new finish function to ProcessGroupNCCL::WorkNCCL that avoids notifying CV and uses `lock_guard`. Also renaming the timeoutCVMutex mutex to be more descriptive.

**This Stack:**
The purpose of this stack is to fix the hanging behavior observed when using PyTorch DDP training with NCCL. In various situations (desynchronization, high GPU utilization, etc.), NCCL collectives may hang due to waiting on an unresponsive worker. This stack detects such hanging behavior and aborts timed-out collectives by throwing a user-visible exception, all with minimal perf regression. Training can then be restarted from a previous checkpoint with something like torchelastic.

ghstack-source-id: 111614315

Test Plan: See D22054298 for verification of correctness and performance

Reviewed By: jiayisuse

Differential Revision: D21943520

fbshipit-source-id: b27ee329f0da6465857204ee9d87953ed6072cbb
2020-09-09 12:26:18 -07:00
f8f7b7840d [NCCL] Abort Errored and Timed Out NCCL Communicators from Watchdog Thread (#41052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41052

**This Commit:**
The watchdog thread checks for errored or timed-out `WorkNCCL` objects and aborts all associated NCCL communicators. For now, we also process these aborted communicators as with the existing watchdog logic (by adding them to abortedCommIds and writing aborted communicator ids to the store).

**This Stack:**
The purpose of this stack is to fix the hanging behavior observed when using PyTorch DDP training with NCCL. In various situations (desynchronization, high GPU utilization, etc.), NCCL collectives may hang due to waiting on an unresponsive worker. This stack detects such hanging behavior and aborts timed-out collectives by throwing a user-visible exception, all with minimal perf regression. Training can then be restarted from a previous checkpoint with something like torchelastic.

ghstack-source-id: 111614313

Test Plan: See D22054298 for verification of correctness and performance

Reviewed By: jiayisuse

Differential Revision: D21943151

fbshipit-source-id: 337bfcb8af7542c451f1e4b3dcdfc5870bdec453
2020-09-09 12:26:15 -07:00
4e5c55ef69 [NCCL] Use cudaEventQuery to Poll for GPU operation errors (#41051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41051

**This Commit:**
In the workCleanupThread, we process completion and exception handling for WorkNCCL objects corresponding to collective calls that have either completed GPU execution or have already thrown an exception. This way, we throw an exception from the workCleanupThread for failed GPU operations. This approach replaces the previous (lower-performance) approach of enqueuing a callback on the CUDA stream to process failures.

**This Stack:**
The purpose of this stack is to fix the hanging behavior observed when using PyTorch DDP training with NCCL. In various situations (desynchronization, high GPU utilization, etc.), NCCL collectives may hang due to waiting on an unresponsive worker. This stack detects such hanging behavior and aborts timed-out collectives by throwing a user-visible exception, all with minimal perf regression. Training can then be restarted from a previous checkpoint with something like torchelastic.

ghstack-source-id: 111614319

Test Plan: See D22054298 for verification of correctness and performance

Reviewed By: jiayisuse

Differential Revision: D21938498

fbshipit-source-id: df598365031ff210afba57e0c7be865e3323ca07
2020-09-09 12:26:12 -07:00
1df24fd457 [NCCL] Timeout Loop Thread for Async Error Handling (#41050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41050

**This Commit:**
We introduce a workVector to track live workNCCL objects corresponding to collective operations. Further, we introduce a workCleanupLoop, which busy-polls the vector of workNCCL objects and removes them upon completion.

**This Stack:**
The purpose of this stack is to fix the hanging behavior observed when using PyTorch DDP training with NCCL. In various situations (desynchronization, high GPU utilization, etc.), NCCL collectives may hang due to waiting on an unresponsive worker. This stack detects such hanging behavior and aborts timed-out collectives by throwing a user-visible exception, all with minimal perf regression. Training can then be restarted from a previous checkpoint with something like torchelastic.

Test Plan: See D22054298 for verification of correctness and performance

Reviewed By: jiayisuse

Differential Revision: D21916637

fbshipit-source-id: f8cadaab0071aaad1c4e31f9b089aa23cba0cfbe
2020-09-09 12:25:06 -07:00
15cbd1cf4b Preserve .ninja_log in build artifacts (#44390)
Summary:
Helpful for later analysis of build-time trends.
Also, save .whl files out of the regular Linux build job.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44390

Reviewed By: walterddr

Differential Revision: D23602049

Pulled By: malfet

fbshipit-source-id: 4d55c9aa2d161a7998ad991a3da0436da83f70ad
2020-09-09 12:19:46 -07:00
ef4475f902 [Reland] Optimize code path for adaptive_avg_pool2d when output size is (1, 1) (#44211)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/43986

DO NOT MERGE YET. XLA failure seems real.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44211

Reviewed By: mrshenli

Differential Revision: D23590505

Pulled By: ngimel

fbshipit-source-id: 6ee516b0995bfff6efaf740474c82cb23055d274
2020-09-09 12:08:14 -07:00
37093f4d99 Benchmarks: make fuser and executor configurable from command line. (#44291)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44291

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23569089

Pulled By: ZolotukhinM

fbshipit-source-id: ec25b2f0bba303adaa46c3e85b1a9ce4fa3cf076
2020-09-09 11:59:35 -07:00
364d03a67c Misc. FakeLowP OSS cleanup (#44331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44331

Summary of issues (from Tal Cherckez), just to have a clear list:
* `std::clamp` forces the user to use C++17
* using `settings` without `given` fails the test
* avoid using `max_examples` for tests

(Note: this ignores all push blocking failures!)

Test Plan: https://www.internalfb.com/intern/testinfra/testconsole/testrun/6192449509073222/

Reviewed By: hyuen

Differential Revision: D23581440

fbshipit-source-id: fe9fbc341f8fca02352f531cc622fc1035d0300c
2020-09-09 11:53:43 -07:00
758c2b96f5 BUG: make cholesky_solve_out do broadcast, error checking (#43137)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42695

test, fix `cholesky_solve_out` to use error checking and broadcasting from `cholesky_solve`. Test segfaults before, passes after the fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43137

Reviewed By: izdeby

Differential Revision: D23568589

Pulled By: malfet

fbshipit-source-id: 41b67ba964b55e59f1897eef0d96e0f6e1725bef
2020-09-09 11:38:36 -07:00
683380fc91 Use compile time cudnn version if linking with it statically (#44402)
Summary:
This should prevent torch_python from linking the entire cudnn library statically just to query its version

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44402

Reviewed By: seemethere

Differential Revision: D23602720

Pulled By: malfet

fbshipit-source-id: 185b15b789bd48b1df178120801d140ea54ba569
2020-09-09 11:33:41 -07:00
6ec8fabc29 Fix frac in CUDA fuser (#44152)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44152

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23528506

fbshipit-source-id: bfd468d72fa55ce317f88ae83e1f2d5eee041aa0
2020-09-09 11:10:08 -07:00
350130a69d Prevent the TE fuser from getting datatypes it can't handle (#44160)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44160

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D23528508

Pulled By: bertmaher

fbshipit-source-id: 03b22725fb2666f441cb504b35397ea6d155bb85
2020-09-09 11:10:04 -07:00
960c088a58 [te] Fix casting of unsigned char, and abs(int) (#44157)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44157

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D23528507

Pulled By: bertmaher

fbshipit-source-id: c5ef0422a91a4665b616601bed8b7cd137be39f9
2020-09-09 11:08:36 -07:00
7c464eed16 Skipping CUDA tests in ProcessGroupGloo and logs (#42488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42488

Currently, ProcessGroupGloo tests do not emit logs if the test was
skipped due to CUDA not being available or there not being enough CUDA devices. This PR clarifies
the reason for skipping through these logs.
ghstack-source-id: 111638111

Test Plan: tested on devvm and devgpu

Reviewed By: jiayisuse

Differential Revision: D22879396

fbshipit-source-id: d483ca46b5e22ed986521262c11a1c6dbfbe7efd
2020-09-09 10:52:52 -07:00
2a87742ffa Autocast wrappers for RNN cell apis (#44296)
Summary:
Should fix https://github.com/pytorch/pytorch/issues/42605.
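For reference, a minimal usage sketch of the path this PR covers (device and shapes are illustrative assumptions, not from the PR):

```python
import torch

# An RNN cell exercised under autocast (assumes a CUDA device is available).
cell = torch.nn.LSTMCell(16, 32).cuda()
x = torch.randn(8, 16, device="cuda")
hx = torch.randn(8, 32, device="cuda")
cx = torch.randn(8, 32, device="cuda")
with torch.cuda.amp.autocast():
    # the new wrappers let the cell run under autocast with consistent dtypes
    hx, cx = cell(x, (hx, cx))
```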

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44296

Reviewed By: izdeby

Differential Revision: D23580447

Pulled By: ezyang

fbshipit-source-id: 86027b693fd2b648f043ab781b84ffcc1f72854d
2020-09-09 09:44:59 -07:00
a953a825cc Moves some of TestTorchMathOps to OpInfos (#44277)
Summary:
This PR fixes three OpInfo-related bugs and moves some functions from TestTorchMathOps to be tested using the OpInfo pattern. The bugs are:

- A skip test path in test_ops.py incorrectly formatted its string argument
- Decorating the tests in common_device_type.py was incorrectly always applying decorators to the original test, not the op-specific variant of the test. This could cause the same decorator to be applied multiple times, overriding past applications.
- make_tensor was incorrectly constructing tensors in some cases

The functions moved are:

- asin
- asinh
- sinh
- acosh
- tan
- atan
- atanh
- tanh
- log
- log10
- log1p
- log2

In a follow-up PR, most or all of the remaining functions in TestTorchMathOps will be refactored as OpInfo-based tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44277

Reviewed By: ngimel

Differential Revision: D23568330

Pulled By: mruberry

fbshipit-source-id: 03e69fccdbfd560217c34ce4e9a5f20e10d05a5e
2020-09-09 09:41:03 -07:00
f044b17ae2 Disable a test (#44348)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44348

Reviewed By: mrshenli

Differential Revision: D23592524

Pulled By: Krovatkin

fbshipit-source-id: 349057606ce39dd5de24314c9ba8f40516d2ae1c
2020-09-09 08:36:19 -07:00
cfd3620b76 Don't use VCOMP if Intel OMP is used (#44280)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44096.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44280

Reviewed By: malfet

Differential Revision: D23568557

Pulled By: ezyang

fbshipit-source-id: bd627e497a9f71be9ba908852bf3ae437b1a5c94
2020-09-09 08:12:34 -07:00
d23f3170ef Remove pybind11 from required submodules (#44278)
Summary:
pybind11 can be taken from the system, in which case the submodule is not used. Hence the check here limits the usage unnecessarily.

ccing malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44278

Reviewed By: malfet

Differential Revision: D23568552

Pulled By: ezyang

fbshipit-source-id: 7fd2613251567f649b12eca0b1fe7663db9cb58d
2020-09-09 08:07:13 -07:00
8acce55015 Dump optimized graph when logging in already-optimized PE (#44315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44315

I find it more intuitive to dump the optimized graph if we have one;
when I first saw the unoptimized graph being dumped I thought we had failed to
apply any optimizations.

Test Plan: Observe output by hand

Reviewed By: Lilyjjo

Differential Revision: D23578813

Pulled By: bertmaher

fbshipit-source-id: e2161189fb0e1cd53aae980a153aea610871662a
2020-09-09 01:28:48 -07:00
7a64b0c27a Export Node::isBefore/isAfter for PythonAPI (#44162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44162

This diff exports the Node::isBefore/isAfter methods to the Python API.

Test Plan: Tested locally. Please let me know if there is a set of unit tests to be passed.

Reviewed By: soumith

Differential Revision: D23514448

fbshipit-source-id: 7ef709b036370217ffebef52fd93fbd68c464e89
2020-09-09 00:57:08 -07:00
135ebbde6d [Caffe2] Add RMSNormOp (#44338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44338

Add RMSNormOp in Caffe2
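
For reference, a minimal sketch of the standard RMSNorm computation; the op's exact signature and epsilon are not spelled out here, so treat the names below as assumptions:

```python
import torch

def rms_norm(x, gamma, beta, eps=1e-6):
    # normalize by the root-mean-square over the last dimension, then affine-transform
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x / rms * gamma + beta
```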

Test Plan: buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:rms_norm_op_test

Reviewed By: houseroad

Differential Revision: D23546424

fbshipit-source-id: 8f3940a0bb42230bfa647dc66b5e359cc84491c6
2020-09-08 23:50:44 -07:00
106459acac Rename test_distributed to test_distributed_fork (#42932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42932

Follow up from https://github.com/pytorch/pytorch/pull/41769, rename `test_distributed` to `test_distributed_fork` to make it explicit that it forks.

New command to run test:
`python test/run_test.py -i distributed/test_distributed_fork -v`
ghstack-source-id: 111632568

Test Plan: `python test/run_test.py -i distributed/test_distributed_fork -v`

Reviewed By: izdeby

Differential Revision: D23072201

fbshipit-source-id: 48581688b6c5193a309e803c3de38e70be980872
2020-09-08 23:13:37 -07:00
b22abbe381 Enable test_distributed to work with spawn mode (#41769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41769

Currently the tests in `test_distributed` only work with the `fork` mode multiprocessing, this PR introduces support for `spawn` mode multiprocessing as well (while keeping the `fork` mode intact).

Motivations for the change:
1) Spawn multiprocessing is the default on MacOS, so it better emulates how MacOS users would use distributed
2) With python 3.8+, spawn is the default on linux, so we should have test coverage for this
3) PT multiprocessing suggests using spawn/forkserver over fork, for sharing cuda tensors: https://pytorch.org/docs/stable/multiprocessing.html
4) Spawn is better supported with respect to certain sanitizers such as TSAN, so adding this sanitizer coverage may help us uncover issues.

How it is done:
1) Move `test_distributed` tests in `_DistTestBase` class to a shared file `distributed_test` (similar to how the RPC tests are structured)
2) For `Barrier`, refactor the setup of temp directories, as the current version did not work with spawn: each process would get a different randomly generated directory and thus would write to different barriers.
3) Add all the relevant builds to run internally and in OSS.
Running test_distributed with spawn mode in OSS can be done with:
`python test/run_test.py -i distributed/test_distributed_spawn -v`

Reviewed By: izdeby

Differential Revision: D22408023

fbshipit-source-id: e206be16961fd80438f995e221f18139d7e6d2a9
2020-09-08 23:11:12 -07:00
1d01fcdc24 [quant] fill_ path for quantized tensors (#43303)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43303

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23231947

Pulled By: z-a-f

fbshipit-source-id: fd5110ff15a073f326ef590436f8c6e5a2608324
2020-09-08 21:34:06 -07:00
4aacfab221 Resolve Autograd key for disable_variable_dispatch flag. (#44268)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44268

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23561042

Pulled By: ailzhang

fbshipit-source-id: 6f35cd9a543bea3f9e294584f1db7c3622ebb741
2020-09-08 21:27:52 -07:00
ecc6358dbe Port nonzero cuda from THC to ATen (#44259)
Summary:
1) Ports nonzero from THC to ATen
2) replaces most thrust uses with cub, to avoid synchronization and to improve performance. There is still one necessary synchronization point, communicating number of nonzero elements from GPU to CPU
3) slightly changes algorithm, now we first compute the number of nonzeros, and then allocate correct-sized output, instead of allocating full-sized output as was done before, to account for possibly all elements being non-zero
4) unfortunately, since the last transforms are still done with thrust, 2) is slightly beside the point, however it is a step towards a future without thrust
4) hard limits the number of elements in the input tensor to MAX_INT. Previous implementation allocated a Long tensor with the size ndim*nelements, so that would be at least 16 GB for a tensor with MAX_INT elements. It is reasonable to say that larger tensors could not be used anyway.
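
For intuition, a minimal 1-D sketch of the count-then-allocate approach, using a prefix sum for compaction roughly the way a cub-style select does (plain PyTorch stands in for the CUDA kernels):

```python
import torch

def nonzero_1d_two_pass(flat):
    mask = flat != 0
    n = int(mask.sum())                        # pass 1: count (the one GPU->CPU sync)
    out = torch.empty(n, dtype=torch.long, device=flat.device)
    slot = torch.cumsum(mask.long(), 0) - 1    # prefix sum assigns each nonzero its output slot
    src = torch.arange(flat.numel(), device=flat.device)
    out[slot[mask]] = src[mask]                # pass 2: scatter indices into the exact-sized output
    return out

assert torch.equal(nonzero_1d_two_pass(torch.tensor([0., 3., 0., 5.])),
                   torch.tensor([1, 3]))
```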

Benchmarking is done for tensors with approximately half non-zeros
<details><summary>Benchmarking script</summary>
<p>

```
import torch
from torch.utils._benchmark import Timer
from torch.utils._benchmark import Compare
import sys

device = "cuda"
results = []
for numel in (1024 * 128,):#, 1024 * 1024, 1024 * 1024 * 128):
    inp = torch.randint(2, (numel,), device="cuda", dtype=torch.float)
    for ndim in range(2,3):#(1,4):
        if ndim == 1:
            shape = (numel,)
        elif ndim == 2:
            shape = (1024, numel // 1024)
        else:
            shape = (1024, 128, numel // 1024 // 128)
        inp = inp.reshape(shape)
        repeats = 3
        timer = Timer(stmt="torch.nonzero(inp, as_tuple=False)", label="Nonzero", sub_label=f"number of elts {numel}",
        description = f"ndim {ndim}", globals=globals())
        for i in range(repeats):
            results.append(timer.blocked_autorange())
        print(f"\rnumel {numel} ndim {ndim}", end="")
        sys.stdout.flush()

comparison = Compare(results)
comparison.print()
```
</p>
</details>

### Results
Before:
```
[--------------------------- Nonzero ---------------------------]
                                 |  ndim 1  |   ndim 2  |   ndim 3
 1 threads: ------------------------------------------------------
       number of elts 131072     |    55.2  |     71.7  |     90.5
       number of elts 1048576    |   113.2  |    250.7  |    497.0
       number of elts 134217728  |  8353.7  |  23809.2  |  54602.3

 Times are in microseconds (us).
```
After:
```
[-------------------------- Nonzero --------------------------]
                                |  ndim 1  |  ndim 2  |  ndim 3
1 threads: ----------------------------------------------------
      number of elts 131072     |    48.6  |    79.1  |    90.2
      number of elts 1048576    |    64.7  |   134.2  |   161.1
      number of elts 134217728  |  3748.8  |  7881.3  |  9953.7

Times are in microseconds (us).

```
There's a real regression for smallish 2D tensors due to the added work of computing the number of nonzero elements; however, for other sizes there are significant gains, and memory requirements are drastically lower. Perf gains would be even larger for tensors with fewer nonzeros.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44259

Reviewed By: izdeby

Differential Revision: D23581955

Pulled By: ngimel

fbshipit-source-id: 0b99a767fd60d674003d83f0848dc550d7a363dc
2020-09-08 20:52:51 -07:00
bd8e38cd88 [TensorExpr] Fuser: check node inputs' device before merging the node into a fusion group. (#44241)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44241

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23554192

Pulled By: ZolotukhinM

fbshipit-source-id: fb03262520303152b83671603e08e7aecc24f5f2
2020-09-08 19:32:23 -07:00
646ffd4886 [quant] Move EmbeddingBag eager quantization to static (#44217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44217

Move the tests to static ones as well

Test Plan:
python test/test_quantization.py TestStaticQuantizedModule.test_embedding_bag_api

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23547386

fbshipit-source-id: 41f81c31e1613098ecf6a7eff601c7dcd4b09c76
2020-09-08 19:05:02 -07:00
57b87aaf59 [quant] Add quantized Embedding module (#44208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44208

Add quantized module in static quantization namespace. Embedding
quantization requires only weights to be quantized so it is static.
Internally it calls the embedding_bag_byte op with the offsets set corresponding to the
indices.

Future PR will move EmbeddingBag quantization from dynamic to static as well.

Test Plan:
python test/test_quantization.py test_embedding_api

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23547384

fbshipit-source-id: eddc6fb144b4a771060e7bab5853656ccb4443f0
2020-09-08 19:04:59 -07:00
6013a29fc0 [quant] Support quantization of embedding lookup operators (#44207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44207

Use the existing embedding_bag operator, but set offsets to [0, 1, ..., len(indices) - 1] so that each index forms its own bag.
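
A sketch of the trick: giving every index its own bag makes embedding_bag behave like a per-row embedding lookup (values below are illustrative):

```python
import torch
import torch.nn.functional as F

weight = torch.randn(10, 8)
indices = torch.tensor([3, 1, 4, 1, 5])
offsets = torch.arange(indices.numel())  # [0, 1, ..., len(indices) - 1]: one index per bag
# each "bag" holds exactly one index, so the sum-mode result equals weight[indices]
out = F.embedding_bag(indices, weight, offsets, mode="sum")
assert torch.allclose(out, weight[indices])
```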

Test Plan:
python test/test_quantization.py TestEmbeddingOps.test_embedding_byte

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23547385

fbshipit-source-id: ccce348bc192c6a4a65a8eca4c8b90f99f40f1b1
2020-09-08 19:03:59 -07:00
f27be2f781 [caffe2] fix wrong comment (#42735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42735

We use reduced precision only for embedding table (not for momentum) in RowWiseSparseAdagrad

Test Plan: .

Reviewed By: jianyuh

Differential Revision: D23003939

fbshipit-source-id: 062290d94b160100bc4c2f48b797833819f8e88a
2020-09-08 18:54:24 -07:00
f9146b4598 fix lint (#44346)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44346

Reviewed By: jamesr66a

Differential Revision: D23589324

Pulled By: eellison

fbshipit-source-id: a4e22b69196909ec200ac3e262f04d2aaf78e9cf
2020-09-08 18:29:44 -07:00
6269b6e0f0 [quant][graphmode][fx][api] Call fuse in prepare (#43984)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43984

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23459261

fbshipit-source-id: 6b56b0916d76df67b9cc2f4be1fcee905d604019
2020-09-08 18:09:26 -07:00
be94dba429 [NNC] fix support for FP16 in CudaCodgen (#44209)
Summary:
Fixes a bug where FP16 values could be incorrectly cast to a half type that doesn't have a cast operator. The fix inserts the CUDA-specific cast to float during handling of the Cast node, rather than as a wrapper around printing Loads and Stores. Two main changes: the HalfChecker now inserts the casts to float explicitly in the IR, and the PrioritizeLoad mutator now consumes both Loads and a Cast which immediately precedes a load.

Tested with test_jit_fuser_te.py and test_tensorexpr.py, plus C++ tests obv.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44209

Reviewed By: izdeby

Differential Revision: D23575577

Pulled By: nickgg

fbshipit-source-id: 808605aeb2af812758f96f9fdc11b07e08053b46
2020-09-08 18:00:39 -07:00
9f54bcc522 [quant][graphmode][fx] Support inplace option (#43983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43983

Support the inplace option in the APIs

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23459260

fbshipit-source-id: 80409c7984f17d1a4e13fb1eece8e18a69ee43b3
2020-09-08 17:39:13 -07:00
0351d31722 add rocm nightly build (#44250)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44250

Reviewed By: izdeby

Differential Revision: D23585431

Pulled By: walterddr

fbshipit-source-id: c798707f5cb55f720e470bc40f30ab82718e0ddf
2020-09-08 17:09:32 -07:00
40d138f7c1 Added alpha overloads for add/sub ops with lists (#43413)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43413

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D23331896

Pulled By: izdeby

fbshipit-source-id: 2e7484339fec533e21224f18979fddbeca649d2c
2020-09-08 17:02:08 -07:00
00b5bd536f fx quant: add docblocks to _find_matches and _find_quants (#43928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43928

Improving readability, no logic change.

Test Plan:
CI

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23440249

fbshipit-source-id: a7ebfc7ad15c73e26b9a94758e7254413cc17d29
2020-09-08 16:13:11 -07:00
6dd53fb58d [fix] output of embedding_bag with non-contiguous weight (#44032)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43723

Use `weight.contiguous()` on the fast path, as it expects a contiguous tensor.

TODO:
* [x] Add tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44032

Reviewed By: izdeby

Differential Revision: D23502200

Pulled By: glaringlee

fbshipit-source-id: 4a7b546b3e8b1ad35c287a634b4e990a1ccef874
2020-09-08 16:07:13 -07:00
43e38d60d6 [quant][graphmode][fx] Support quantize per channel in all cases (#44042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44042

Missed one case last time

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23479345

fbshipit-source-id: 30e6713120c494e9fab5584de4df9b25bec83d32
2020-09-08 15:45:14 -07:00
49e979bfde Set default compiler differently according to platform (#43890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43890

1. auto-detect the default compiler type from `CXX` in OSS, and use `clang` as the default compiler type in fbcode (because auto-detection would report `gcc` as the default compiler on a devserver).

2. change `compiler type` from the strings `"CLANG"`/`"GCC"` to an enum type
3. rename the function `get_cov_type` to `detect_compiler_type`
4. auto-set the default pytorch folder for users in OSS

Test Plan:
on devserver:
```
buck run :coverage //caffe2/c10:
```

on oss:
```
python oss_coverage.py --run-only=atest
```

Reviewed By: malfet

Differential Revision: D23420034

fbshipit-source-id: c0ea88188578bb1343a286f2090eb8a74cdf3982
2020-09-08 14:57:35 -07:00
1fcccd6a18 [FX] Minor fixups in Graph printout (#44214)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44214

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D23545501

Pulled By: jamesr66a

fbshipit-source-id: dabb3b051ed4da213b2087979ade8a649288bd5d
2020-09-08 14:45:32 -07:00
47ac9bb105 Enable temp disabled tests in test_jit_fuser_te.py (#44222)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44222

Reviewed By: izdeby

Differential Revision: D23582214

Pulled By: Krovatkin

fbshipit-source-id: 27caa3ea02ce10b163212f6a45a81b446898953d
2020-09-08 14:40:32 -07:00
54931ebb7b Release saved variable from DifferentiableGraphBackward (#42994)
Summary:
When backward ops execute via the autograd engine's evaluate_function(), fn.release_variables() is called to release the SavedVariables. For eager-mode ops, this releases the saved inputs that were required for the backward grad function. However, with TorchScript we get a DifferentiableGraph, and DifferentiableGraphBackward() doesn't implement release_variables(). This causes the SavedVariables to stay alive longer. Implement release_variables() for DifferentiableGraphBackward to release these SavedVariables early.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42994

Reviewed By: izdeby

Differential Revision: D23503172

Pulled By: albanD

fbshipit-source-id: d87127498cfa72883ae6bb31d0e6c7056c4c36d4
2020-09-08 14:36:52 -07:00
63d62d3e44 Skips test_addcmul_cuda if using ROCm (#44304)
Summary:
This test is failing consistently on linux-bionic-rocm3.7-py3.6-test2. Relevant log snippet:

```
03:43:11 FAIL: test_addcmul_cuda_float16 (__main__.TestForeachCUDA)
03:43:11 ----------------------------------------------------------------------
03:43:11 Traceback (most recent call last):
03:43:11   File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 818, in wrapper
03:43:11     method(*args, **kwargs)
03:43:11   File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 258, in instantiated_test
03:43:11     result = test(self, *args)
03:43:11   File "test_foreach.py", line 83, in test_addcmul
03:43:11     self._test_pointwise_op(device, dtype, torch._foreach_addcmul, torch._foreach_addcmul_, torch.addcmul)
03:43:11   File "test_foreach.py", line 58, in _test_pointwise_op
03:43:11     self.assertEqual(tensors, expected)
03:43:11   File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1153, in assertEqual
03:43:11     exact_dtype=exact_dtype, exact_device=exact_device)
03:43:11   File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1127, in assertEqual
03:43:11     self.assertTrue(result, msg=msg)
03:43:11 AssertionError: False is not true : Tensors failed to compare as equal! With rtol=0.001 and atol=1e-05, found 10 element(s) (out of 400) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.00048828125 (-0.46484375 vs. -0.46533203125), which occurred at index (11, 18).
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44304

Reviewed By: malfet, izdeby

Differential Revision: D23578316

Pulled By: mruberry

fbshipit-source-id: 558eecf42677383e7deaa4961e12ef990ffbe28c
2020-09-08 13:14:25 -07:00
de89261abe Reduce sccache log levels for RocM to a default state (#44310)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44310

Reviewed By: walterddr

Differential Revision: D23576966

Pulled By: malfet

fbshipit-source-id: c7fa063ec2be92de8f3768aaa3e6a032913004f7
2020-09-08 12:55:23 -07:00
477f489137 Don't register a fallback for private use to let extensions do it themselves (#44149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44149

Thanks Christian Puhrsch for reporting.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D23574739

Pulled By: ezyang

fbshipit-source-id: 8c9d0d78e6970139e0103cd1e0004b743e3c7f9e
2020-09-08 12:30:26 -07:00
caf23d110f [JIT] Unshare types for modules that define() in __init__ (#44233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44233

**Summary**
By default, scripting tries to share concrete and JIT types across
compilations. However, this can lead to incorrect results if a module
extends `torch.jit.ScriptModule`, and injects instance variables into
methods defined using `define`.

This commit detects when this has happened and disables type sharing
for the compilation of the module that uses `define` in `__init__`.

**Test Plan**
This commit adds a test to TestTypeSharing that tests this scenario.

**Fixes**
This commit fixes #43580.
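
A minimal sketch of the problematic pattern (the module and values below are assumptions for illustration):

```python
import torch

class M(torch.jit.ScriptModule):
    def __init__(self, bias):
        super().__init__()
        # injects an instance-specific value into a method created via define()
        self.define(f"def forward(self, x):\n    return x + {bias}\n")

a, b = M(1.0), M(2.0)
x = torch.zeros(3)
# with incorrect type sharing, b could reuse a's compiled forward; after the fix they differ
assert not torch.equal(a(x), b(x))
```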

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23553870

Pulled By: SplitInfinity

fbshipit-source-id: d756e87fcf239befa0012998ce29eeb25728d3e1
2020-09-08 12:16:45 -07:00
4e0ac120e9 [FX] Only copy over training attr if it's there (#44314)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44314

Test Plan: Imported from OSS

Reviewed By: dzhulgakov

Differential Revision: D23578189

Pulled By: jamesr66a

fbshipit-source-id: fb7643f28582bd5009a826663a937fbe188c50bc
2020-09-08 11:50:08 -07:00
fd8e2064e0 quant: switch observers to use min_max (#42957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42957

Switches observers to use the new min_max function to calculate
min and max at the same time.  We see around 45-50% speedup on
representative input shapes on the microbenchmarks for all observers except `HistogramObserver`.

Test Plan:
CI for correctness

performance:
```
cd benchmarks/operator_benchmark
// repeat (before diff, after diff) x (cpu, cuda)
python -m pt.qobserver_test --tag_filter all --device cpu
/*
    * before, cpu: https://our.intern.facebook.com/intern/paste/P138633280/
    * before, cuda: https://our.intern.facebook.com/intern/paste/P138639473/
    * after, cpu: https://our.intern.facebook.com/intern/paste/P138635458/
    * after, cuda: https://our.intern.facebook.com/intern/paste/P138636344/
*/
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D23093995

fbshipit-source-id: 9f416d144109b5b80baf089eb4bcfabe8fe358d5
2020-09-08 11:39:44 -07:00
de980f937b skip test_tanhquantize for now (#44312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44312

This test is failing now when running on the card. Let's disable it while Intel is investigating the issue.

Test Plan: Sandcastle

Reviewed By: hyuen

Differential Revision: D23577475

fbshipit-source-id: 84f957c69ed75e0e0f563858b8b8ad7a2158da4e
2020-09-08 11:21:41 -07:00
8d212d3f7a add 'run_duration' stats for binary builds to scuba (#44251)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44251

Reviewed By: seemethere

Differential Revision: D23575312

Pulled By: walterddr

fbshipit-source-id: 29d737f5bee1540d6595d4d0ca1386b9ce5ab2ee
2020-09-08 11:13:00 -07:00
1130de790c Automated submodule update: FBGEMM (#44177)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: d5ace7ca70

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44177

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D23533561

fbshipit-source-id: 9e580f8dbfb83e57bebc28f8e459caa0c5fc7317
2020-09-08 10:12:21 -07:00
5de805d8a7 [dper3] Export Caffe2 operator LearningRate to PyTorch
Summary: Exports the operator to PyTorch, to be made into a low-level module.

Test Plan:
```
buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_learning_rate
```

Reviewed By: yf225

Differential Revision: D23545582

fbshipit-source-id: 6b6d9aa6a47b2802ccef0f87c1263c6cc2d2fdf6
2020-09-08 08:50:09 -07:00
cce5982c4c Add unary ops: exp and sqrt (#42537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42537

[First PR: Add private API to support tensor lists: _foreach_add(TensorList tensors, Scalar scalar)](https://github.com/pytorch/pytorch/pull/41554).

**Motivation**
[GitHub issue](https://github.com/pytorch/pytorch/issues/38655)
Current PyTorch optimizer implementations are not efficient in cases when we work with a lot of small feature tensors. Starting a lot of kernels slows down the whole process. We need to reduce the number of kernels that we start.
As an example, we should be looking at [NVIDIAs Apex](https://github.com/NVIDIA/apex).
In order to track progress, we will pick PyTorch's DCGAN model with the Adam optimizer and, once the optimizer is reimplemented with tensor lists, benchmark the model's performance against the original model version, Apex's version with the original Adam optimizer, and its FusedAdam optimizer.

**Current API restrictions**
- List can't be empty (will be fixed in upcoming PRs).
- All tensors in the list must have the same dtype, device and size.

**Broadcasting**
At this point we don't support broadcasting.

**What is 'Fast' and 'Slow' route**
In particular cases, we can't process an op with a fast list CUDA kernel. Still, we can fall back to a regular for-loop where the op is applied to each tensor individually through the dispatch mechanisms. There are a few checks that decide whether the op will be performed via the 'fast' or 'slow' path.
To go the fast route,
- All tensors must have strided layout
- All tensors must be dense and not have overlapping memory
- The resulting tensor type must be the same.

----------------
**In this PR**
Adding APIs:
```
torch._foreach_exp(TensorList tl1)
torch._foreach_exp_(TensorList tl1)
torch._foreach_sqrt(TensorList tl1)
torch._foreach_sqrt_(TensorList tl1)
```
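
A minimal usage sketch (the out-of-place variants return a new list; the trailing-underscore variants mutate each tensor in place):

```python
import torch

tensors = [torch.rand(5) for _ in range(10)]  # same dtype, device and size, per the restrictions
outs = torch._foreach_exp(tensors)            # returns a new list of tensors
torch._foreach_sqrt_(tensors)                 # mutates every tensor in the list
```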

**Tests**
Tested via unit tests

**TODO**
1. Properly handle empty lists
2. Properly handle bool tensors

**Plan for the next PRs**
1. APIs
- Pointwise Ops

2. Complete tasks from TODO
3. Rewrite PyTorch optimizers to use for-each operators for performance gains.

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D23331889

Pulled By: izdeby

fbshipit-source-id: 8b04673b8412957472ed56361954ca3884eb9376
2020-09-07 19:57:34 -07:00
6134ac17ba Revert D23561500: Benchmarks: re-enable profiling-te configuration (try 2).
Test Plan: revert-hammer

Differential Revision:
D23561500 (589a2024c8)

Original commit changeset: 7fe86d34afa4

fbshipit-source-id: 10e48f230402572fcece56662ad4413ac0bd3cb5
2020-09-07 19:10:30 -07:00
7c61f57bec test_ops: skipTest only takes a single argument (#44181)
Summary:
Fixes a broken skipTest from https://github.com/pytorch/pytorch/issues/43451, e.g. in the ROCm CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44181

Reviewed By: ngimel

Differential Revision: D23568608

Pulled By: malfet

fbshipit-source-id: 557048bd5f0086ffac38d1c48255badb63869899
2020-09-07 18:32:59 -07:00
0e64b02912 FindCUDA error handling (#44236)
Summary:
Check the return code of `nvcc --version` and, if it's not zero, print a warning and mark CUDA as not found.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44236

Test Plan: Run `CUDA_NVCC_EXECUTABLE=/foo/bar cmake ../`

Reviewed By: ezyang

Differential Revision: D23552336

Pulled By: malfet

fbshipit-source-id: cf9387140a8cdbc8dab12fcc4bfaf55ae8e6a502
2020-09-07 18:17:55 -07:00
5d748e6d22 [TensorExpr] Re-enable tests. (#44218)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44218

Differential Revision: D23546100

Test Plan: Imported from OSS

Reviewed By: ngimel

Pulled By: ZolotukhinM

fbshipit-source-id: 4c4c5378ec9891ef72b60ffb59081a009e0df049
2020-09-07 15:52:03 -07:00
589a2024c8 Benchmarks: re-enable profiling-te configuration (try 2). (#44270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44270

The previous PR (#44212) was reverted since I didn't update the
`upload_scribe.py` script and it was looking for 'executor_and_fuser'
field in the json which now is replaced with two separate fields:
'executor' and 'fuser'.

Differential Revision: D23561500

Test Plan: Imported from OSS

Reviewed By: ngimel

Pulled By: ZolotukhinM

fbshipit-source-id: 7fe86d34afa488a0e43d5ea2aaa7bc382337f470
2020-09-07 15:50:39 -07:00
10dd25dcd1 Add binary ops for _foreach APIs (#42536)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42536

[First PR: Add private API to support tensor lists: _foreach_add(TensorList tensors, Scalar scalar)](https://github.com/pytorch/pytorch/pull/41554).

**Motivation**
[GitHub issue](https://github.com/pytorch/pytorch/issues/38655)
Current PyTorch optimizer implementations are not efficient in cases when we work with a lot of small feature tensors. Starting a lot of kernels slows down the whole process. We need to reduce the number of kernels that we start.
As an example, we should be looking at [NVIDIAs Apex](https://github.com/NVIDIA/apex).
In order to track progress, we will pick PyTorch's DCGAN model with the Adam optimizer and, once the optimizer is reimplemented with tensor lists, benchmark the model's performance against the original model version, Apex's version with the original Adam optimizer, and its FusedAdam optimizer.

**Current API restrictions**
- List can't be empty (will be fixed in upcoming PRs).
- All tensors in the list must have the same dtype, device and size.

**Broadcasting**
At this point we don't support broadcasting.

**What is 'Fast' and 'Slow' route**
In particular cases, we can't process an op with a fast list CUDA kernel. Still, we can fall back to a regular for-loop where the op is applied to each tensor individually through the dispatch mechanisms. There are a few checks that decide whether the op will be performed via the 'fast' or 'slow' path.
To go the fast route,
- All tensors must have strided layout
- All tensors must be dense and not have overlapping memory
- The resulting tensor type must be the same.

----------------
**In this PR**
Adding APIs:
```
torch._foreach_sub(TensorList tl1, TensorList tl2)
torch._foreach_sub_(TensorList self, TensorList tl2)
torch._foreach_mul(TensorList tl1, TensorList tl2)
torch._foreach_mul_(TensorList self, TensorList tl2)
torch._foreach_div(TensorList tl1, TensorList tl2)
torch._foreach_div_(TensorList self, TensorList tl2)

torch._foreach_sub(TensorList tl1, Scalar scalar)
torch._foreach_sub_(TensorList self, Scalar scalar)
torch._foreach_mul(TensorList tl1, Scalar scalar)
torch._foreach_mul_(TensorList self, Scalar scalar)
torch._foreach_div(TensorList tl1, Scalar scalar)
torch._foreach_div_(TensorList self, Scalar scalar)
```
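
A minimal usage sketch of the list-list and list-scalar variants:

```python
import torch

a = [torch.rand(4) for _ in range(3)]
b = [torch.rand(4) for _ in range(3)]
c = torch._foreach_mul(a, b)   # elementwise per tensor pair; returns a new list
torch._foreach_sub_(a, 1.0)    # scalar variant, applied in place to each tensor in a
```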

**Tests**
Tested via unit tests

**TODO**
1. Properly handle empty lists
2. Properly handle bool tensors

**Plan for the next PRs**
1. APIs
- Unary Ops for list
- Pointwise Ops

2. Complete tasks from TODO
3. Rewrite PyTorch optimizers to use for-each operators for performance gains.

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D23331891

Pulled By: izdeby

fbshipit-source-id: 18c5937287e33e825b2e391e41864dd64e226f19
2020-09-07 10:29:32 -07:00
626e410e1d Revert D23544563: Benchmarks: re-enable profiling-te configuration.
Test Plan: revert-hammer

Differential Revision:
D23544563 (ac1f471fe2)

Original commit changeset: 98659e8860fa

fbshipit-source-id: 5dab7044699f59c709e64d178758f5f462ebb788
2020-09-06 21:01:19 -07:00
1b2da9ed82 Expose alias key info in dumpState and update test_dispatch. (#44081)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44081

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23492794

Pulled By: ailzhang

fbshipit-source-id: 27a2978591900463bda2e92e0201c9fd719f9792
2020-09-06 18:43:05 -07:00
514f20ea51 Histogram Binning Calibration
Summary:
Adding a calibration module called histogram binning:

Divide the prediction range (e.g., [0, 1]) into B bins. In each bin, use two parameters to store the number of positive examples and the number of examples that fall into this bucket. So we basically have a histogram for the model prediction.

As a result, for each bin, we have a statistical value for the real CTR (num_pos / num_example). We use this statistical value as the final calibrated prediction if the pre-calibration prediction falls into the corresponding bin.

In this way, the predictions within each bin should be well-calibrated if we have sufficient examples. That is, we have a fine-grained calibrated model by this calibration module.

Theoretically, this calibration layer can fix any uncalibrated model or prediction if we have sufficient bins and examples. It provides the potential to use any kind of training weight allocation to our training data, without worrying about the calibration issue.
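
A minimal sketch of the idea in plain PyTorch; the bin count, names, and the empty-bin fallback are assumptions, not this module's exact behavior:

```python
import torch

B = 100                      # number of bins over the prediction range [0, 1]
bin_pos = torch.zeros(B)     # positives seen per bin
bin_cnt = torch.zeros(B)     # total examples seen per bin

def update(preds, labels):
    idx = (preds * B).long().clamp_(0, B - 1)
    bin_pos.index_add_(0, idx, labels.float())
    bin_cnt.index_add_(0, idx, torch.ones_like(preds))

def calibrate(preds):
    idx = (preds * B).long().clamp_(0, B - 1)
    ctr = bin_pos[idx] / bin_cnt[idx].clamp(min=1)    # per-bin empirical CTR
    return torch.where(bin_cnt[idx] > 0, ctr, preds)  # fall back when a bin is empty
```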

Test Plan:
buck test dper3/dper3/modules/calibration/tests:calibration_test -- test_histogram_binning_calibration

buck test dper3/dper3_models/ads_ranking/tests:model_paradigm_e2e_tests -- test_sparse_nn_histogram_binning_calibration

All tests passed.

Example workflows:
f215431958

{F326445092}

f215445048

{F326445223}

Reviewed By: chenshouyuan

Differential Revision: D23356450

fbshipit-source-id: c691b66c51ef33908c17575ce12e5bee5fb325ff
2020-09-06 17:11:16 -07:00
ac1f471fe2 Benchmarks: re-enable profiling-te configuration. (#44212)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44212

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23544563

Pulled By: ZolotukhinM

fbshipit-source-id: 98659e8860fa951d142e0f393731c4a769463c6c
2020-09-06 10:22:16 -07:00
bb861e1d69 Ports CUDA var and std reduce all (with no out argument) to ATen, fixes var docs (#43858)
Summary:
When var and std are called without args (other than unbiased) they currently call into TH or THC. This PR:

- Removes the THC var_all and std_all functions and updates CUDA var and std to use the ATen reduction
- Fixes var's docs, which listed its arguments in the incorrect order
- Adds new tests comparing var and std with their NumPy counterparts

Performance appears to have improved as a result of this change. I ran experiments on 1D tensors, 1D tensors with every other element viewed ([::2]), 2D tensors and 2D transposed tensors. Some notable datapoints:

- torch.randn((8000, 8000))
  - var measured 0.0022215843200683594s on CUDA before the change
  - var measured 0.0020322799682617188s on CUDA after the change
- torch.randn((8000, 8000)).T
  - var measured .015128850936889648 on CUDA before the change
  - var measured 0.001912832260131836 on CUDA after the change
- torch.randn(8000 ** 2)
  - std measured 0.11031460762023926 on CUDA before the change
  - std measured 0.0017833709716796875 on CUDA after the change

Timings for var and std are, as expected, similar.

On the CPU, however, the performance change from making the analogous update was more complicated, and ngimel and I decided not to remove CPU var_all and std_all. ngimel wrote the following script that showcases how single-threaded CPU inference would suffer from this change:

```
import torch
import numpy as np
from torch.utils._benchmark import Timer
from torch.utils._benchmark import Compare
import sys
base = 8
multiplier = 1

def stdfn(a):
    meanv = a.mean()
    ac = a-meanv
    return torch.sqrt(((ac*ac).sum())/a.numel())

results = []
num_threads=1
for _ in range(7):
    size = base*multiplier
    input = torch.randn(size)

    tasks = [("torch.var(input)", "torch_var"),
             ("torch.var(input, dim=0)", "torch_var0"),
             ("stdfn(input)", "stdfn"),
             ("torch.sum(input, dim=0)", "torch_sum0")
            ]
    timers = [Timer(stmt=stmt, num_threads=num_threads, label="Index", sub_label=f"{size}",
    description=label, globals=globals()) for stmt, label in tasks]
    repeats = 3

    for i, timer in enumerate(timers * repeats):
        results.append(
            timer.blocked_autorange()
        )
        print(f"\r{i + 1} / {len(timers) * repeats}", end="")
        sys.stdout.flush()
    multiplier *=10
print()

comparison = Compare(results)

comparison.print()
```

The TH timings using this script on my devfair are:

```
[------------------------------ Index ------------------------------]
               |  torch_var  |  torch_var0  |   stdfn   |  torch_sum0
1 threads: ----------------------------------------------------------
      8        |     16.0    |      5.6     |     40.9  |       5.0
      80       |     15.9    |      6.1     |     41.6  |       4.9
      800      |     16.7    |     12.0     |     42.3  |       5.0
      8000     |     27.2    |     72.7     |     51.5  |       6.2
      80000    |    129.0    |    715.0     |    133.0  |      18.0
      800000   |   1099.8    |   6961.2     |    842.0  |     112.6
      8000000  |  11879.8    |  68948.5     |  20138.4  |    1750.3
```

and the ATen timings are:

```
[------------------------------ Index ------------------------------]
               |  torch_var  |  torch_var0  |   stdfn   |  torch_sum0
1 threads: ----------------------------------------------------------
      8              |       4.3   |       5.4    |     41.4  |       5.4
      80            |       4.9   |       5.7    |     42.6  |       5.4
      800          |      10.7   |      11.7    |     43.3  |       5.5
      8000        |      69.3   |      72.2    |     52.8  |       6.6
      80000      |     679.1   |     676.3    |    129.5  |      18.1
      800000    |    6770.8   |    6728.8    |    819.8  |     109.7
      8000000  |   65928.2   |   65538.7    |  19408.7  |    1699.4
```

which demonstrates that performance is analogous to calling the existing var and std with `dim=0` on a 1D tensor. This would be a significant performance hit. Another simple script shows that performance is mixed with default (multi-threaded) settings, too:

```
import torch
import time

# Benchmarking var and std, 1D with varying sizes
base = 8
multiplier = 1

op = torch.var
reps = 1000

for _ in range(7):
    size = base * multiplier
    t = torch.randn(size)
    elapsed = 0
    for _ in range(reps):
        start = time.time()
        op(t)
        end = time.time()
        elapsed += end - start
    multiplier *= 10

    print("Size: ", size)
    print("Avg. elapsed time: ", elapsed / reps)
```

```
var cpu TH vs ATen timings

Size:  8
Avg. elapsed time:  1.7853736877441406e-05 vs 4.9788951873779295e-06 (ATen wins)
Size:  80
Avg. elapsed time:  1.7803430557250977e-05 vs 6.156444549560547e-06 (ATen wins)
Size:  800
Avg. elapsed time:  1.8569469451904296e-05 vs 1.2302875518798827e-05 (ATen wins)
Size:  8000
Avg. elapsed time:  2.8756141662597655e-05 vs. 6.97789192199707e-05 (TH wins)
Size:  80000
Avg. elapsed time:  0.00026622867584228516 vs. 0.0002447957992553711 (ATen wins)
Size:  800000
Avg. elapsed time:  0.0010556647777557374 vs 0.00030616092681884767 (ATen wins)
Size:  8000000
Avg. elapsed time:  0.009990205764770508 vs 0.002938544034957886 (ATen wins)

std cpu TH vs ATen timings

Size:  8
Avg. elapsed time:  1.6681909561157225e-05 vs. 4.659652709960938e-06 (ATen wins)
Size:  80
Avg. elapsed time:  1.699185371398926e-05 vs. 5.431413650512695e-06 (ATen wins)
Size:  800
Avg. elapsed time:  1.768803596496582e-05 vs. 1.1279821395874023e-05 (ATen wins)
Size:  8000
Avg. elapsed time:  2.7791500091552735e-05  vs 7.031106948852539e-05 (TH wins)
Size:  80000
Avg. elapsed time:  0.00018650460243225096 vs 0.00024368906021118164 (TH wins)
Size:  800000
Avg. elapsed time:  0.0010522041320800782 vs 0.0003039860725402832 (ATen wins)
Size:  8000000
Avg. elapsed time:  0.009976618766784668 vs. 0.0029211788177490234 (ATen wins)
```

These results show the TH solution still performs better than the ATen solution with default threading for some sizes.

It seems like removing CPU var_all and std_all will require an improvement in ATen reductions. https://github.com/pytorch/pytorch/issues/40570 has been updated with this information.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43858

Reviewed By: zou3519

Differential Revision: D23498981

Pulled By: mruberry

fbshipit-source-id: 34bee046c4872d11c3f2ffa1b5beee8968b22050
2020-09-06 09:40:54 -07:00
83a6e7d342 Adds inequality testing aliases for better NumPy compatibility (#43870)
Summary:
This PR adds the following aliases:

- not_equal for torch.ne
- greater for torch.gt
- greater_equal for torch.ge
- less for torch.lt
- less_equal for torch.le

These aliases are consistent with NumPy's naming for these functions.
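
A quick sketch showing the aliases compute the same comparisons:

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([2, 2, 2])
assert torch.equal(torch.not_equal(a, b), torch.ne(a, b))
assert torch.equal(torch.greater_equal(a, b), torch.ge(a, b))
```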

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43870

Reviewed By: zou3519

Differential Revision: D23498975

Pulled By: mruberry

fbshipit-source-id: 78560df98c9f7747e804a420c1e53fd1dd225002
2020-09-06 09:36:23 -07:00
671160a963 Revert D23557576: Revert D23519521: [dper3] replace LengthsGather lowlevel module's PT implementation to use caffe2 op
Test Plan: revert-hammer

Differential Revision:
D23557576

Original commit changeset: 33631299eabe

fbshipit-source-id: 704d36a16346f047b30e2da8be882062135f8617
2020-09-06 01:50:43 -07:00
e358d516c8 Revert D23549149: [PyTorch][Mobile] Insert the module name as name() to metadata dict if metadata doesn't contain "model_name"
Test Plan: revert-hammer

Differential Revision:
D23549149 (398409f072)

Original commit changeset: fad742a8d4e6

fbshipit-source-id: bd92a2033a804d3e6a2747b4fda4ca527991a993
2020-09-06 00:06:35 -07:00
70c8daf439 Apply selective build on RNN operators (#44132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44132

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43985

Added
```
def(detail::SelectiveStr<true>, ...)
impl(detail::SelectiveStr<true>, ...)
```
in torch/library, which can also be used for other templated selective registration.

Size savings for this diff:
fbios-pika: 78 KB
igios: 87 KB

Test Plan: Imported from OSS

Reviewed By: ljk53, smessmer

Differential Revision: D23459774

Pulled By: iseeyuan

fbshipit-source-id: 86d34cfe8e3f852602f203db06f23fa99af2c018
2020-09-05 23:47:51 -07:00
68297eeb1a Add support for integer dim arg in torch.linalg.norm (#43907)
Summary:
Since PR https://github.com/pytorch/pytorch/issues/43262 is merged, this works now.

Part of https://github.com/pytorch/pytorch/issues/24802
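
A usage sketch of what this enables (shapes are illustrative):

```python
import torch

t = torch.randn(3, 4)
torch.linalg.norm(t, dim=1)  # an integer dim now works; previously a tuple like dim=(1,) was needed
```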

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43907

Reviewed By: anjali411

Differential Revision: D23471964

Pulled By: mruberry

fbshipit-source-id: ef2f11f78343fc866f752c9691b0c1fa687353ba
2020-09-05 23:16:36 -07:00
719d29dab5 Implement torch.i0 and torch.kaiser_window (#43132)
Summary:
Related to https://github.com/pytorch/pytorch/issues/38349
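
A usage sketch of the two new functions:

```python
import torch

torch.i0(torch.tensor([0.0, 1.0, 2.0]))           # zeroth-order modified Bessel function of the first kind
torch.kaiser_window(10, periodic=True, beta=12.0)  # Kaiser window of length 10
```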

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43132

Reviewed By: smessmer

Differential Revision: D23479072

Pulled By: mruberry

fbshipit-source-id: 4fb1de44830771c6a7222cf19f7728d9ac7c043b
2020-09-05 23:11:47 -07:00
4fc29e9c43 Revert D23519521: [dper3] replace LengthsGather lowlevel module's PT implementation to use caffe2 op
Test Plan: revert-hammer

Differential Revision:
D23519521 (8c64bb4f47)

Original commit changeset: ed9bd16a8af3

fbshipit-source-id: 33631299eabec05a1a272bfd0040d96203cf62a0
2020-09-05 20:43:04 -07:00
396469f18c Explicitly forbidden the other inherited methods of RemoteModule. (#43895)
Summary:
Throw exceptions when methods other than forwardXXX are used.

Original PR issue: RemoteModule enhancements #40550

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43895

Test Plan: buck test test/distributed/rpc:process_group_agent -- RemoteModule

Reviewed By: rohan-varma

Differential Revision: D23392842

Pulled By: SciPioneer

fbshipit-source-id: 7c09a55a03f9f0b7e9f9264a42bfb907607f4651
2020-09-05 14:48:56 -07:00
199c73be0f [quant][pyper] Support quantization of ops in fork-wait subgraph (#44048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44048

Inline the fork-wait calls to make sure we can see the ops to be quantized in the main graph.

Also fix the InlineForkWait JIT pass to account for the case where the aten::wait call isn't present in the main graph
and a future tensor is returned from the subgraph.

Example

```
graph(%self.1 : __torch__.dper3.core.interop.___torch_mangle_6325.DperModuleWrapper,
       %argument_1.1 : Tensor,
       %argument_2.1 : Tensor):
   %3 : Future[Tensor[]] = prim::fork_0(%self.1, %argument_1.1, %argument_2.1) # :0:0
   return (%3)
 with prim::fork_0 = graph(%self.1 : __torch__.dper3.core.interop.___torch_mangle_5396.DperModuleWrapper,
       %argument_1.1 : Tensor,
       %argument_2.1 : Tensor):
   %3 : __torch__.dper3.core.interop.___torch_mangle_6330.DperModuleWrapper = prim::GetAttr[name="x"](%self.1)
   %4 : __torch__.dper3.core.interop.___torch_mangle_5397.DperModuleWrapper = prim::GetAttr[name="y"](%self.1)
   %5 : __torch__.dper3.core.interop.___torch_mangle_6327.DperModuleWrapper = prim::GetAttr[name="z"](%4)
   %6 : Tensor = prim::CallMethod[name="forward"](%5, %argument_1.1, %argument_2.1) # :0:0
   %7 : None = prim::CallMethod[name="forward"](%3, %6) # :0:0
   %8 : Tensor[] = prim::ListConstruct(%6)
   return (%8)
```

Test Plan:
python test/test_quantization.py test_interface_with_fork

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23481003

fbshipit-source-id: 2e756be73c248319da38e053f021888b40593032
2020-09-05 12:06:19 -07:00
164b96c34c [quant][pyper] make embedding_bag quantization static (#44008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44008

embedding_bag requires only quantization of weights (no dynamic quantization of inputs),
so the type of quantization is essentially static (without calibration).
This will enable pyper to do fc and embedding_bag quantization using the same API call.

Test Plan:
python test/test_quantization.py test_embedding_bag

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23467019

fbshipit-source-id: 41a61a17ee34bcb737ba5b4e19fb7a576d4aeaf9
2020-09-05 12:06:16 -07:00
a0ae416d60 [quant] Support aten::embedding_bag quantization in graph mode (#43989)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43989

When we trace the model, it produces an aten::embedding_bag node in the graph.
Add the necessary passes in graph mode to support quantizing it as well.

Test Plan:
python test/test_quantization.py TestQuantizeDynamicJitOps.test_embedding_bag

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23460485

fbshipit-source-id: 328c5e1816cfebb10ba951113f657665b6d17575
2020-09-05 12:05:06 -07:00
15a7368115 Add const to getTensors method of GradBucket. (#44126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44126

Add const to getTensors method of GradBucket.

Test Plan: buck test caffe2/torch/lib/c10d:ProcessGroupGlooTest

Reviewed By: sinannasir, jiayisuse

Differential Revision: D23504088

fbshipit-source-id: 427d9591042e0c03cde02629c1146ff1e5e027f9
2020-09-05 09:19:42 -07:00
5bd2902796 [JIT] Remove references to no longer generated _tanh_backward and _sigmoid_backward (#44138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44138

If you look at the sigmoid and tanh backward they are composed of other ops: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/runtime/symbolic_script.cpp#L786
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/runtime/symbolic_script.cpp#L164

So tanh_backward and sigmoid_backward are no longer generated and are now legacy ops.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23543603

Pulled By: eellison

fbshipit-source-id: ce8353e53043cf969b536aac47c9576d66d4ce02
2020-09-05 01:41:36 -07:00
df67f0beab [TensorExpr fuser] Guard nodes that have tensor output properties determined by non-tensor inputs (#44137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44137

We only insert guards on Tensor types, so we rely on the output
of a node being uniquely determined by its input types.
Bail if any non-Tensor input affects the output type
and cannot be reasoned about statically.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23543602

Pulled By: eellison

fbshipit-source-id: abd6fe0b1fd7fe6fc251694d4cd442b19c032dd7
2020-09-05 01:40:18 -07:00
5a0d65b06b Further expand coverage of addmm/addmv, fix 0 stride (#43980)
Summary:
- test beta=0, self=nan (see the sketch after this list)
- test transposes
- fixes broadcasting of addmv
- not supporting tf32 yet, will do it in future PR together with other testing fixes
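
A minimal sketch of the first case in the list above, using stable public ops (shapes are arbitrary): with `beta=0` the `self` argument is ignored, so NaNs in it must not propagate.

```python
import torch

m = torch.randn(2, 3)
v = torch.randn(3)
nan_self = torch.full((2,), float("nan"))
# beta=0 means `self` is ignored entirely, so the NaNs must not leak
# into the result.
out = torch.addmv(nan_self, m, v, beta=0)
assert not torch.isnan(out).any()
```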

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43980

Reviewed By: mruberry

Differential Revision: D23507559

Pulled By: ngimel

fbshipit-source-id: 14ee39d1a0e13b9482932bede3fccb61fe6d086d
2020-09-04 23:03:23 -07:00
d07a36e0c1 Revert D23490149: [pytorch][PR] Compile less legacy code when BUILD_CAFFE2 is set to False
Test Plan: revert-hammer

Differential Revision:
D23490149 (15e99b6ff6)

Original commit changeset: a76382c30d83

fbshipit-source-id: 75057fa9af2c19eb976962552118bf0a99911b38
2020-09-04 22:59:39 -07:00
618b4dd763 fx quant prepare: clarify naming (#44125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44125

In `Quantizer._prepare`, `observed` was used for two different variables
with different types.  Making the names a bit cleaner and removing the
name conflict.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: dskhudia

Differential Revision: D23504109

fbshipit-source-id: 0f73eac3d6dd5f72ad5574a4d47d33808a70174a
2020-09-04 21:29:56 -07:00
a940f5ea5d torchscript graph mode quant: remove benchmark filter (#44165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44165

Allows convolutions to be quantized if the `torch.backends.cudnn.benchmark`
flag was set.

Not for land yet, just testing.

Test Plan:
in the gist below, the resulting graph now has quantized convolutions
https://gist.github.com/vkuzo/622213cb12faa0996b6700b08d6ab2f0

Imported from OSS

Reviewed By: supriyar

Differential Revision: D23518775

fbshipit-source-id: 294f678c6afbd3feeb89b7a6655bc66ac9f8bfbc
2020-09-04 21:25:35 -07:00
8c64bb4f47 [dper3] replace LengthsGather lowlevel module's PT implementation to use caffe2 op
Summary: Use a more efficient C++ implementation in a caffe2 op to get rid of control flow statements here.

Test Plan:
- Ran `buck test dper3/dper3/modules/low_level_modules/tests:single_operators_test`
- Ran `buck-out/gen/dper3/dper3_models/experimental/pytorch/ads_model_generation_script.par --model_type="inline_cvr_post_imp" --model_version="april_2020" --gen_inference_model` and observed files getting generated:
```
[ashenoy@devbig086.ash8 ~/fbsource/fbcode] ls -l /tmp/ashenoy/inline_cvr_post_imp_april_2020/
total 278332
-rw-r--r--. 1 ashenoy users 71376941 Sep  3 23:10 serialized_inline_cvr_post_imp_april_2020_model_inference.pt
-rw-r--r--. 1 ashenoy users 71437424 Sep  3 22:09 serialized_inline_cvr_post_imp_april_2020_model_inference_shrunk.pt
-rw-r--r--. 1 ashenoy users    14952 Sep  3 22:38 serialized_inline_cvr_post_imp_april_2020_model_io_metadata_map.pt
-rw-r--r--. 1 ashenoy users    14952 Sep  3 21:42 serialized_inline_cvr_post_imp_april_2020_model_io_metadata_map_shrunk.pt
-rw-r--r--. 1 ashenoy users 67001662 Sep  3 22:38 serialized_inline_cvr_post_imp_april_2020_model_main.pt
-rw-r--r--. 1 ashenoy users 67126415 Sep  3 21:42 serialized_inline_cvr_post_imp_april_2020_model_main_shrunk.pt
-rw-r--r--. 1 ashenoy users  3945257 Sep  3 22:34 serialized_inline_cvr_post_imp_april_2020_model_preproc.pt
-rw-r--r--. 1 ashenoy users  4077266 Sep  3 21:37 serialized_inline_cvr_post_imp_april_2020_model_preproc_shrunk.pt
```
- Ran `buck-out/gen/dper3/dper3_models/experimental/pytorch/ads_model_generation_script.par --model_type="ctr_mbl_feed" --model_version="april_2020" --gen_inference_model` and observed model files getting generated:
```
[ashenoy@devbig086.ash8 ~/fbsource/fbcode] ls -l /tmp/ashenoy/ctr_mbl_feed_april_2020/
total 170304
-rw-r--r--. 1 ashenoy users  2641870 Sep  3 23:06 ctr_mbl_feed_april_2020_prod_eval_training_options
-rw-r--r--. 1 ashenoy users  2641870 Sep  3 23:06 ctr_mbl_feed_april_2020_prod_train_training_options
-rw-r--r--. 1 ashenoy users 42225079 Sep  3 23:59 serialized_ctr_mbl_feed_april_2020_model_inference.pt
-rw-r--r--. 1 ashenoy users 42576708 Sep  3 22:33 serialized_ctr_mbl_feed_april_2020_model_inference_shrunk.pt
-rw-r--r--. 1 ashenoy users    11194 Sep  3 23:29 serialized_ctr_mbl_feed_april_2020_model_io_metadata_map.pt
-rw-r--r--. 1 ashenoy users    11194 Sep  3 22:05 serialized_ctr_mbl_feed_april_2020_model_io_metadata_map_shrunk.pt
-rw-r--r--. 1 ashenoy users 39239139 Sep  3 23:29 serialized_ctr_mbl_feed_april_2020_model_main.pt
-rw-r--r--. 1 ashenoy users 39250842 Sep  3 22:05 serialized_ctr_mbl_feed_april_2020_model_main_shrunk.pt
-rw-r--r--. 1 ashenoy users  2839097 Sep  3 23:24 serialized_ctr_mbl_feed_april_2020_model_preproc.pt
-rw-r--r--. 1 ashenoy users  2944239 Sep  3 22:01 serialized_ctr_mbl_feed_april_2020_model_preproc_shrunk.pt
```

Reviewed By: houseroad

Differential Revision: D23519521

fbshipit-source-id: ed9bd16a8af3cca3a865d9614d67d07f01d8b18a
2020-09-04 21:19:53 -07:00
398409f072 [PyTorch][Mobile] Insert the module name as name() to metadata dict if metadata doesn't contain "model_name" (#44227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44227

As title
ghstack-source-id: 111490242

Test Plan: CI

Reviewed By: xcheng16

Differential Revision: D23549149

fbshipit-source-id: fad742a8d4e6f844f83495514cd60ff2bf0d5bcb
2020-09-04 21:18:12 -07:00
15e99b6ff6 Compile less legacy code when BUILD_CAFFE2 is set to False (#44079)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44079

Reviewed By: walterddr

Differential Revision: D23490149

Pulled By: malfet

fbshipit-source-id: a76382c30d83127d180ec63ac15093a7297aae53
2020-09-04 20:04:21 -07:00
f3bf6a41ca [ONNX] Update repeat op (#43430)
Summary:
Update the repeat op so that the inputs to the sizes argument can be a mixture of dynamic and constant inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43430

Reviewed By: houseroad

Differential Revision: D23494257

Pulled By: bzinodev

fbshipit-source-id: 90c5e90e4f73e98f3a9d5c8772850e72cecdf0d4
2020-09-04 18:53:31 -07:00
3699274ce2 [DPER3] AOT integration
Summary: Integrate aot flow with model exporter.

Test Plan:
buck test dper3/dper3_backend/delivery/tests:dper3_model_export_test

replayer test see D23407733

Reviewed By: ipiszy

Differential Revision: D23313689

fbshipit-source-id: 39ae8d578ed28ddd6510db959b65974a5ff62888
2020-09-04 18:37:22 -07:00
8b17fd2516 Add remote_parameters() into RemoteModule class. (#43906)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43906

This method returns a list of RRefs of remote parameters that can be fed into the DistributedOptimizer.

Original PR issue: RemoteModule enhancements #40550

Test Plan: buck test caffe2/test/distributed/rpc:process_group_agent -- RemoteModule

Reviewed By: rohan-varma

Differential Revision: D23399586

fbshipit-source-id: 4b0f1ccf2e47c8a9e4f79cb2c8668f3cdbdff820
2020-09-04 16:22:40 -07:00
8f37ad8290 [BUILD] Guard '#pragma unroll' with COMPILING_FOR_MIN_SIZE
Summary: Disable unroll hints when COMPILING_FOR_MIN_SIZE is on. We were seeing hundreds of errors in the build because the optimization was not being performed.

Test Plan: Smoke builds

Differential Revision: D23513255

fbshipit-source-id: 87da2fdc3c1146e8ffcacf14a49d5151d313f367
2020-09-04 15:55:28 -07:00
3d7c22a2ce [ONNX] Enable new scripting passes for functionalization and remove_mutation (#43791)
Summary:
Duplicate of https://github.com/pytorch/pytorch/issues/41413
This PR initiates the process of updating the torchscript backend interface used by the ONNX exporter.

Replace the jit lower graph pass with the freeze module pass.

Enable ScriptModule tests for ONNX operator tests (ORT backend) and model tests by default.

Replace the jit remove_inplace_ops pass with remove_mutation, consolidating all passes for handling inplace ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43791

Reviewed By: houseroad

Differential Revision: D23421872

Pulled By: bzinodev

fbshipit-source-id: a98710c45ee905748ec58385e2a232de2486331b
2020-09-04 15:21:45 -07:00
70bbd08402 [FX] Fix forward merge conflict breakage (#44221)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44221

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D23547373

Pulled By: jamesr66a

fbshipit-source-id: df47fce0f6ff2988093208fc8370544b7985288d
2020-09-04 15:12:33 -07:00
4562b212db Fix potential divide by zero for CostInferenceForRowWiseSparseAdagrad
Summary: Fix the potential divide by zero error in CostInferenceForRowWiseSparseAdagrad, when n has zero elements

Test Plan:
Ran buck test caffe2/caffe2/python/operator_test:adagrad_test
Result: https://our.intern.facebook.com/intern/testinfra/testrun/562950122086369

Reviewed By: idning

Differential Revision: D23520763

fbshipit-source-id: 191345bd24f5179a9dbdb41c6784eab102cfe89c
2020-09-04 14:14:49 -07:00
2ad5a82c43 [fx] get rid of graph_module.root (#44092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44092

Instead, submodules and weights are installed directly on the
graph_module by transferring the original modules. This makes it more
likely that scripting will succeed (since we no longer have submodules
that are not used in the trace). It also prevents layered transforms
from having to special case handling of the `root` module. GraphModules
can now be re-traced as part of the input to other transforms.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23504210

Pulled By: zdevito

fbshipit-source-id: f79e5c4cbfc52eb0ffb5d6ed89b37ce35a7dc467
2020-09-04 11:35:32 -07:00
0c2bc4fe20 Revert D23468286: [pytorch][PR] Optimize code path for adaptive_avg_pool2d when output size is (1, 1)
Test Plan: revert-hammer

Differential Revision:
D23468286 (f8f35fddd4)

Original commit changeset: cc181f705fea

fbshipit-source-id: 3a1db0eef849e0c2f3c0c64040d2a8b799644fa3
2020-09-04 11:28:15 -07:00
6474057c76 Revert D23503636: [pytorch][PR] [NNC] make inlining immediate (take 2) and fix bugs
Test Plan: revert-hammer

Differential Revision:
D23503636 (70aecd2a7f)

Original commit changeset: cdbdc902b7a1

fbshipit-source-id: b5164835f874a56213de4bed9ad690164eae9230
2020-09-04 10:58:23 -07:00
539d029d8c [ONNX] Fix split export using slice (#43670)
Summary:
Fix for exporting split with fixed output shape using slice.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43670

Reviewed By: houseroad

Differential Revision: D23420318

Pulled By: bzinodev

fbshipit-source-id: 09c2b58049fe32dca2f2977d91dd64de6ee9a72f
2020-09-04 10:52:44 -07:00
af13faf18b [FX] __str__ for GraphModule and Graph (#44166)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44166

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23520801

Pulled By: jamesr66a

fbshipit-source-id: f77e3466e435127ec01e66291964395f32a18992
2020-09-04 10:46:43 -07:00
0e3cf6b8d2 [pytorch] remove code analyzer build folder between builds (#44148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44148

Automatically remove the build_code_analyzer folder each time build.sh is run
ghstack-source-id: 111458413

Test Plan:
Run build.sh with different options and compare the outputs (should be different).
Ex:
`ANALYZE_TORCH=1 DEPLOY=1 BASE_OPS_FILE=/path/to/baseops MOBILE_BUILD_FLAGS='-DBUILD_MOBILE_AUTOGRAD=OFF' tools/code_analyzer/build.sh `

should produce a shorter file than
`ANALYZE_TORCH=1 DEPLOY=1 BASE_OPS_FILE=/path/to/baseops MOBILE_BUILD_FLAGS='-DBUILD_MOBILE_AUTOGRAD=ON' tools/code_analyzer/build.sh`

Reviewed By: iseeyuan

Differential Revision: D23503886

fbshipit-source-id: 9b95d4365540da0bd2d27760e1315caed5f44eec
2020-09-04 10:38:12 -07:00
f38e7aee71 Updates to SCCACHE for ROCm case (#44155)
Summary:
- Collecting sccache trace logs
- Change the SCCACHE_IDLE_TIMEOUT to unlimited

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44155

Reviewed By: ngimel

Differential Revision: D23516192

Pulled By: malfet

fbshipit-source-id: aa93052d7b9a1832eeaa8e81ee8706aeb9f7a508
2020-09-04 10:11:18 -07:00
2a1fc56694 replace the white list from default mappings (#41802)
Summary:
Replaced "whitelist" from default_mappings.py
Fixes https://github.com/pytorch/pytorch/issues/41756

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41802

Reviewed By: ngimel

Differential Revision: D23521452

Pulled By: malfet

fbshipit-source-id: 019a2d5c06dc59dc53d6c48b70fb35b216299cf4
2020-09-04 10:04:28 -07:00
4d431881d1 Control NCCL build parallelism via MAX_JOBS environment var (#44167)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44167

Reviewed By: walterddr, ngimel

Differential Revision: D23522419

Pulled By: malfet

fbshipit-source-id: 31b25a71fef3e470bdf382eb3698e267326fa354
2020-09-04 10:02:53 -07:00
6aba58cfd3 Limit MAX_JOBS to 18 for linux binary builds (#44168)
Summary:
Because those jobs are running in a Docker2XLarge+ container that has 20 cores.
Unfortunately `nproc` returns the number of cores available on the host rather than the number of cores available to the container.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44168

Reviewed By: walterddr, ngimel

Differential Revision: D23539558

Pulled By: malfet

fbshipit-source-id: 3df858722e153a8fcbe8ef6370b1a9c1993ada5b
2020-09-04 09:58:17 -07:00
6cecf7ec68 Enable test_cublas_config_deterministic_error for windows (#42796)
Summary:
test_cublas_config_deterministic_error can pass on Windows, so enable it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42796

Reviewed By: seemethere

Differential Revision: D23520002

Pulled By: malfet

fbshipit-source-id: eccedbbf202b1cada795071a34e266b2c635c2cf
2020-09-04 09:52:57 -07:00
9a5a732866 Register some backwards functions as operators (#44052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44052

Summary
=======

This PR registers the following backwards functions as operators:
- slice_backward
- select_backward
- gather_backward
- index_select_backward (the backward function for index_select)
- select_index_backward (prevously known as index_select_backward, but is actually the backward function for max.dim, min.dim, etc)

In the future, I'd like to register more backward functions as operators
so that we can write batching rules for the backward functions. Batching
rules for backward functions makes it so that we can compute batched
gradients.

Motivation
==========
The rationale behind this PR is that a lot of backwards functions (27 in total)
are incompatible with BatchedTensor due to using in-place operations.
Sometimes we can allow the in-place operations, but other times we can't.
For example, consider select_backward:

```
Tensor select_backward(const Tensor& grad, IntArrayRef input_sizes, int64_t dim, int64_t index) {
  auto grad_input = at::zeros(input_sizes, grad.options());
  grad_input.select(dim, index).copy_(grad);
  return grad_input;
}
```

and consider the following code:
```
x = torch.randn(5, requires_grad=True)
def select_grad(v):
    torch.autograd.grad(x[0], x, v)

vs = torch.randn(B0)
batched_grads = vmap(select_grad)(vs)
```

For the batched gradient use case, `grad` is a BatchedTensor.
The physical version of `grad` has size `(B0,)`.
However, select_backward creates a `grad_input` of shape `(5)`, and
tries to copy `grad` to a slice of it.

Other approaches
================

I've considered the following:
- register select_backward as an operator (this PR)
- have a branch inside select_backward for if `grad` is batched.
    - this is OK, but what if we have more tensor extensions that want to override this?
- modify select_backward to work with BatchedTensor, by creating a new operator for the "select + copy_ behavior".
    - select + copy_ isn't used elsewhere in derivative formulas so this doesn't seem useful

Test Plan
=========

- `pytest test/test_autograd.py -v`
- Registering backward functions may impact performance. I benchmarked
select_backward to see if registering it as an operator led to any noticable
performance overheads: https://gist.github.com/zou3519/56d6cb53775649047b0e66de6f0007dc.
The TL;DR is that the overhead is pretty minimal.

Test Plan: Imported from OSS

Reviewed By: ezyang, fbhuba

Differential Revision: D23481183

Pulled By: zou3519

fbshipit-source-id: 125af62eb95824626dc83d06bbc513262ee27350
2020-09-04 08:30:39 -07:00
0c01f136f3 [BE] Use f-string in various Python functions (#44161)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44161

Reviewed By: seemethere

Differential Revision: D23515874

Pulled By: malfet

fbshipit-source-id: 868cf65aedd58fce943c08f8e079e84e0a36df1f
2020-09-04 07:38:25 -07:00
28b1360d24 [Codemod][FBSourceGoogleJavaFormatLinter] Daily arc lint --take GOOGLEJAVAFORMAT
Reviewed By: zertosh

Differential Revision: D23536088

fbshipit-source-id: d4c6c26ed5bad4e8c1b80ac1c05bd86b36cb6aaa
2020-09-04 07:30:50 -07:00
f8f35fddd4 Optimize code path for adaptive_avg_pool2d when output size is (1, 1) (#43986)
Summary:
Benchmark:

code: https://github.com/xwang233/code-snippet/blob/master/adaptive-avg-pool2d-output-1x1/adap.ipynb

| shape | time_before (ms) | time_after (ms) |
| --- | --- | --- |
| (2, 3, 4, 4), torch.contiguous_format, cpu  |  0.035 |  0.031 |
| (2, 3, 4, 4), torch.contiguous_format, cuda  |  0.041 |  0.031 |
| (2, 3, 4, 4), torch.channels_last, cpu  |  0.027 |  0.029 |
| (2, 3, 4, 4), torch.channels_last, cuda  |  0.031 |  0.034 |
| (2, 3, 4, 4), non_contiguous, cpu  |  0.037 |  0.026 |
| (2, 3, 4, 4), non_contiguous, cuda  |  0.062 |  0.033 |
| (4, 16, 32, 32), torch.contiguous_format, cpu  |  0.063 |  0.055 |
| (4, 16, 32, 32), torch.contiguous_format, cuda  |  0.043 |  0.031 |
| (4, 16, 32, 32), torch.channels_last, cpu  |  0.052 |  0.064 |
| (4, 16, 32, 32), torch.channels_last, cuda  |  0.190 |  0.033 |
| (4, 16, 32, 32), non_contiguous, cpu  |  0.048 |  0.035 |
| (4, 16, 32, 32), non_contiguous, cuda  |  0.062 |  0.033 |
| (8, 128, 64, 64), torch.contiguous_format, cpu  |  0.120 |  0.109 |
| (8, 128, 64, 64), torch.contiguous_format, cuda  |  0.043 |  0.044 |
| (8, 128, 64, 64), torch.channels_last, cpu  |  1.303 |  0.260 |
| (8, 128, 64, 64), torch.channels_last, cuda  |  1.237 |  0.049 |
| (8, 128, 64, 64), non_contiguous, cpu  |  0.132 |  0.128 |
| (8, 128, 64, 64), non_contiguous, cuda  |  0.062 |  0.031 |
| (16, 256, 224, 224), torch.contiguous_format, cpu  |  17.232 |  14.807 |
| (16, 256, 224, 224), torch.contiguous_format, cuda  |  1.930 |  1.930 |
| (16, 256, 224, 224), torch.channels_last, cpu  |  245.025 |  24.345 |
| (16, 256, 224, 224), torch.channels_last, cuda  |  15.593 |  1.944 |
| (16, 256, 224, 224), non_contiguous, cpu  |  11.738 |  6.460 |
| (16, 256, 224, 224), non_contiguous, cuda  |  0.524 |  0.251 |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43986

Reviewed By: anjali411

Differential Revision: D23468286

Pulled By: ngimel

fbshipit-source-id: cc181f705feacb2f86df420d648cc59fda69fdb7
2020-09-04 03:37:33 -07:00
ef28ee50b0 [Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D23536086

fbshipit-source-id: 56e9c70a6998086515f59d74c5d8a2280ac2f669
2020-09-04 03:33:32 -07:00
98ad5ff41f [te] Disable reductions by default (#44122)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44122

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D23504769

Pulled By: bertmaher

fbshipit-source-id: 1889217cd22da529e46ab30c9319a5646267e4ec
2020-09-03 23:37:45 -07:00
a37c199b8b [c2][cuda] small improvement to dedup adagrad by avoiding recompute of x_ij (#44173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44173

It has a small 10~15% speed improvement.

Test Plan:
== Correctness ==
`buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_sum_gradient '`

Reviewed By: jianyuh

Differential Revision: D23494030

fbshipit-source-id: cdb7ee716a7e559903b72ed9f93bf106813f88fa
2020-09-03 22:50:53 -07:00
2f8a43341d Add API for onnxifi with AOT Glow ONNX (#44021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44021

Pull Request resolved: https://github.com/pytorch/glow/pull/4854

Test Plan: Added `test_onnxifi_aot.py`

Reviewed By: yinghai

Differential Revision: D23307003

fbshipit-source-id: e6d4f3e394f96fd22f80eb2b8a686cf8171a54c0
2020-09-03 22:46:20 -07:00
d221256888 [Message] Add what to do for missing operators.
Summary: As title.

Test Plan: N/A

Reviewed By: gaurav-work

Differential Revision: D23502416

fbshipit-source-id: a341eb10030e3f319266019ba4c02d9d9a0a6298
2020-09-03 22:41:27 -07:00
addfd7a9b9 Add tests against autograd precedence and multiple dispatch. (#44037)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44037

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23480154

Pulled By: ailzhang

fbshipit-source-id: 28b68e67975397c76ce6c73ceaeec9d5cc934635
2020-09-03 22:19:08 -07:00
b60ffcdfdd Enable typechecks for torch.nn.quantized.modules.linear (#44154)
Summary:
Also import `Optional` directly from `typing` rather than from `_jit_internal`
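
I.e., the preferred form after this change:

```python
# Preferred: take Optional from the standard library...
from typing import Optional
# ...rather than relying on the re-export in torch._jit_internal.
```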

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44154

Reviewed By: seemethere

Differential Revision: D23511833

Pulled By: malfet

fbshipit-source-id: f78c5fd679c002b218e4d287a9e56fa198171981
2020-09-03 19:52:49 -07:00
538d3bd364 Enable CUDA 11 jobs for Windows nightly builds (#44086)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/43366/files#r474333051.
Testing with https://github.com/pytorch/pytorch/pull/44007.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44086

Reviewed By: ezyang

Differential Revision: D23493553

Pulled By: malfet

fbshipit-source-id: 34b3e5b2e8dece5e97db9d507c34d61d33bd0863
2020-09-03 17:45:31 -07:00
69e38828f5 [quant] conv_transpose2d_prepack/conv_transpose2d_unpack (#40351)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40351

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22158983

Pulled By: z-a-f

fbshipit-source-id: 3ca064c2d826609724b2740fcc9b9eb40556168d
2020-09-03 17:21:32 -07:00
c40e3f9f98 [android][jni] Support Tensor MemoryFormat in java wrappers (#40785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40785

The main goal of this change is to support creating Tensors specifying blob in NHWC (ChannelsLast) format.

ChannelsLast is supported only for 4-dim tensors; this is enforced on the LibTorch side. I have not added asserts on the Java side, both to avoid double asserts and in case this limitation is changed in the future.

Additional changes in `aten/src/ATen/templates/Functions.h`:

`from_blob` creates an `at::empty({0}, options)` tensor first and sets its Storage with sizes and strides afterwards.

But as ChannelsLast is only for 4-dim tensors - it fails on that creation, as dim==1.

I've added `zero_sizes()` function that returns `{0, 0, 0, 0}` for ChannelsLast and ChannelsLast3d.

Test Plan: Imported from OSS

Reviewed By: dreiss

Differential Revision: D22396244

Pulled By: IvanKobzarev

fbshipit-source-id: 02582d748a554e0f859aefe71cd2c1e321fb8979
2020-09-03 17:01:35 -07:00
70aecd2a7f [NNC] make inlining immediate (take 2) and fix bugs (#43885)
Summary:
A rework of `computeInline` which makes it work a bit better, particularly when combined with other transformations. Previously we stored Functions that were inlined and then deferred the actual inlining of the function body until prepareForCodegen was called. This has an issue when transformations are applied to the LoopNest: the function body can be different from what appears in the root_stmt and result in inlining that a) fails, b) reverses other transformations or c) a weird unpredictable combination of the two.

This PR changes that behaviour so that the inlining occurs in the root stmt immediately, which means it reflects any previous transformations and any future transformations have a true view of the internal IR. It also has the benefit that inspecting the root statement gives an accurate view of it without needing to call prepareForCodegen. I also removed the difference between `computeInline` and `computeInlineWithRand` and we handle calls to `rand()` in all branches.

This is a rework of https://github.com/pytorch/pytorch/issues/38696, with the agreed changes from ZolotukhinM and zheng-xq: we should only inline if the dimensions are trivial (ie. they are vars not exprs).

This PR is mostly tests, and I fixed a bunch of bugs I found along the way. Partial list:
* When inlining an expression involving rand, we would create random vars equal to the dimensionality of the enclosing Tensor, not the produced Tensor - meaning we'd use an incorrect value if the inlined tensor was smaller. E.g.: `X[i] = rand(); A[i, j] = X[i]` would produce a tensor where `A[0, 0] != A[0, 1]`. This is fixed by inserting the Let binding of the random variable at the correct loop body.
* When inlining we'd replace all calls to `rand()` rather than just those present in the Tensor being inlined.
* `rand()` was treated symbolically by the simplifier and we would aggregate or cancel calls to `rand()`. Have fixed the hasher to hash all calls to `rand()` distinctly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43885

Reviewed By: gmagogsfm

Differential Revision: D23503636

Pulled By: nickgg

fbshipit-source-id: cdbdc902b7a14d269911d978a74a1c11eab004fa
2020-09-03 16:49:24 -07:00
bc4a00c197 [TVM] Support Fused8BitRowwiseQuantizedToFloat op (#44098)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44098

Reviewed By: yinghai

Differential Revision: D23470129

fbshipit-source-id: 1959e2167859f7cbc16e1423b957072bbc743ece
2020-09-03 16:39:53 -07:00
3105d8a9b2 [TensorExpr] Fuser: rely on input types when checking whether a device is supported. (#44139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44139

Also, make sure that we're checking that condition when we're starting a
new fusion group, not only when we merge a node into an existing fusion
group. Oh, and one more: add a test checking that we're rejecting graphs
with unspecified shapes.

Differential Revision: D23507510

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: 9c268825ac785671d7c90faf2aff2a3e5985ac5b
2020-09-03 16:27:14 -07:00
71510c60ad fx qat: respect device affinity (#44115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44115

Fixes device affinity in the FX prepare pass for QAT. Before this PR, observers
were always created on CPU. After this PR, observers are created on the
same device as the rest of the model. This will enable QAT prepare to
work regardless of whether users move the model to cuda before or after
calling this pass.
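
A rough sketch of the affinity idea; the helper name and the single-device assumption are illustrative, not the actual pass internals:

```python
import torch

def model_device(model: torch.nn.Module) -> torch.device:
    # Create observers on whichever device the model already lives on,
    # instead of unconditionally defaulting to CPU.
    devices = {p.device for p in model.parameters()}
    devices |= {b.device for b in model.buffers()}
    assert len(devices) <= 1, "expected the model to live on one device"
    return devices.pop() if devices else torch.device("cpu")

print(model_device(torch.nn.Linear(4, 2)))  # cpu
```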

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_qat_prepare_device_affinity
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D23502291

fbshipit-source-id: ec4ed20c21748a56a25e3395b35ab8640d71b5a8
2020-09-03 16:16:59 -07:00
7816d53798 [JIT] Add mypy type annotations for JIT (#43862)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43862

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23491151

Pulled By: SplitInfinity

fbshipit-source-id: 88367b89896cf409bb9ac3db7490d6779efdc3a4
2020-09-03 15:09:24 -07:00
9dd8670d7d [jit] Better match behavior of loaded ScriptModules vs. freshly created ones (#43298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43298

IR emitter uses `ModuleValue` to represent ScriptModules and emit IR for
attribute access, submodule access, etc.

`ModuleValue` relies on two pieces of information, the JIT type of the
module, and the `ConcreteModuleType`, which encapsulates Python-only
information about the module.

ScriptModules loaded from a package used to create a dummy
ConcreteModuleType without any info in it. This led to divergences in
behavior during compilation.

This PR makes the two ways of constructing a ConcreteModuleType equivalent,
modulo any py-only information (which, by definition, is never present in
packaged files anyway).

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23228738

Pulled By: suo

fbshipit-source-id: f6a660f42272640ca1a1bb8c4ee7edfa2d1b07cc
2020-09-03 15:03:39 -07:00
74f18476a2 [jit] fix segfault in attribute lookup on loaded ScriptModules (#43284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43284

The IR emitter looks for attributes on modules like:
1. Check the JIT type for the attribute
2. Check the originating Python class, in order to fulfill requests for, e.g. static methods or ignored methods.

In the case where you do:
```
inner_module = torch.jit.load("inner.pt")
wrapped = Wrapper(inner_module)  # wrap the loaded ScriptModule in an nn.Module
torch.jit.script(wrapped)
```

The IR emitter may check for attributes on `inner_module`. There is no
originating Python class for `inner_module`, since it was directly
compiled from the serialized format.

Due to a bug in the code, we don't guard for this case and a segfault
results if the wrapper asks for an undefined attribute. The lookup in
this case looks like:
1. Check the JIT type for the attribute (not there!)
2. Check the originating Python class (this is a nullptr! segfault!)

This PR guards this case and properly just raises an attribute missing
compiler error instead of segfaulting.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23224337

Pulled By: suo

fbshipit-source-id: 0cf3060c427f2253286f76f646765ec37b9c4c49
2020-09-03 15:01:59 -07:00
e64879e180 [tensorexpr] Alias analysis tests (#44110)
Summary:
Some tests for alias analysis.

The first aliases at the module level and the second at the input level.

Please let me know if there are other alias situations!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44110

Reviewed By: nickgg

Differential Revision: D23509473

Pulled By: bwasti

fbshipit-source-id: fbfe71a1d40152c8fbbd8d631f0a54589b791c34
2020-09-03 14:52:47 -07:00
6868bf95c6 [JIT] Fuser match on schemas not node kind (#44083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44083

Match on the complete schema of a node instead of its node kind when deciding to fuse it. Previously we matched on node kind, which could fail with something like `aten::add(int, int)`; if a new overload was added to an op without corresponding NNC support, we would still fuse it.

Follow ups are:
 - bail when an output tensor type isn't uniquely determined by the input types (e.g. aten::add, where the second input could be either a float or an int)
- remove NNC lowering for _tanh_backward & _sigmoid_backward
- Validate that we support all of the overloads here. I optimistically added ops that included Tensors; it's possible that we do not support every overload here. This isn't a regression, and this PR is at least improving our failures in that regard.

I can do any of these as part of this PR if desired, but there are a number of failures people have run into that this PR fixes so I think it would be good to land this sooner than later.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23503704

Pulled By: eellison

fbshipit-source-id: 3ce971fb1bc3a7f1cbaa38f1ed853e2db3d67c18
2020-09-03 14:47:19 -07:00
9b3c72d46e [pytorch] Make mobile find_method return an optional (#43965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43965

As part of a larger effort to unify the API between the lite interpreter and full JIT:
- implement torch::jit::mobile::Method, a proxy for torch::jit::mobile::Function
- add support for overloaded operator() to mobile Method and Function
- mobile find_method now returns a c10::optional<Method> (so signature matches full jit)
- moves some implementation of Function from module.cpp to function.cpp
ghstack-source-id: 111161942

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D23330762

fbshipit-source-id: bf0ba0d711d9566c92af31772057ecd35983ee6d
2020-09-03 14:46:18 -07:00
f91bdbeabd Enable function calls in TEFuser and SpecializeAutogradZero (#43866)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43866

Reviewed By: ezyang

Differential Revision: D23452798

Pulled By: Krovatkin

fbshipit-source-id: 2cff4c905bf1b5d9de56e7869458ffa6fce1f1b5
2020-09-03 14:42:52 -07:00
e05fa2f553 [quant] Prep for conv_transpose packing (#39714)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39714

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22087071

Pulled By: z-a-f

fbshipit-source-id: 507f8a414026eb4c9926f68c1e94d2f56119bca6
2020-09-03 14:10:32 -07:00
352a32e7f3 [caffe2] fix clang build
Summary:
* multiple -Wpessimizing-moves
* `static` within  `__host__` `__device__` function

Test Plan:
```lang=bash
buck build -c fbcode.cuda_use_clang=true fblearner/flow/projects/dper:workflow
```

Reviewed By: andrewjcg

Differential Revision: D23506573

fbshipit-source-id: 1490a1267e39e067d3ef836ef9b1cd5d7a28f724
2020-09-03 14:02:27 -07:00
f3da9e3b50 Enable Enum pickling/unpickling. (#43188)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **https://github.com/pytorch/pytorch/issues/43188 Enable Enum pickling/unpickling.**
* https://github.com/pytorch/pytorch/issues/42963 Add Enum TorchScript serialization and deserialization support
* https://github.com/pytorch/pytorch/issues/42874 Fix enum constant printing and add FileCheck to all Enum tests
* https://github.com/pytorch/pytorch/issues/43121 Add Enum convert back to Python object support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43188

Reviewed By: zdevito

Differential Revision: D23365141

Pulled By: gmagogsfm

fbshipit-source-id: f0c93d4ac614dec047ad8640eb6bd9c74159b558
2020-09-03 13:51:02 -07:00
d0421ff1cc Benchmarks: add scripts for FastRNNs results comparison. (#44134)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44134

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23505810

Pulled By: ZolotukhinM

fbshipit-source-id: d0b3d70d4c2a44a8c3773631d09a25a98ec59370
2020-09-03 13:44:42 -07:00
3806c939bd Polish DDP join API docstrings (#43973)
Summary:
Polishes DDP join API docstrings and makes a few minor cosmetic changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43973

Reviewed By: zou3519

Differential Revision: D23467238

Pulled By: rohan-varma

fbshipit-source-id: faf0ee56585fca5cc16f6891ea88032336b3be56
2020-09-03 13:39:45 -07:00
442684cb25 Enable typechecks for torch.nn.modules.[activation|upsampling] (#44093)
Summary:
Add missing `hardsigmoid`, `silu`, `hardswish` and `multi_head_attention_forward` to functional.pyi.in.
Embed some typing annotations into functional.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44093

Reviewed By: ezyang

Differential Revision: D23494384

Pulled By: malfet

fbshipit-source-id: 27023c16ff5951ceaebb78799c4629efa25f7c5c
2020-09-03 13:20:04 -07:00
a153f69417 Fix replaceAtenConvolution for BC. (#44036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44036

Running replaceAtenConvolution on older traced models won't work, as the
_convolution signature has changed and replaceAtenConvolution was
changed to account for that; the old behavior was not preserved in
that change. This change restores the old behavior while keeping the new one.

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23476775

fbshipit-source-id: 73a0c2b7387f2a8d82a8d26070d0059972126836
2020-09-03 12:57:57 -07:00
ba65cce2a2 Fix transposed conv2d rewrite pattern to account for convolution api (#44035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44035

This accounts for the convolution API change.

Also added a test to capture such cases in the future.

Test Plan:
python test/test_xnnpack_integration.py

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D23476773

fbshipit-source-id: a62c4429351c909245106a70b4c60b1bacffa817
2020-09-03 12:55:43 -07:00
55ff9aa185 Test TE fuser unary ops and fix sigmoid(half) (#44094)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44094

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23494950

Pulled By: bertmaher

fbshipit-source-id: 676c4e57267c4ad92065ea90b06323918dd5b0de
2020-09-03 12:48:46 -07:00
bfa1fa5249 Update rocm-3.5.1 build job to rocm-3.7 (#44123)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44123

Reviewed By: seemethere

Differential Revision: D23504193

Pulled By: malfet

fbshipit-source-id: 3570dc0aa879a3fdd43f3ecd41ee9e745006cfde
2020-09-03 12:39:30 -07:00
49215d7f26 For CriterionTests, have check_gradgrad actually only affect gradgrad checks. (#44060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44060

Right now it skips grad checks as well.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23484018

Pulled By: gchanan

fbshipit-source-id: 24a8f1af41f9918aaa62bc3cd78b139b2f8de1e1
2020-09-03 12:29:32 -07:00
42f9897983 Mark bucketize as not subject to autograd (#44102)
Summary:
Bucketize returns integers; currently this triggers an internal assert, so we apply the mechanism used for other non-differentiable ops (argmax etc.) to this case.
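
For illustration (boundary values chosen arbitrarily):

```python
import torch

boundaries = torch.tensor([0.0, 1.0, 2.0])
x = torch.tensor([0.5, 1.5])
# bucketize returns integer bucket indices, which are not differentiable,
# so the op is marked as not subject to autograd.
print(torch.bucketize(x, boundaries))  # tensor([1, 2])
```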

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44102

Reviewed By: zou3519

Differential Revision: D23500048

Pulled By: albanD

fbshipit-source-id: fdd869cd1feead6616b532b3e188bd5512adedea
2020-09-03 12:05:47 -07:00
91b0d1866a add tanh + quantize unit test (#44076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44076

add fakelowp test for tanh + quantize

Test Plan: net runner

Reviewed By: venkatacrc

Differential Revision: D23339662

fbshipit-source-id: 96c2cea12b41bf3df24aa46e601e053dca8e9481
2020-09-03 12:00:36 -07:00
de672e874d [JIT] Improve error message for unsupported Optional types (#44054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44054

**Summary**
This commit improves the error message that is printed when an
`Optional` type annotation with an unsupported contained type is
encountered. At present, the `Optional` is printed as-is, and
`Optional[T]` is syntactic sugar for `Union[T, None]`, so that is what
shows up in the error message and can be confusing. This commit modifies
the error message so that it prints `T` instead of `Union[T, None]`.
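
The equivalence behind the confusing old message can be checked directly (a minimal illustration):

```python
from typing import List, Optional, Union

# Optional[T] is plain sugar for Union[T, None], which is why the old
# message surfaced the Union form instead of T.
assert Optional[List] == Union[List, None]
```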

**Test Plan**
Continuous integration.

Example of old message:
```
AssertionError: Unsupported annotation typing.Union[typing.List, NoneType] could not be resolved.
```
Example of new message:
```
AssertionError: Unsupported annotation typing.Union[typing.List, NoneType] could not be resolved because typing.List could not be resolved.
```

**Fixes**
This commit fixes #42859.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23490365

Pulled By: SplitInfinity

fbshipit-source-id: 2aa9233718e78cf1ba3501ae11f5c6f0089e29cd
2020-09-03 11:55:06 -07:00
d11603de38 [TensorExpr] Benchmarks: set number of profiling runs to 2 for PE. (#44112)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44112

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23500904

Pulled By: ZolotukhinM

fbshipit-source-id: d0dd54752b7ea5ae11f33e865c96d2d61e98d573
2020-09-03 11:29:35 -07:00
b10c527a1f [pytorch][bot] update mobile op deps (#44100)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44100

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D23496532

Pulled By: ljk53

fbshipit-source-id: 1e5b9059482e423960349d1361a7a98718c2d9ed
2020-09-03 11:24:26 -07:00
f96b91332f [caffe2.proto] Add AOTConfig (#44020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44020

Pull Request resolved: https://github.com/pytorch/glow/pull/4853

Add AOT config

Reviewed By: yinghai

Differential Revision: D23414435

fbshipit-source-id: 3c48acf29889fcf63def37a48de382e675e0e1f3
2020-09-03 11:07:45 -07:00
c59e11bfbb Add soft error reporting to capture all the inference runtime failure. (#44078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44078

When PyTorch mobile inference fails and throws an exception, if the caller catches it and does not crash the app, we are not able to track the inference failure.

So we are adding native soft error reporting to capture all the failures occurring during module loading and running, including both crashing and non-crashing failures. Since c10::Error has good error messaging stack handling (D21202891 (a058e938f9)), we are utilizing it for the error handling and message print out.
ghstack-source-id: 111307080

Test Plan:
Verified that the soft error reporting is sent through module.cpp when operator is missing, make sure a logview mid is generated with stack trace: https://www.internalfb.com/intern/logview/details/facebook_android_softerrors/5dd347d1398c1a9a73c804b20f7c2179/?selected-logview-tab=latest.

Error message with context is logged below:

```
soft_error.cpp		[PyTorchMobileInference] : Error occured during model running entry point: Could not run 'aten::embedding' with arguments from the 'CPU' backend. 'aten::embedding' is only available for these backends: [BackendSelect, Named, Autograd, Autocast, Batched, VmapMode].

BackendSelect: fallthrough registered at xplat/caffe2/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at xplat/caffe2/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Autograd: fallthrough registered at xplat/caffe2/aten/src/ATen/core/VariableFallbackKernel.cpp:31 [backend fallback]
Autocast: fallthrough registered at xplat/caffe2/aten/src/ATen/autocast_mode.cpp:253 [backend fallback]
Batched: registered at xplat/caffe2/aten/src/ATen/BatchingRegistrations.cpp:317 [backend fallback]
VmapMode: fallthrough registered at xplat/caffe2/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

Exception raised from reportError at xplat/caffe2/aten/src/ATen/core/dispatch/OperatorEntry.cpp:261 (m
```

Reviewed By: iseeyuan

Differential Revision: D23428636

fbshipit-source-id: 82d5d9c054300dff18d144f264389402d0b55a8a
2020-09-03 10:54:43 -07:00
5973b44d9e Rename NewCriterionTest to CriterionTest. (#44056)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44056

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23482573

Pulled By: gchanan

fbshipit-source-id: dde0f1624330dc85f48e5a0b9d98fb55fdb72f68
2020-09-03 10:29:20 -07:00
7d95eb8633 [fbgemm] manual submodule update (#44082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44082

The automated submodule update is running into some test failures and I am not sure how I can rebase it.

automated submodule update:
https://github.com/pytorch/pytorch/pull/43817

Test Plan: CI tests

Reviewed By: jianyuh

Differential Revision: D23489240

fbshipit-source-id: a49b01786ebf0a59b719a0abf22398e1eafa90af
2020-09-03 10:07:46 -07:00
c10f30647f Fix CUDA debug nightly build failure (#44085)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43607.
Tested in https://github.com/pytorch/pytorch/pull/44007.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44085

Reviewed By: malfet

Differential Revision: D23493663

Pulled By: ezyang

fbshipit-source-id: 4c01f3fc5a52814a23773a56b980c455851c2686
2020-09-03 09:12:52 -07:00
98320061ad DDP Communication hook: (Patch) Fix the way we pass future result to buckets. (#43734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43734

Following the additional GH comments on the original PR https://github.com/pytorch/pytorch/pull/43307.
ghstack-source-id: 111327130

Test Plan: Run `python test/distributed/test_c10d.py`

Reviewed By: smessmer

Differential Revision: D23380288

fbshipit-source-id: 4b8889341c57b3701f0efa4edbe1d7bbc2a82ced
2020-09-03 08:59:10 -07:00
768c2b0fb2 Fix THPVariable_float_scalar (#43842)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43842

Reviewed By: ailzhang

Differential Revision: D23426892

Pulled By: ezyang

fbshipit-source-id: 63318721fb3f4a57d417f9a87e57c74f6d4e6e18
2020-09-03 08:39:41 -07:00
b6e2b1eac7 BatchedFallback: stop emitting the entire schema in the fallback warning (#44051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44051

Instead, just emit the operator name. The entire schema is pretty wordy
and doesn't add any additional information.

Test Plan: - modified test: `pytest test/test_vmap.py -v`

Reviewed By: ezyang

Differential Revision: D23481184

Pulled By: zou3519

fbshipit-source-id: 9fbda61fc63565507b04c8b87e0e326a2036effa
2020-09-03 08:33:51 -07:00
cae52b4036 Merge CriterionTest into NewCriterionTest. (#44055)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44055

There is no functional change here.  Another patch will rename NewCriterionTest to CriterionTest.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23482572

Pulled By: gchanan

fbshipit-source-id: de364579067e2cc9de7df6767491f8fa3a685de2
2020-09-03 08:14:34 -07:00
15643de941 With fixes, Back out "Back out "Selective meta programming preparation for prim ops""
Summary: Original commit changeset: b2c712a512a2

Test Plan: CI

Reviewed By: jiatongzhou

Differential Revision: D23477710

fbshipit-source-id: 177ee56a82234376b7a5c3fc33441f8acfd59fea
2020-09-03 08:02:20 -07:00
24ca6aab02 Improves type-checking guards. (#43339)
Summary:
PR https://github.com/pytorch/pytorch/issues/38157 fixed type checking for mypy by including `if False` guards on some type-checker-only imports. However other typecheckers - [like pyright](https://github.com/microsoft/pylance-release/issues/262#issuecomment-677758245) - will respect this logic and ignore the imports. Using [`if TYPE_CHECKING`](https://docs.python.org/3/library/typing.html#typing.TYPE_CHECKING) instead means both mypy and pyright will work correctly.

[For background, an example of where the current code fails](https://github.com/microsoft/pylance-release/issues/262) is if you make a file `tmp.py` with the contents
```python
import torch
torch.ones((1,))
```
Then [`pyright tmp.py --lib`](https://github.com/microsoft/pyright#command-line) will fail with a `"ones" is not a known member of module` error. This is because it can't find the `_VariableFunctions.pyi` stub file, as pyright respects the `if False` logic. After adding the `TYPE_CHECKING` guard, all works correctly.

Credit to erictraut for suggesting the fix.
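
A minimal before/after sketch; the guarded module name is hypothetical:

```python
from typing import TYPE_CHECKING

# Old pattern: only mypy special-cases `if False:`, so pyright skips it.
if TYPE_CHECKING:
    # Evaluated by static type checkers only, never at runtime, and
    # respected by both mypy and pyright.
    import expensive_stub_only_module  # hypothetical module name
```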

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43339

Reviewed By: agolynski

Differential Revision: D23348142

Pulled By: ezyang

fbshipit-source-id: c8a58122a7b0016845c311da39a1cc48748ba03f
2020-09-03 07:45:53 -07:00
b6d5973e13 Delete THCStream.cpp (#43733)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43733

Reviewed By: malfet

Differential Revision: D23405121

Pulled By: ezyang

fbshipit-source-id: 95fa80b5dcb11abaf4d2507af15646a98029c80d
2020-09-03 07:41:24 -07:00
68a1fbe308 Allow criterion backwards test on modules requiring extra args (i.e. CTCLoss). (#44050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44050

We don't actually turn on the CTCLoss tests since they fail, but this allows you to toggle check_forward_only and for the code to actually run.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23481091

Pulled By: gchanan

fbshipit-source-id: f2a3b0a2dee27341933c5d25f1e37a878b04b9f6
2020-09-03 07:41:21 -07:00
5f89aa36cf Actually run backward criterion tests. (#44030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44030

This looks to have been a mistake from https://github.com/pytorch/pytorch/pull/9287.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23476274

Pulled By: gchanan

fbshipit-source-id: 81ed9d0c9a40d49153fc97cd69fdcd469bec0c73
2020-09-03 07:39:13 -07:00
665feda15b Adds opinfo-based autograd tests and (un)supported dtype tests (#43451)
Summary:
This PR adds a new test suite, test_ops.py, designed for generic tests across all operators with OpInfos. It currently has two kinds of tests:

- it validates that the OpInfo has the correct supported dtypes by verifying that unsupported dtypes throw an error and supported dtypes do not
- it runs grad and gradgrad checks on each op and its variants (method and inplace) that has an OpInfo

This is a significant expansion and simplification of the current autogenerated autograd tests, which spend considerable time processing their inputs. As an alternative, this PR extends OpInfos with "SampleInputs" that are much easier to use. These sample inputs are analogous to the existing tuples in `method_tests()`.
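
A hedged sketch of the shape of such a sample input; the names below are illustrative stand-ins, not the exact test-suite classes:

```python
from collections import namedtuple
import torch

# Each OpInfo carries a callable producing ready-to-run inputs, replacing
# positional tuples that needed preprocessing before use.
SampleInput = namedtuple("SampleInput", ["input", "args", "kwargs"])

def sample_inputs_add(device, dtype):
    t = torch.ones(2, 2, device=device, dtype=dtype)
    return [SampleInput(t, args=(t,), kwargs={"alpha": 2})]

for s in sample_inputs_add("cpu", torch.float32):
    torch.add(s.input, *s.args, **s.kwargs)
```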

Future PRs will extend OpInfo-based testing to other uses of `method_tests()`, like test_jit.py, to ensure that new operator tests can be implemented entirely using an OpInfo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43451

Reviewed By: albanD

Differential Revision: D23481723

Pulled By: mruberry

fbshipit-source-id: 0c2cdeacc1fdaaf8c69bcd060d623fa3db3d6459
2020-09-03 02:50:48 -07:00
ab7606702c Rectified a few grammatical errors in documentation (#43695)
Summary:
Rectified a few grammatical errors in the PyTorch documentation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43695

Reviewed By: anjali411

Differential Revision: D23451600

Pulled By: ezyang

fbshipit-source-id: bc7b34c240fde1b31cac811080befa2ff2989395
2020-09-02 23:59:45 -07:00
40fec4e739 [TensorExpr] Fuser: do not fuse ops with 0-dim tensors. (#44073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44073

We don't have proper support for it yet on the NNC side or in the JIT IR->NNC lowering.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23487905

Pulled By: ZolotukhinM

fbshipit-source-id: da0da7478fc8ce7b455176c95d8fd610c94352c1
2020-09-02 22:59:04 -07:00
3da82aee03 [JIT] Remove profile nodes before BatchMM. (#43961)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43961

Currently we're removing prim::profile nodes and embedding the type info
directly in the IR right before the fuser, because it is difficult to
fuse in the presence of prim::profile nodes. It turns out that BatchMM has
a similar problem: it doesn't work when there are prim::profile nodes in
the graph. These two passes run next to each other, so we could simply
remove prim::profile nodes slightly earlier: before the BatchMM pass.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23453266

Pulled By: ZolotukhinM

fbshipit-source-id: 92cb50863962109b3c0e0112e56c1f2cb7467ff1
2020-09-02 22:57:39 -07:00
ae7699829c Remove THC max and min, which are no longer used (#43903)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43903

Reviewed By: smessmer

Differential Revision: D23493225

Pulled By: ezyang

fbshipit-source-id: bc89d8221f3351da0ef3cff468ffe6a91dae96a6
2020-09-02 22:05:05 -07:00
32e0cedc53 [ONNX] Move tests to test_pytorch_onnx_onnxruntime (#42684)
Summary:
Move tests to test_pytorch_onnx_onnxruntime from test_utility_fun

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42684

Reviewed By: smessmer

Differential Revision: D23480360

Pulled By: bzinodev

fbshipit-source-id: 8876ba0a0c3e1d7104511de7a5cca5262b32f574
2020-09-02 21:47:38 -07:00
bc45c47aa3 Expand the coverage of test_addmm and test_addmm_sizes (#43831)
Summary:
- This test is very fast and very important, so it makes no sense to mark it as slowTest
- This test should also run on CUDA
- This test should check alpha and beta support
- This test should check `out=` support
- manual computation should use list instead of index_put because list is much faster
- precision for TF32 needs to be fixed. Will do it in future PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43831

Reviewed By: ailzhang

Differential Revision: D23435032

Pulled By: ngimel

fbshipit-source-id: d1b8350addf1e2fe180fdf3df243f38d95aa3f5a
2020-09-02 20:51:49 -07:00
f5ba489f93 Move dependent configs to CUDA-10.2 (#44057)
Summary:
Move `multigpu`, `noavx` and `slow` test configs to CUDA-10.2, but keep them as master-only tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44057

Reviewed By: walterddr, seemethere

Differential Revision: D23482732

Pulled By: malfet

fbshipit-source-id: a6b050701cbc1d8f176ebb302f7f5076a78f1f58
2020-09-02 20:07:48 -07:00
a76a56d761 Add "torch/testing/_internal/data/*.pt" to .gitignore (#43941)
Summary:
I usually get this extra "legacy_conv2d.pt" file in my git "changed files". I found that this is from tests with `download_file`
42c895de4d/test/test_nn.py (L410-L426)

and its definition (see `data_dir` for download output location)
f17d7a5556/torch/testing/_internal/common_utils.py (L1338-L1357)

I assume a file "generated" by a test should not be tracked in VCS? Also, if the file is updated on the server, users may still use the old version if they have already downloaded it before.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43941

Reviewed By: anjali411

Differential Revision: D23451264

Pulled By: ezyang

fbshipit-source-id: 7fcdfb24685a7e483914cc46b3b024df798bf7f7
2020-09-02 20:00:31 -07:00
37658b144b Remove useless py2 compatibility import __future__, part 1 (#43808)
Summary:
To avoid conflicts, this PR does not remove all imports. More are coming in further PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43808

Reviewed By: wanchaol

Differential Revision: D23436675

Pulled By: ailzhang

fbshipit-source-id: ccc21a1955c244f0804277e9e47e54bfd23455cd
2020-09-02 19:15:11 -07:00
b2a9c3baa9 [TVM] Support fp16 weights in c2_frontend (#44070)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44070

Reviewed By: yinghai

Differential Revision: D23444253

fbshipit-source-id: 0bfa98172dfae835eba5ca7cbe30383ba964c2a6
2020-09-02 19:07:35 -07:00
b2aaf212aa [TensorExpr] Add option to enforce TensorExprKernel fallbacks. (#43972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43972

It is useful when debugging to disable the NNC backend to see whether
the bug is there or in the fuser logic.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23455624

Pulled By: ZolotukhinM

fbshipit-source-id: f7c0452a29b860afc806e2d58acf35aa89afc060
2020-09-02 18:34:24 -07:00
6a6552576d rename _min_max to _aminmax (#44001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44001

This is to align with the naming in numpy and in
https://github.com/pytorch/pytorch/pull/43092
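
A usage sketch, assuming the renamed private op keeps the full-reduction min/max-pair semantics of `_min_max`:

```python
import torch

x = torch.arange(6.0).reshape(2, 3)
# One fused reduction instead of separate x.min() and x.max() calls.
mn, mx = torch._aminmax(x)
print(mn.item(), mx.item())  # 0.0 5.0
```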

Test Plan:
```
python test/test_torch.py TestTorchDeviceTypeCPU.test_aminmax_cpu_float32
python test/test_torch.py TestTorchDeviceTypeCUDA.test_aminmax_cuda_float32
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23465298

fbshipit-source-id: b599035507156cefa53942db05f93242a21c8d06
2020-09-02 18:07:55 -07:00
486a9fdab2 _min_max.dim: CUDA implementation (#42943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42943

Adds a CUDA kernel for _min_max_val.dim

Test Plan:
correctness:
```
python test/test_torch.py TestTorchDeviceTypeCUDA.test_minmax_cuda_float32
```

performance: ~50% savings on a tensor representative of quantization workloads: https://gist.github.com/vkuzo/3e16c645e07a79dd66bcd50629ff5db0

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23086797

fbshipit-source-id: 04a2d310f64a388d48ab8131538dbd287900ca4a
2020-09-02 18:07:51 -07:00
834279f4ab _min_max_val.dim: CPU implementation (#42894)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42894

Continuing the min_max kernel implementation, this PR adds the
CPU path when a dim is specified. The next PR will replicate this for CUDA.

Note: after a discussion with ngimel, we are taking the fast path
of calculating the values only and not the indices, since that is what
is needed for quantization; calculating indices would require support
for reductions with 4 outputs, which is additional work. So, the API
doesn't fully match `min.dim` and `max.dim` (see the sketch below).

Flexible on the name, let me know if something else is better.
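
A minimal sketch of the difference (the values-only call is hypothetical, named after this PR's title):

```
import torch

x = torch.randn(4, 8)

# The existing reductions return (values, indices) pairs along a dim.
min_vals, min_idxs = torch.min(x, dim=1)
max_vals, max_idxs = torch.max(x, dim=1)

# The op described here computes only the two value tensors in one pass:
# min_vals, max_vals = torch._min_max_val(x, dim=1)  # hypothetical call
```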

Test Plan:
correctness:
```
python test/test_torch.py TestTorchDeviceTypeCPU.test_minmax_cpu_float32
```

performance: seeing a 49% speedup on a min+max tensor with similar shapes
to what we care about for quantization observers (bench:
https://gist.github.com/vkuzo/b3f24d67060e916128a51777f9b89326). For
other shapes (more dims, different dim sizes, etc), I've noticed a
speedup as low as 20%, but we don't have a good use case to optimize
that so perhaps we can save that for a future PR.

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23086798

fbshipit-source-id: b24ce827d179191c30eccf31ab0b2b76139b0ad5
2020-09-02 18:07:47 -07:00
78994d165f min_max kernel: add CUDA (#42868)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42868

Adds a CUDA kernel for the _min_max function.

Note: this is a re-submit of https://github.com/pytorch/pytorch/pull/41805;
it was faster to resubmit than to resurrect that one. Thanks to durumu
for writing the original implementation!

Future PRs will add index support, docs, and hook this up to observers.

Test Plan:
```
python test/test_torch.py TestTorchDeviceTypeCUDA.test_minmax_cuda_float32
```

Basic benchmarking shows a 50% reduction in time to calculate min + max:
https://gist.github.com/vkuzo/b7dd91196345ad8bce77f2e700f10cf9

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23057766

fbshipit-source-id: 70644d2471cf5dae0a69343fba614fb486bb0891
2020-09-02 18:06:03 -07:00
33d51a9b32 Respect canFuseOn{CPU,GPU} in TE fuser (#43967)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43967

Test Plan: Imported from OSS

Reviewed By: asuhan

Differential Revision: D23469048

Pulled By: bertmaher

fbshipit-source-id: 1005a7ae08974059ff9d467492caa3a388070eeb
2020-09-02 18:00:25 -07:00
041573c8cd Add Cost Inference for AdaGrad and RowWiseSparseAdagrad
Summary: Add cost inference for AdaGrad and RowWiseSparseAdagrad

Test Plan:
Ran `buck test caffe2/caffe2/python/operator_test:adagrad_test`
Result: https://our.intern.facebook.com/intern/testinfra/testrun/5629499567799494

Reviewed By: bwasti

Differential Revision: D23442607

fbshipit-source-id: 67800fb82475696512ad19a43067774247f8b230
2020-09-02 17:52:40 -07:00
2f044d4ee5 Fix CI build (#44068)
Summary:
Some of our machines have only 1 device.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44068

Reviewed By: wanchaol

Differential Revision: D23485730

Pulled By: izdeby

fbshipit-source-id: df6bc0aba18feefc50c56a8f376103352fa2a2ea
2020-09-02 17:09:30 -07:00
129f406062 Make torch.conj() a composite function and return self for real tensors (#43270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43270

`torch.conj` is a very commonly used operator for complex tensors, but it is mathematically a no-op for real tensors. Switching to TensorFlow-style gradients for complex tensors (as discussed in #41857) would involve adding `torch.conj()` to the backward definitions of many operators. In order to preserve autograd performance for real tensors and maintain NumPy compatibility for `torch.conj`, this PR updates `torch.conj()` so that it behaves the same for complex tensors but returns the `self` tensor for non-complex dtypes. The documentation states that the returned tensor for a real input shouldn't be mutated. We could perhaps return an immutable tensor for this case in the future when that functionality is available (zdevito ezyang).
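
A minimal sketch of the resulting behavior, assuming the semantics described above:

```
import torch

c = torch.tensor([1 + 1j, 2 - 2j])
r = torch.tensor([1.0, 2.0])

print(torch.conj(c))  # tensor([1.-1.j, 2.+2.j]): imaginary part negated

# For real dtypes conj is a mathematical no-op; per this PR the result
# aliases the input (so it should not be mutated).
print(torch.conj(r).data_ptr() == r.data_ptr())  # True under this PR
```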

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23460493

Pulled By: anjali411

fbshipit-source-id: 3b3bf0af55423b77ff2d0e29f5d2c160291ae3d9
2020-09-02 17:06:04 -07:00
f9efcb646b fx quant: clarify state in Quantizer object (#43927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43927

Adds uninitialized placeholders for various state
used throughout the Quantizer object, with documentation
on what they are. No logic change.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23439473

fbshipit-source-id: d4ae83331cf20d81a7f974f88664ccddca063ffc
2020-09-02 16:34:00 -07:00
f15e27265f [torch.fx] Add support for custom op (#43248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43248

We add support for `__torch_function__` overrides for C++ custom ops. The logic is the same as for the other components, like torch.nn.Module.
Refactored some code a little bit to make it reusable.

Test Plan: buck test //caffe2/test:fx -- test_torch_custom_ops

Reviewed By: bradleyhd

Differential Revision: D23203204

fbshipit-source-id: c462a86e407e46c777171da32d7a40860acf061e
2020-09-02 16:08:37 -07:00
7a77d1c5c2 [FX] Only copy over forward() from exec (#44006)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44006

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D23466542

Pulled By: jamesr66a

fbshipit-source-id: 12a1839ddc65333e3e3d511eeb53206f06546a87
2020-09-02 15:35:49 -07:00
402e9953df [pytorch][bot] update mobile op deps (#44018)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44018

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D23470528

Pulled By: ljk53

fbshipit-source-id: b677e1c5677fc8929713ee108df69098502c50ea
2020-09-02 14:34:33 -07:00
297c938729 Add _foreach_add(TensorList tl1, TensorList tl2) and _foreach_add_(TensorList tl1, TensorList tl2) APIs (#42533)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42533

[First PR: Add private API to support tensor lists: _foreach_add(TensorList tensors, Scalar scalar)](https://github.com/pytorch/pytorch/pull/41554).

**Motivation**
[GitHub issue](https://github.com/pytorch/pytorch/issues/38655)
Current PyTorch optimizer implementations are not efficient when we work with a lot of small feature tensors: launching a lot of kernels slows down the whole process, so we need to reduce the number of kernel launches.
As an example, we should be looking at [NVIDIA's Apex](https://github.com/NVIDIA/apex).
In order to track progress, we will pick PyTorch's DCGAN model with the Adam optimizer and, once the optimizer is reimplemented with tensor lists, benchmark the model performance against the original model version, Apex's version with the original Adam optimizer, and its FusedAdam optimizer.

**Current API restrictions**
- List can't be empty (will be fixed in upcoming PRs).
- All tensors in the list must have the same dtype, device and size.

**Broadcasting**
At this point we don't support broadcasting.

**What is 'Fast' and 'Slow' route**
In particular cases, we can't process an op with a fast list CUDA kernel. Still, we can fall back to a regular for-loop where the op is applied to each tensor individually through the dispatch mechanism. A few checks decide whether the op will be performed via the 'fast' or 'slow' path.
To go the fast route,
- All tensors must have strided layout
- All tensors must be dense and not have overlapping memory
- The resulting tensor type must be the same.

----------------
**In this PR**
- Adding a `_foreach_add(TensorList tl1, TensorList tl2)` API
- Adding a `_foreach_add_(TensorList tl1, TensorList tl2)` API
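
For illustration, a minimal usage sketch of the two APIs added here:

```
import torch

xs = [torch.ones(3) for _ in range(4)]
ys = [torch.full((3,), 2.0) for _ in range(4)]

out = torch._foreach_add(xs, ys)  # out-of-place: list of x + y tensors
torch._foreach_add_(xs, ys)       # in-place: each x is updated to x + y
```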

**Tests**
Tested via unit tests

**TODO**
1. Properly handle empty lists

**Plan for the next PRs**
1. APIs
- Binary Ops for list with Scalar
- Binary Ops for list with list
- Unary Ops for list
- Pointwise Ops

2. Complete tasks from TODO
3. Rewrite PyTorch optimizers to use for-each operators for performance gains.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23331894

Pulled By: izdeby

fbshipit-source-id: 876dd1bc82750f609b9e3ba23c8cad94d8d6041c
2020-09-02 12:18:28 -07:00
f6f9d22228 [ONNX] Export KLDivLoss (#41858)
Summary:
Enable export for KLDivLoss

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41858

Reviewed By: mrshenli

Differential Revision: D22918004

Pulled By: bzinodev

fbshipit-source-id: e3debf77a4cf0eae0df6ed5a72ee91c43e482b62
2020-09-02 11:45:13 -07:00
4716284904 Update persons_of_interest.rst (#44031)
Summary:
Adding Geeta to the POI for TorchServe

cc chauhang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44031

Reviewed By: jspisak

Differential Revision: D23476439

Pulled By: soumith

fbshipit-source-id: 6936d46c201e1437143d85e1dce24da355857628
2020-09-02 10:56:27 -07:00
b167402e2e [redo] Fix SyncBatchNorm forward pass for non-default process group (#43861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43861

This is a redo of https://github.com/pytorch/pytorch/pull/38874, and
fixing my original bug from
https://github.com/pytorch/pytorch/pull/38246.

Test Plan:
CI

Imported from OSS

Reviewed By: supriyar

Differential Revision: D23418816

fbshipit-source-id: 2a3a3d67fc2d03bb0bf30a87cce4e805ac8839fb
2020-09-02 10:44:46 -07:00
544a56ef69 [JIT] Always map node output in vmap (#43988)
Summary:
Previously, when merging a node without a subgraph, we would map the node's outputs to the corresponding subgraph values, but when merging a node with a subgraph, the node's outputs would be absent from the value mapping. This PR makes it so they are included.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43988

Reviewed By: ZolotukhinM

Differential Revision: D23462116

Pulled By: eellison

fbshipit-source-id: 232c081261e9ae040df0accca34b1b96a5a5af57
2020-09-02 10:30:43 -07:00
276158fd05 .circleci: Remove un-needed steps from binary builds (#43974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43974

We already install devtoolset7 in our docker images for binary builds,
and tclsh shouldn't be needed since we're not relying on unbuffer
anymore.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D23462531

Pulled By: seemethere

fbshipit-source-id: 83cbb8b0782054f0b543dab8d11fa6ac57685272
2020-09-02 09:57:52 -07:00
73f009a2aa refactor manual function definitions (#43711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43711

This makes them available in forward if needed.

No change to the file content, just a copy-paste.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23454146

Pulled By: albanD

fbshipit-source-id: 6269a4aaf02ed53870fadf8b769ac960e49af195
2020-09-02 09:23:21 -07:00
a6789074fc Implement ChannelShuffle op with XNNPACK (#43602)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43602

Test Plan: Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D23334952

Pulled By: kimishpatel

fbshipit-source-id: 858ef3db599b1c521ba3a1855c9a3c35fe3b02b0
2020-09-02 09:18:25 -07:00
df8da5cb5a fx quant: make load_arg function more clear (#43923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43923

Readability improvements to `Quantizer.convert.load_arg` to make
things easier to read:
1. add docblock
2. `arg` -> `arg_or_args`, to match what's actually happening
3. `loaded_arg` -> `loaded_args`, to match what's actually happening

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23438745

fbshipit-source-id: f886b324d2e2e33458b72381499e37dccfc3bd30
2020-09-02 09:06:05 -07:00
77ef77e5fa fx quant: rename matches -> is_match (#43914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43914

Renames the `matches` function to `is_match`, since there is also
a list named `matches` that we pass around in `Quantizer`,
and it would be good to reduce name conflicts.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23435601

fbshipit-source-id: 394af11e0120cfb07dedc79d5219247330d4dfd6
2020-09-02 09:06:01 -07:00
6f5282adc8 add quantization debug util to pretty print FX graphs (#43910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43910

Adds a debug function to get a representation of all nodes in the
graph, such as

```
name          op      target         args               kwargs
x             plchdr  x              ()                 {}
linear_weight gt_prm  linear.weight  ()                 {}
add_1         cl_fun  <bi_fun add>   (x, linear_weight) {}
linear_1      cl_mod  linear         (add_1,)           {}
relu_1        cl_meth relu           (linear_1,)        {}
sum_1         cl_fun  <bi_meth sum>  (relu_1,)          {'dim': -1}
topk_1        cl_fun  <bi_meth topk> (sum_1, 3)         {}
```

using only the Python standard library. This is useful for printing the
internal state of graphs when working on FX code.

Has some on-by-default logic to shorten things so that node reprs for
toy models and unit tests fit into 80 chars.

Flexible on function name and location, I care more that this is
accessible from both inside PT as well as from debug scripts which
are not checked in.
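
For reference, a minimal sketch of a similar printout; the formatting is illustrative, not this PR's exact helper:

```
import torch
import torch.fx

class M(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x + 1).sum(dim=-1)

gm = torch.fx.symbolic_trace(M())
# One row per node: name, op kind, target, and positional args.
for node in gm.graph.nodes:
    print(f"{node.name:12} {node.op:15} {str(node.target):24} {node.args}")
```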

Test Plan:
see
https://gist.github.com/vkuzo/ed0a50e5d6dc7442668b03bb417bd603 for
example usage

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23435029

fbshipit-source-id: 1a2df797156a19cedd705e9e700ba7098b5a1376
2020-09-02 09:04:44 -07:00
b6b5ebc345 Add torch.vdot (#43004)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42747
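
For reference, a minimal usage sketch: unlike `torch.dot`, `vdot` conjugates its first argument (matching NumPy's `np.vdot`):

```
import torch

a = torch.tensor([1 + 2j, 3 - 1j])
b = torch.tensor([2 - 1j, 1 + 1j])

torch.vdot(a, b)  # sum(conj(a) * b)
torch.dot(a, b)   # sum(a * b), no conjugation
```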

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43004

Reviewed By: mruberry

Differential Revision: D23318935

Pulled By: anjali411

fbshipit-source-id: 12d4824b7cb42bb9ca703172c54ec5c663d9e325
2020-09-02 09:00:30 -07:00
14ebb2c67c Allow no-bias MKLDNN Linear call (#43703)
Summary:
MKLDNN linear incorrectly assumes that bias is defined and will fail for no-bias calls.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43703

Reviewed By: glaringlee

Differential Revision: D23373182

Pulled By: bwasti

fbshipit-source-id: 1e817674838a07d237c02eebe235c386cf5b191e
2020-09-02 08:54:50 -07:00
c88ac25679 Check for internal memory overlap in some indexing-type functions (#43423)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43423

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23298652

Pulled By: zou3519

fbshipit-source-id: c13c59aec0c6967ef0d6365d782c1f4c98c04227
2020-09-02 08:51:50 -07:00
5807bb92d3 TensorIteratorConfig: Check memory overlap by default (#43422)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43422

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D23298653

Pulled By: zou3519

fbshipit-source-id: a7b66a8a828f4b35e31e8be0c07e7fe9339181f2
2020-09-02 08:50:29 -07:00
cd58114c6c Adjust level of verbosity of debug dumps in graph executor T74227880 (#43682)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43682

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23397980

Pulled By: Lilyjjo

fbshipit-source-id: b0114efbd63b2a29eb14086b0a8963880023c2a8
2020-09-02 08:45:16 -07:00
8722952dbd Add benchmark for channel_shuffle operator (#43509)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43509

Test Plan: Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D23299972

Pulled By: kimishpatel

fbshipit-source-id: 6189d209859da5a41067eb9e8317e3bf7a0fc754
2020-09-02 08:15:19 -07:00
6512032699 [Static Runtime] Add OSS build for static runtime benchmarks (#43881)
Summary:
Adds CMake option.  Build with:

```
BUILD_STATIC_RUNTIME_BENCHMARK=ON python setup.py install
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43881

Reviewed By: hlu1

Differential Revision: D23430708

Pulled By: bwasti

fbshipit-source-id: a39bf54e8d4d044a4a3e4273a5b9a887daa033ec
2020-09-02 08:00:18 -07:00
c61a16b237 Kill dead code in common_nn as part of merging Criterion and NewCriterionTests. (#43956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43956

See https://github.com/pytorch/pytorch/pull/43769 and https://github.com/pytorch/pytorch/pull/43776 for proof this code is dead.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23452217

Pulled By: gchanan

fbshipit-source-id: 6850aab2daaa1c321a6b7714f6f113f364f41973
2020-09-02 07:54:05 -07:00
95f912ab13 Use NewCriterionTest in test_cpp_api_parity.py. (#43954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43954

CriterionTest is basically dead -- see https://github.com/pytorch/pytorch/pull/43769 and https://github.com/pytorch/pytorch/pull/43776.

The only exception is the cpp parity test, but the difference there doesn't actually have any effect -- get_target has unpack=True, but none of the examples require unpacking (I checked).

As a pre-requisite for merging these tests, have the cpp parity test start using the NewCriterionTest.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23452144

Pulled By: gchanan

fbshipit-source-id: 5dca1eb0878b882c93431d3b0e880b5bb1764522
2020-09-02 07:53:03 -07:00
4bb5d33076 is_numpy_scalar should also consider bool and complex types (#43644)
Summary:
Before this PR,

```python
import torch
import numpy as np

a = torch.tensor([1, 2], dtype=torch.bool)
c = np.array([1, 2], dtype=np.bool)
print(a[0] == c[0])

a = torch.tensor([1, 2], dtype=torch.complex64)
c = np.array([1, 2], dtype=np.complex64)
print(a[0] == c[0])

# This case is still broken
a = torch.tensor([1 + 1j, 2 + 2j], dtype=torch.complex64)
c = np.array([1 + 1j, 2 + 2j], dtype=np.complex64)
print(a[0] == c[0])
```

outputs

```
False
False
False
```

After this PR, it outputs:

```
tensor(True)
/home/user/src/pytorch/torch/tensor.py:25: ComplexWarning: Casting complex values to real discards the imaginary part return f(*args, **kwargs)
tensor(True)
tensor(False)
```

Related issue: https://github.com/pytorch/pytorch/issues/43579

cc anjali411 mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43644

Reviewed By: ailzhang

Differential Revision: D23425569

Pulled By: anjali411

fbshipit-source-id: a868209376b30cea601295e54015c47803923054
2020-09-02 07:41:50 -07:00
7000c2efb5 [2/2][PyTorch][Mobile] Added mobile module metadata logging (#43853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43853

Add QPL logging for mobile module's metadata
ghstack-source-id: 111113492

(Note: this ignores all push blocking failures!)

Test Plan:
- CI

- Load the model trained by `mobile_model_util.py`

- Local QPL logger standard output.
{F319012106}

Reviewed By: xcheng16

Differential Revision: D23417304

fbshipit-source-id: 7bc834f39e616be1eccfae698b3bccdf2f7146e5
2020-09-01 22:27:10 -07:00
1dd658f28f [Codemod][GleanFbcode] Remove dead includes in caffe2/test (#43953)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43953

Reviewed By: malfet

Differential Revision: D23445556

fbshipit-source-id: 89cd6833aa06f35c5d3c99d698abb08cd61ae4ab
2020-09-01 21:48:28 -07:00
c259146477 add missing NEON {vld1,vst1}_*_x2 intrinsics (#43683)
Summary:
Workaround for issue https://github.com/pytorch/pytorch/issues/43265.
Add the missing intrinsics until gcc-7 gets the missing patches backported.

Fixes https://github.com/pytorch/pytorch/issues/43265.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43683

Reviewed By: albanD

Differential Revision: D23467867

Pulled By: malfet

fbshipit-source-id: 7c138dd3de3c45852a60f2cfe8b4d7f7cf76bc7e
2020-09-01 21:19:39 -07:00
137a4fcc3b Back out "Selective meta programming preparation for prim ops"
Summary:
The diff D22618309 (bacee6aa2e) breaks CYA ACP e2e tests. (https://www.internalfb.com/intern/ods/chart/?rapido=%7B%22queries%22%3A[%7B%22entity%22%3A%22regex(assistant%5C%5C.cya%5C%5C..*acp.*)%2C%5Cn%2C%20!regex(assistant%5C%5C.cya%5C%5C..*fair.*)%2C%22%2C%22key%22%3A%22overview.pct_passed_x_1000%2C%22%2C%22transform%22%3A%22formula(%2F%20%241%201000.0)%2C%22%2C%22reduce_keys%22%3Atrue%2C%22datatypes%22%3A[%22raw%22]%2C%22reduce%22%3A%22%22%2C%22id%22%3A%22ds1%22%2C%22source%22%3A%22ods%22%2C%22active%22%3Atrue%7D]%2C%22period%22%3A%7B%22minutes_back%22%3A720%2C%22time_type%22%3A%22dynamic%22%7D%7D&view=%7B%22type%22%3A%22line_chart_client%22%2C%22params%22%3A%7B%22title%22%3A%22Pass%20Rates%20of%20All%20Continuous%20Runs%20in%20PROD%22%2C%22haspoints%22%3Afalse%2C%22state%22%3A%22published%22%2C%22title_use_v2%22%3Atrue%2C%22tooltip_outside%22%3Atrue%2C%22series_names_preg_replace_list%22%3A[%7B%22series_name_preg_replace_list_group%22%3Anull%2C%22pattern%22%3A%22%2Fassistant%5C%5C.cya%5C%5C.(%5C%5Cw%2B)%5C%5C.([%5E%3A]%2B)%3A%3A.*%2F%22%2C%22replacement%22%3A%22%241%2F%242%22%7D]%2C%22sort_by_series_name%22%3A%22ASC%22%2C%22use_y_axis_hints_as_limits%22%3Atrue%7D%7D&version=2)

So I back out the diff.

Test Plan:
```
cya test -n aloha.acp.arv2.prod --tp ~/tmp/cyaTests/assistant/cya/aloha_acp/whatsapp_call_who_ondevice_oacr.yaml --device_no_new_conn --retries 0
Installing: finished in 13.4 sec
More details at https://www.internalfb.com/intern/buck/build/c48882e8-1032-43ca-ba8f-8
Running "aloha.acp.arv2.prod (acp)" [1 tests] with endpoint "https://prod.facebookvirtualassistant.com"
.
  %100.0 tests passed:  1/1
  Avg turn duration:    12.6s
  P99 turn duration:    24.4s
  CTP report:  https://our.intern.facebook.com/intern/testinfra/testrun/2814749804232321

[jaeholee@32384.od ~/fbsource (7934576f)]$
```

Differential Revision: D23464555

fbshipit-source-id: b2c712a512a207c4813585f4ee57fdb5607317c6
2020-09-01 21:05:45 -07:00
263412e536 Rename is_complex_t -> is_complex (#39906)
Summary:
`is_complex_t` is a bad name. For example, in std there is `std::is_same` but no `std::is_same_t`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39906

Reviewed By: mrshenli

Differential Revision: D22665013

Pulled By: anjali411

fbshipit-source-id: 4b71745f5e2ea2d8cf5845d95ada4556c87e040d
2020-09-01 21:04:19 -07:00
9db90fe1f3 [TensorExpr] Remove unused functions in kernel.cpp (#43966)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43966

Test Plan: build.

Reviewed By: ZolotukhinM

Differential Revision: D23456660

Pulled By: asuhan

fbshipit-source-id: c13411b61cf62dd5d038e7246f79a8682822b472
2020-09-01 20:25:16 -07:00
8fd9fe93be [quant][graphmode][fx] Support dynamic quantization without calibration (#43952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43952

Run the weight observer for dynamic quantization before inserting quant/dequant nodes

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D23452123

fbshipit-source-id: c322808fa8025bbadba36c2e5ab89f59e85de468
2020-09-01 19:09:48 -07:00
fbea2ee917 broadcast_object API for c10d (#43887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43887

As part of addressing #23232, this PR adds support for `broadcast_object_list`, which is an API to broadcast arbitrary picklable objects to all the other ranks. This has been a long-requested feature, so it would be good for PyTorch to support it natively.

The implementation approach follows a similar approach as https://github.com/pytorch/pytorch/pull/42189. The input is a list of objects to be broadcast, and the operation is in place, meaning all ranks in the group will have their input list modified to contain the broadcast objects from the src rank.

Note that the API is designed to match the tensor-based collectives, other than supporting async_op. For now, it is a blocking call. If we see demand for async_op support, we will have to make more progress on merging work/future to support it.
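
A minimal usage sketch, assuming a process group has already been initialized:

```
import torch.distributed as dist

if dist.get_rank() == 0:
    objects = [{"lr": 0.1}, "hello", 42]   # picklable objects on src rank
else:
    objects = [None, None, None]           # placeholders, same length

dist.broadcast_object_list(objects, src=0)
# After the call, every rank sees [{"lr": 0.1}, "hello", 42] in place.
```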
ghstack-source-id: 111180436

Reviewed By: mrshenli

Differential Revision: D23422577

fbshipit-source-id: fa700abb86eff7128dc29129a0823e83caf4ab0e
2020-09-01 18:54:17 -07:00
4134b7abfa Pass CC env variable as ccbin argument to nvcc (#43931)
Summary:
This is the common behavior when one builds PyTorch (or any other CUDA project) using CMake, so it should hold true for Torch CUDA extensions as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43931

Reviewed By: ezyang, seemethere

Differential Revision: D23441793

Pulled By: malfet

fbshipit-source-id: 1af392107a94840331014fda970ef640dc094ae4
2020-09-01 17:26:08 -07:00
0ffe3d84d5 [quant][graphmode][fx] Support dynamic quantization without calibration (#43892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43892

Run the weight observer in the convert function, so users do not need to run calibration

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23429758

fbshipit-source-id: 5bc222e3b731789ff7a86463c449690a58dffb7b
2020-09-01 17:01:48 -07:00
d15b9d980c [quant][graphmode][fx][refactor] Move patterns to separate files (#43891)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43891

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23429759

fbshipit-source-id: f19add96beb7c8bac323ad78f74588ca1393040c
2020-09-01 16:37:33 -07:00
8d53df30ea [FX] Better error when unpacking Proxy (#43740)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43740

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D23380964

Pulled By: jamesr66a

fbshipit-source-id: 9658ef1c50d0f9c4de38781a7485002487f6d3f7
2020-09-01 16:28:50 -07:00
ec7f14943c [OSS] Update README.md -- Explain more complex arguments and functionalities
Summary: Update `README.md` for OSS to explain the usage of `--run`, `--export`, and `--summary`

Test Plan: Test locally.

Reviewed By: malfet

Differential Revision: D23431508

fbshipit-source-id: 368b8dd8cd5099f39c7f5bc985203c417bf7af39
2020-09-01 16:10:33 -07:00
e49dd9fa05 Delete raise_from from torch._six (#43981)
Summary:
No need for a compatibility wrapper in the Python 3 world

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43981

Reviewed By: seemethere

Differential Revision: D23458325

Pulled By: malfet

fbshipit-source-id: 00f822895625f4867c22376fe558c50316f5974d
2020-09-01 15:46:18 -07:00
5e97f251a8 Enable TF32 support for cuDNN (#40737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40737
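
For context, a minimal sketch of the user-facing toggle (assuming the `torch.backends.cudnn.allow_tf32` flag associated with this change):

```
import torch

# TF32 in cuDNN convolutions is gated by a global flag.
torch.backends.cudnn.allow_tf32 = True   # allow TF32 math on Ampere+
torch.backends.cudnn.allow_tf32 = False  # force full FP32 precision
```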

Reviewed By: mruberry

Differential Revision: D22801525

Pulled By: ngimel

fbshipit-source-id: ac7f7e728b4b3e01925337e8c9996f26a6433fd2
2020-09-01 15:34:24 -07:00
93fbbaab2a Update README.md in oss (#43893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43893

Update `README.md` in OSS: provide more examples, starting from the most common use and moving to more specialized uses. Make `README.md` friendlier and more specific.

Test Plan: `README.md` doesn't need test.

Reviewed By: malfet, seemethere

Differential Revision: D23420203

fbshipit-source-id: 1a4c146393fbcaf2893321e7892740edf5d0c248
2020-09-01 14:58:28 -07:00
24eea364f7 Check SparseAdam params are dense on init (#41966) (#43668)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41966

Raises a ValueError if a user attempts to create a SparseAdam optimizer with sparse parameter tensors.
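
A minimal sketch of the new check, assuming the error type described above:

```
import torch

p = torch.randn(4, 4).to_sparse().requires_grad_()
try:
    torch.optim.SparseAdam([p])
except ValueError:
    print("sparse parameters are rejected at construction time")
```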

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43668

Reviewed By: glaringlee

Differential Revision: D23388109

Pulled By: ranman

fbshipit-source-id: 1fbcc7527d49eac6fae9ce51b3307c609a6ca38b
2020-09-01 14:25:59 -07:00
bacee6aa2e Selective meta programming preparation for prim ops (#43540)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43540

selected_mobile_ops.h, which contains the whitelist of root operators, is generated at BUCK build time. It's used for templated selective build when XPLAT_MOBILE_BUILD is defined.

ghstack-source-id: 111014372

Test Plan: CI and BSB

Reviewed By: ljk53

Differential Revision: D22618309

fbshipit-source-id: ddf813904892f99c3f4ae0cd14ce8b27727be5a2
2020-09-01 13:51:44 -07:00
a1a23669f2 [FX] Pickle serialization of GraphModule via forward source (#43674)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43674

Test Plan: Imported from OSS

Reviewed By: dzhulgakov

Differential Revision: D23362396

Pulled By: jamesr66a

fbshipit-source-id: cb8181edff70643b7bbe548cc6b0957328d4eedd
2020-09-01 13:31:18 -07:00
73f7d63bc9 [FX] Support tensor-valued constants (#43666)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43666

Test Plan: Imported from OSS

Reviewed By: dzhulgakov

Differential Revision: D23359110

Pulled By: jamesr66a

fbshipit-source-id: 8569a2db0ef081ea7d8e81d7ba26a92bc12ed423
2020-09-01 13:30:04 -07:00
06c277f38e [TVM] Support slice op (#43969)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43969

Reviewed By: yinghai

Differential Revision: D23413340

fbshipit-source-id: 20168bd573b81ce538e3589b72aba9590c3c055e
2020-09-01 12:34:30 -07:00
5472426b9f Reset DataLoader workers instead of creating new ones (#35795)
Summary:
This PR needs discussion as it changes the behavior of `DataLoader`. It can be closed if it's not considered good practice.

Currently, the `DataLoader` spawns a new `_BaseDataLoaderIter` object every epoch.
In the case of the multiprocess DataLoader, every epoch the worker processes are re-created and they make a copy of the original `Dataset` object.
If users want to cache data or do some tracking on their datasets, all their data will be wiped out every epoch. Notice that this doesn't happen when the number of workers is 0, giving an inconsistency between the multiprocess and serial data loaders.

This PR keeps the `_BaseDataLoaderIter` object alive and just resets it between epochs, so the workers remain active, as do their own `Dataset` objects. People seem to file issues about this often.
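
A minimal sketch of the inconsistency described above (hypothetical dataset, illustrating why per-worker caches are lost):

```
import torch
from torch.utils.data import Dataset, DataLoader

class CachingDataset(Dataset):
    def __init__(self):
        self.cache = {}
    def __len__(self):
        return 8
    def __getitem__(self, i):
        if i not in self.cache:
            self.cache[i] = torch.randn(2)  # stands in for an expensive load
        return self.cache[i]

ds = CachingDataset()
loader = DataLoader(ds, num_workers=2)
for epoch in range(2):
    for _ in loader:
        pass
# With num_workers=0 the cache survives across epochs; with workers that
# are re-created each epoch, their copies of the cache are thrown away.
```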

Pull Request resolved: https://github.com/pytorch/pytorch/pull/35795

Reviewed By: ailzhang

Differential Revision: D23426612

Pulled By: VitalyFedyunin

fbshipit-source-id: e16950036bae35548cd0cfa78faa06b6c232a2ea
2020-09-01 11:48:00 -07:00
db6bd9d60b rename input argument interested-folder to interest-only -- be consistent with other arguments (#43889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43889

1. Rename input argument `interested-folder` to `interest-only` -- consistent with `run-only` and `coverage-only`, and shorter

Test Plan: Test on devserver and linux docker.

Reviewed By: malfet

Differential Revision: D23417338

fbshipit-source-id: ce9711e75ca3a1c30801ad6bd1a620f3b06819c5
2020-09-01 11:46:23 -07:00
bc64efae48 Back out "Revert D19987020: [pytorch][PR] Add the sls tensor train op" (#43938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43938

resubmit

Test Plan: unit test included

Reviewed By: mruberry

Differential Revision: D23443493

fbshipit-source-id: 7b68f8f7d1be58bee2154e9a498b5b6a09d11670
2020-09-01 11:42:12 -07:00
7035cd0f84 Revert D23216393: Support work.result() to get result tensors for allreduce for Gloo, NCCL backends
Test Plan: revert-hammer

Differential Revision:
D23216393 (0b2694cd11)

Original commit changeset: fed5e37fbabb

fbshipit-source-id: 27fbeb1617066fa3f271a681cb089622027d6689
2020-09-01 10:32:38 -07:00
63a0bb0ab9 Add typing annotations for torch.nn.quantized.dynamic.modules.rnn (#43186)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43185

xref: [gh-43072](https://github.com/pytorch/pytorch/issues/43072)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43186

Reviewed By: ezyang

Differential Revision: D23441259

Pulled By: malfet

fbshipit-source-id: 80265ae7f3a70f0087e620969dbd4aa8ca17c317
2020-09-01 10:25:10 -07:00
8ca3913f47 Introduce BUILD_CAFFE2 flag (#43673)
Summary:
Introduce the BUILD_CAFFE2 flag; it defaults to `ON`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43673

Reviewed By: malfet

Differential Revision: D23381035

Pulled By: walterddr

fbshipit-source-id: 1f4582987fa0c4a911f0b18d311c04fdbf8dd8f0
2020-09-01 10:18:23 -07:00
76ca365661 [pytorch][bot] update mobile op deps (#43937)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43937

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23443927

Pulled By: ljk53

fbshipit-source-id: 526ca08dfb5bd32527bff98b243da90dbbf2ea49
2020-09-01 10:07:52 -07:00
e3cb582e05 Error printing extension support for multiline errors (#43807)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43807

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23407457

Pulled By: Lilyjjo

fbshipit-source-id: 05a6a50dc39c00474d9087ef56028a2c183aa53a
2020-09-01 10:02:43 -07:00
224232032c Move Autograd to an alias dispatch key (#43070)
Summary:
This PR moves `DispatchKey::Autograd` to an alias dispatch key mapping to `AutogradCPU, AutogradCUDA, AutogradXLA, AutogradOther, AutogradPrivate*` keys.

A few things are handled in this PR:
- Update alias dispatch key mapping and precompute dispatchTable logic
- Move `Autograd` key from `always_included` set to TensorImpl constructor.
- Update `dummyTensor` constructor to take `requires_grad` as optional argument so that it's closer to the real application in op_registration_test.
- Use `BackendSelect` key for both backend select before and after autograd layer. (1 liner in backend_select codegen)

A few planned followups ordered by priority:
- [cleanup] Update `test_dispatch.py` to include testing `Autograd`.
- [cleanup] Add Math alias key and move catchAll to Math. (to remove 2.2 in `computeDispatchTableEntryWithDebug`)
- [new feature] Add support for Math in native_functions.yaml
- [cleanup] Add iterator like functionality to DispatchKeySet
- [cleanup/large] Only add Autograd backend keys when tensor requires grad. (cc: ljk53 ?)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43070

Reviewed By: ezyang

Differential Revision: D23281535

Pulled By: ailzhang

fbshipit-source-id: 9ad00b17142e9b83304f63cf599f785500f28f71
2020-09-01 09:05:29 -07:00
13a48ac1f3 MaxPool1d without indices optimization (#43745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43745

This is part of a larger effort to refactor and optimize the pooling code. Previously I started working on MaxPool2d here https://github.com/pytorch/pytorch/pull/43267 but since it uses MaxPool1d as a subroutine, it made more sense to work on 1D first and get it tested and optimized and then move up to 2D and then 3D.

Below are some benchmarking results, the python script I used is under the results.

## Benchmarking
```
Name (time in us)                            Min                   Max                Mean             StdDev              Median                 IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_googlenet[(3, 2, 0, 1, 0)-new]      79.7659 (1.03)     1,059.6327 (5.32)      90.6280 (1.01)     19.1196 (1.41)      84.2176 (1.01)       2.4289 (1.0)     1079;2818       11.0341 (0.99)       9055           1
test_googlenet[(3, 2, 0, 1, 0)-old]     505.1531 (6.55)       830.8962 (4.17)     563.4763 (6.29)     65.3974 (4.81)     538.3361 (6.43)      80.5371 (33.16)      242;99        1.7747 (0.16)       1742           1
test_googlenet[(3, 2, 0, 1, 1)-new]      80.2949 (1.04)       233.0020 (1.17)      97.6498 (1.09)     19.1228 (1.41)      89.2282 (1.07)      18.5743 (7.65)     1858;741       10.2407 (0.92)       9587           1
test_googlenet[(3, 2, 0, 1, 1)-old]     513.5350 (6.66)       977.4677 (4.91)     594.4559 (6.63)     69.9372 (5.15)     577.9080 (6.90)      79.8218 (32.86)      503;84        1.6822 (0.15)       1675           1
test_googlenet[(3, 2, 1, 1, 0)-new]      77.1061 (1.0)        199.1168 (1.0)       89.6529 (1.0)      13.5864 (1.0)       83.7557 (1.0)        7.5139 (3.09)    1419;1556       11.1541 (1.0)        7434           1
test_googlenet[(3, 2, 1, 1, 0)-old]     543.6055 (7.05)       964.5708 (4.84)     636.9867 (7.11)     84.0732 (6.19)     616.7777 (7.36)     100.4562 (41.36)      434;65        1.5699 (0.14)       1552           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_inception[(3, 2, 0, 1, 0)-new]      84.5827 (1.00)       184.2827 (1.0)       90.5438 (1.01)      9.6324 (1.0)       89.3027 (1.05)      4.5672 (1.03)      637;759       11.0444 (0.99)       6274           1
test_inception[(3, 2, 0, 1, 0)-old]     641.2268 (7.59)     1,704.8977 (9.25)     686.9383 (7.65)     57.2499 (5.94)     682.5905 (8.01)     58.3753 (13.17)       86;21        1.4557 (0.13)        802           1
test_inception[(3, 2, 0, 1, 1)-new]      84.5008 (1.0)      1,093.6335 (5.93)      89.8233 (1.0)      14.0443 (1.46)      85.2682 (1.0)       4.4331 (1.0)      802;1106       11.1330 (1.0)        9190           1
test_inception[(3, 2, 0, 1, 1)-old]     643.7078 (7.62)       851.4188 (4.62)     687.4905 (7.65)     41.1116 (4.27)     685.1386 (8.04)     60.2733 (13.60)      286;14        1.4546 (0.13)       1300           1
test_inception[(3, 2, 1, 1, 0)-new]     106.0739 (1.26)       258.5649 (1.40)     115.3597 (1.28)     17.5436 (1.82)     106.9643 (1.25)      5.5470 (1.25)     894;1402        8.6685 (0.78)       7635           1
test_inception[(3, 2, 1, 1, 0)-old]     651.0504 (7.70)       955.2278 (5.18)     698.0295 (7.77)     45.5097 (4.72)     692.8109 (8.13)     64.6794 (14.59)      145;15        1.4326 (0.13)        909           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_large_batch_size[new]       2.9608 (1.0)        5.1127 (1.0)        3.3096 (1.0)      0.1936 (1.0)        3.3131 (1.0)      0.2093 (1.0)          71;6  302.1515 (1.0)         297           1
test_large_batch_size[old]     130.6583 (44.13)    152.9521 (29.92)    137.1385 (41.44)    7.4352 (38.40)    135.1784 (40.80)    5.1358 (24.53)         1;1    7.2919 (0.02)          7           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_large_channel_size[new]      2.9696 (1.0)       5.5595 (1.0)       3.5997 (1.0)      0.5836 (1.0)       3.3497 (1.0)      0.3445 (1.0)         58;54  277.8014 (1.0)         277           1
test_large_channel_size[old]     19.6838 (6.63)     22.6637 (4.08)     21.1775 (5.88)     0.8610 (1.48)     21.3739 (6.38)     1.4930 (4.33)         13;0   47.2199 (0.17)         36           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_large_width[new]      1.7714 (1.0)       2.4104 (1.0)       1.8988 (1.0)      0.0767 (1.0)       1.8911 (1.0)      0.0885 (1.0)         86;13  526.6454 (1.0)         373           1
test_large_width[old]     19.5708 (11.05)    22.8755 (9.49)     20.7987 (10.95)    0.7009 (9.14)     20.6623 (10.93)    0.8584 (9.70)         14;1   48.0799 (0.09)         46           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_multithreaded[new]      15.0560 (1.0)       24.2891 (1.0)       16.1627 (1.0)      1.5657 (1.0)       15.7182 (1.0)      0.7598 (1.0)           4;6  61.8709 (1.0)          65           1
test_multithreaded[old]     115.7614 (7.69)     120.9670 (4.98)     118.3004 (7.32)     1.6259 (1.04)     118.4164 (7.53)     1.9613 (2.58)          2;0   8.4531 (0.14)          8           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
```

### Benchmarking script
To run the benchmark make sure you have pytest-benchmark installed with `pip install pytest-benchmark` and use the following command: `pytest benchmark.py --benchmark-sort='name'`

```
import torch
import pytest

def _test_speedup(benchmark, batches=1, channels=32, width=32,
                  kernel_size=2, stride=None, padding=0, dilation=1, ceil_mode=False, return_indices=False):
    torch.set_num_threads(1)
    x = torch.randn((batches, channels, width))
    model = torch.nn.MaxPool1d(kernel_size, stride, padding, dilation, return_indices, ceil_mode)
    benchmark(model, x)

pytest.mark.benchmark(group="inception")
pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
pytest.mark.parametrize("params", [(3, 2), (3, 2, 0, 1, True), (3, 2, 1)],
                         ids=["(3, 2, 0, 1, 0)",
                              "(3, 2, 0, 1, 1)",
                              "(3, 2, 1, 1, 0)"])
def test_inception(benchmark, params, return_indices):
    _test_speedup(benchmark, 10, 64, 147, *params, return_indices=return_indices)

pytest.mark.benchmark(group="googlenet")
pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
pytest.mark.parametrize("params", [(3, 2), (3, 2, 0, 1, True), (3, 2, 1)],
                         ids=["(3, 2, 0, 1, 0)",
                              "(3, 2, 0, 1, 1)",
                              "(3, 2, 1, 1, 0)"])
def test_googlenet(benchmark, params, return_indices):
    _test_speedup(benchmark, 10, 64, 112, *params, return_indices=return_indices)

pytest.mark.benchmark(group="large batch size")
pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
def test_large_batch_size(benchmark, return_indices):
    _test_speedup(benchmark, 100000, 1, 32, return_indices=return_indices)

pytest.mark.benchmark(group="large channel size")
pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
def test_large_channel_size(benchmark, return_indices):
    _test_speedup(benchmark, 1, 100000, 32, return_indices=return_indices)

pytest.mark.benchmark(group="large width")
pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
def test_large_width(benchmark, return_indices):
    _test_speedup(benchmark, 1, 32, 100000, return_indices=return_indices)

pytest.mark.benchmark(group="multithreading")
pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
def test_multithreaded(benchmark, return_indices):
    x = torch.randn((40, 10000, 32))
    model = torch.nn.MaxPool1d(2, return_indices=return_indices)
    benchmark(model, x)
```

## Discussion

The new algorithm is on average 7x faster than the old one. But because the old algorithm had many issues with how it parallelized the code and made use of the cache, one can come up with input parameters (like large batch size) that will make the new algorithm much faster than the original one.

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D23425348

Pulled By: heitorschueroff

fbshipit-source-id: 3fa3f9b8e71200da48424a95510124a83f50d7b2
2020-09-01 08:40:01 -07:00
a044c039c0 updated documentation to streamline setup (#42850)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42850

Reviewed By: mrshenli

Differential Revision: D23449055

Pulled By: osandoval-fb

fbshipit-source-id: 6db695d4fe5f6d9b7bb2895c85c855db4779516b
2020-09-01 08:25:48 -07:00
b1f19c20d6 Run function check and out check in TestTensorDeviceOps (#43830)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43830

Reviewed By: ailzhang

Differential Revision: D23438101

Pulled By: mruberry

fbshipit-source-id: b581ce779ea2f50ea8dfec51d5469031ec7a0a67
2020-09-01 08:21:53 -07:00
9b98bcecfa torch.cat and torch.stack batching rules (#43798)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43798

These are relatively straightforward.

Test Plan: - `pytest test/test_vmap.py -v`

Reviewed By: ezyang

Differential Revision: D23405000

Pulled By: zou3519

fbshipit-source-id: 65c78da3dee43652636bdb0a65b636fca69e765d
2020-09-01 08:12:46 -07:00
dbc4218f11 Batching rules for: torch.bmm, torch.dot (#43781)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43781

Test Plan: - `pytest test/test_vmap.py -v`

Reviewed By: ezyang

Differential Revision: D23400843

Pulled By: zou3519

fbshipit-source-id: a901bba6dc2d8435d314cb4dac85bbd5cd4ee2a5
2020-09-01 08:12:43 -07:00
fa12e225d3 Batching rule for torch.mv (#43780)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43780

The general strategy is:
- unsqueeze the physical inputs enough
- pass the unsqueezed physical inputs to at::matmul
- squeeze any extra dimensions
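
A minimal sketch of that strategy using plain ops (shapes are illustrative):

```
import torch

B, n, m = 4, 3, 5
mats = torch.randn(B, n, m)   # batched matrix operand
vecs = torch.randn(B, m)      # batched vector operand

# unsqueeze -> matmul -> squeeze, mirroring the steps above
out = torch.matmul(mats, vecs.unsqueeze(-1)).squeeze(-1)
assert out.shape == (B, n)
```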

Test Plan: - `pytest test/test_vmap.py -v`

Reviewed By: ezyang

Differential Revision: D23400842

Pulled By: zou3519

fbshipit-source-id: c550eeb935747c08e3b083609ed307a4374b9096
2020-09-01 08:12:41 -07:00
2789a4023b TestVmapOperators: add structured tests that batching rules get invoked (#43731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43731

After this PR, for each test in TestVmapOperators, TestVmapOperators
tests that the test never invokes the slow vmap fallback path. The
rationale behind this change is that TestVmapOperators is used for
testing batching rules and we want confidence that the batching rules
actually get invoked.

We set this up using a similar mechanism to the CUDA memory leak check:
(bff741a849/torch/testing/_internal/common_utils.py (L506-L511))

This PR also implements the batching rule for `to.dtype_layout`; the new
testing caught that we were testing vmap on `to.dtype_layout` but it
didn't actually have a batching rule implemented!

Test Plan: - New tests in `pytest test/test_vmap.py -v` that test the mechanism.

Reviewed By: ezyang

Differential Revision: D23380729

Pulled By: zou3519

fbshipit-source-id: 6a4b97a7fa7b4e1c5be6ad80d6761e0d5b97bb8c
2020-09-01 08:11:35 -07:00
0b2694cd11 Support work.result() to get result tensors for allreduce for Gloo, NCCL backends (#43386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43386

Resolves #43178

ghstack-source-id: 111109716

Test Plan: Added checks to existing unit test and ran it on gpu devserver.

Reviewed By: rohan-varma

Differential Revision: D23216393

fbshipit-source-id: fed5e37fbabbd2ac4a9055b20057fffe3c416c0b
2020-09-01 08:05:55 -07:00
a67246b2d4 Add reduction string test for ctc_loss. (#43884)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43884

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23427907

Pulled By: gchanan

fbshipit-source-id: 889bd92e9d3e0528b57e3952fc83e25bc7abe293
2020-09-01 07:01:54 -07:00
fab012aa28 Revert "Added support for Huber Loss (#37599)" (#43351)
Summary:
This reverts commit 11e5174926d807a540fc7b54fb45a26ec0c5d9c0 due to [comment](https://github.com/pytorch/pytorch/pull/37599#pullrequestreview-471950192).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43351

Reviewed By: pbelevich, seemethere

Differential Revision: D23249511

Pulled By: vincentqb

fbshipit-source-id: 18b8b346f00eaf0ef7376b06579d404a84add4de
2020-09-01 06:34:26 -07:00
c14a3613a8 Fix NaN propagation in TE fuser's min/max implementation (#43609)
Summary:
Per the eager-mode source of truth, NaNs must be propagated by min/max.
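
For reference, the eager-mode behavior being matched:

```
import torch

x = torch.tensor([1.0, float("nan"), 3.0])
print(torch.max(x))  # tensor(nan): NaN propagates
print(torch.min(x))  # tensor(nan)
```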

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43609

Reviewed By: ZolotukhinM

Differential Revision: D23349184

Pulled By: bertmaher

fbshipit-source-id: 094eb8b89a02b27d5ecf3988d0f473c0f91e4afb
2020-09-01 02:10:13 -07:00
820c4b05a9 [ONNX] Update slice symbolic function (#42935)
Summary:
During scripting, a combination of shape (or size()) and slice (e.g. x.shape[2:]) produces the following error:
 slice() missing 1 required positional argument: 'step'
This happens because aten::slice has 2 signatures:

- aten::slice(Tensor self, int dim, int start, int end, int step) -> Tensor
- aten::slice(t[] l, int start, int end, int step) -> t[]

and when a list is passed instead of a tensor, the second of the two slice signatures is called; since it has 4 instead of 5 arguments, it produces the above exception.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42935

Reviewed By: houseroad

Differential Revision: D23398435

Pulled By: bzinodev

fbshipit-source-id: 4151a8f878c520cea199b265973fb476b17801fe
2020-09-01 02:08:48 -07:00
f1624b82b5 Preserve python backtrace in autograd engine errors. (#43684)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43684

This PR attempts to address #42560 by capturing the appropriate
exception_ptr in the autograd engine and passing it over to the Future.

As part of this change, there is a significant change to the Future API: we
now only accept an exception_ptr as part of setError.

For the example in #42560, the exception trace would now look like:

```
> Traceback (most recent call last):
>   File "test_autograd.py", line 6914, in test_preserve_backtrace
>     Foo.apply(t).sum().backward()
>   File "torch/tensor.py", line 214, in backward
>     torch.autograd.backward(self, gradient, retain_graph, create_graph)
>   File "torch/autograd/__init__.py", line 127, in backward
>     allow_unreachable=True)  # allow_unreachable flag
>   File "torch/autograd/function.py", line 87, in apply
>     return self._forward_cls.backward(self, *args)
>   File "test_autograd.py", line 6910, in backward
>     raise ValueError("something")
> ValueError: something
```
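
For reference, a minimal repro consistent with that trace (the forward body is assumed; the other names are taken from the trace above):

```
import torch

class Foo(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grad):
        raise ValueError("something")

t = torch.randn(3, requires_grad=True)
Foo.apply(t).sum().backward()  # now surfaces the Python traceback above
```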
ghstack-source-id: 111109637

Test Plan: waitforbuildbot

Reviewed By: albanD

Differential Revision: D23365408

fbshipit-source-id: 1470c4776ec8053ea92a6ee1663460a3bae6edc5
2020-09-01 01:28:47 -07:00
825c109eb7 [reland][quant][graphmode][fx] Add support for weight prepack folding (#43728) (#43902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43902

Trace back from the weight node until we hit getattr, reconstruct the graph module with the traced nodes,
and run the graph module to pack the weight. Then replace the original chain of ops with the packed weight.

Test Plan:
Imported from OSS

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23432431

fbshipit-source-id: 657f21a8287494f7f87687a9d618ca46376d3aa3
2020-09-01 00:26:19 -07:00
6da26cf0d9 Update torch.range warning message regarding the removal version number (#43569)
Summary:
`torch.range` still hasn't been removed, long after version 0.5. This PR fixes the warning message. Alternatively, we could remove `torch.range`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43569

Reviewed By: ngimel

Differential Revision: D23408233

Pulled By: mruberry

fbshipit-source-id: 86c4f9f018ea5eddaf80b78a3c54dfa41cfc6fa6
2020-08-31 22:23:32 -07:00
85d91a3230 [TensorExpr] Check statements in test_kernel.cpp (#43911)
Summary:
Check statements and fix all the warnings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43911

Test Plan: test_tensorexpr

Reviewed By: ZolotukhinM

Differential Revision: D23441092

Pulled By: asuhan

fbshipit-source-id: f671eef4b4eb9b51acb15054131152ae650fedbd
2020-08-31 22:16:25 -07:00
f229d2c07b Revert D23335106: [quant][graphmode][fix] Fix insert quant dequant for observers without qparams
Test Plan: revert-hammer

Differential Revision:
D23335106 (602209751e)

Original commit changeset: 84af2884d521

fbshipit-source-id: 8d227fe2048b532016407d8ecfbaa6ffd1c313fd
2020-08-31 22:12:37 -07:00
69080e9e7e simplify profile text output by displaying only top-level ops statistics (#42262)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42262

Test Plan:
Imported from OSS
```
==================================================================================================================================================================================
TEST
-----------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------------------------------------
Name                           Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     Number of Calls  Input Shapes
-----------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------------------------------------
aten::add_                     3.61%            462.489us        3.61%            462.489us        462.489us        1                [[3, 20], [3, 20], []]
aten::slice                    1.95%            249.571us        1.95%            250.018us        250.018us        1                [[3, 80], [], [], [], []]
aten::lstm                     1.89%            242.534us        22.41%           2.872ms          2.872ms          1                [[5, 3, 10], [], [], [], [], [], [], [], []]
aten::lstm                     1.68%            215.852us        18.18%           2.330ms          2.330ms          1                [[5, 3, 10], [], [], [], [], [], [], [], []]
aten::lstm                     1.68%            215.767us        18.49%           2.370ms          2.370ms          1                [[5, 3, 10], [], [], [], [], [], [], [], []]
aten::lstm                     1.60%            205.014us        20.15%           2.582ms          2.582ms          1                [[5, 3, 10], [], [], [], [], [], [], [], []]
aten::lstm                     1.55%            198.213us        18.53%           2.375ms          2.375ms          1                [[5, 3, 10], [], [], [], [], [], [], [], []]
aten::addmm                    0.95%            122.359us        1.01%            129.857us        129.857us        1                [[80], [3, 20], [20, 80], [], []]
aten::stack                    0.29%            36.745us         0.63%            80.179us         80.179us         1                [[], []]
aten::add_                     0.28%            35.694us         0.28%            35.694us         35.694us         1                [[3, 20], [3, 20], []]
-----------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------------------------------------
Self CPU time total: 12.817ms

-----------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------------------------------------
Name                           Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     Number of Calls  Input Shapes
-----------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------------------------------------
aten::mul                      11.45%           1.467ms          12.88%           1.651ms          11.006us         150              [[3, 20], [3, 20]]
aten::lstm                     8.41%            1.077ms          97.76%           12.529ms         2.506ms          5                [[5, 3, 10], [], [], [], [], [], [], [], []]
aten::addmm                    7.65%            979.982us        11.38%           1.459ms          29.182us         50               [[80], [3, 20], [20, 80], [], []]
aten::sigmoid_                 6.78%            869.295us        9.74%            1.249ms          8.327us          150              [[3, 20]]
aten::add_                     5.82%            745.801us        5.82%            745.801us        14.916us         50               [[3, 20], [3, 20], []]
aten::slice                    5.58%            715.532us        6.61%            847.445us        4.237us          200              [[3, 80], [], [], [], []]
aten::unsafe_split             4.24%            544.015us        13.25%           1.698ms          33.957us         50               [[3, 80], [], []]
aten::tanh                     3.11%            398.881us        6.05%            775.024us        15.500us         50               [[3, 20]]
aten::empty                    3.04%            389.055us        3.04%            389.055us        1.319us          295              [[], [], [], [], [], []]
aten::sigmoid                  2.96%            379.686us        2.96%            379.686us        2.531us          150              [[3, 20], [3, 20]]
-----------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------------------------------------
Self CPU time total: 12.817ms

==================================================================================================================================================================================
TEST
==================================================================================================================================================================================
This report only display top-level ops statistics
-----------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------------------------------------
Name                           Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     Number of Calls  Input Shapes
-----------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------------------------------------
aten::lstm                     1.89%            242.534us        22.41%           2.872ms          2.872ms          1                [[5, 3, 10], [], [], [], [], [], [], [], []]
aten::lstm                     1.68%            215.852us        18.18%           2.330ms          2.330ms          1                [[5, 3, 10], [], [], [], [], [], [], [], []]
aten::lstm                     1.68%            215.767us        18.49%           2.370ms          2.370ms          1                [[5, 3, 10], [], [], [], [], [], [], [], []]
aten::lstm                     1.60%            205.014us        20.15%           2.582ms          2.582ms          1                [[5, 3, 10], [], [], [], [], [], [], [], []]
aten::lstm                     1.55%            198.213us        18.53%           2.375ms          2.375ms          1                [[5, 3, 10], [], [], [], [], [], [], [], []]
-----------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------------------------------------
Self CPU time total: 12.817ms

==================================================================================================================================================================================
This report only display top-level ops statistics
-----------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------------------------------------
Name                           Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     Number of Calls  Input Shapes
-----------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------------------------------------
aten::lstm                     8.41%            1.077ms          97.76%           12.529ms         2.506ms          5                [[5, 3, 10], [], [], [], [], [], [], [], []]
-----------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------------------------------------
Self CPU time total: 12.817ms

Total time based on python measurements:  13.206ms
CPU time measurement python side overhead: 3.03%
```

Reviewed By: ilia-cher

Differential Revision: D22830328

Pulled By: ilia-cher

fbshipit-source-id: c9a71be7b23a8f84784117c788faa43caa96f545
2020-08-31 21:41:40 -07:00
d7ee84c9b5 Update determinism documentation (#41692)
Summary:
Add user-facing documentation for set_deterministic
Also update grammar and readability in Reproducibility page

Issue https://github.com/pytorch/pytorch/issues/15359
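
A minimal usage sketch of the API being documented (hedged; `torch.set_deterministic` is the name current at the time of this commit and was later renamed):

```python
import torch

# Ask PyTorch to error out on ops known to be nondeterministic,
# instead of silently producing run-to-run differences.
torch.set_deterministic(True)

# Determinism still requires seeding the RNGs yourself.
torch.manual_seed(0)
```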

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41692

Reviewed By: ailzhang

Differential Revision: D23433061

Pulled By: mruberry

fbshipit-source-id: 4c4552950803c2aaf80f7bb4792d2095706d07cf
2020-08-31 21:06:24 -07:00
69fbc705d8 Remaining changes of #43578 (#43921)
Summary:
Not all of https://github.com/pytorch/pytorch/issues/43578 was merged. This PR contains the remaining part.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43921

Reviewed By: ailzhang

Differential Revision: D23438504

Pulled By: mruberry

fbshipit-source-id: 9c5e26346dfc423b7a440b8a986420a27349090f
2020-08-31 20:42:07 -07:00
3c2f6d2ecf [caffe2] Extend dedup SparseAdagrad fusion with stochastic rounding FP16 (#43124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43124

Add stochastic rounding FP16 support for the dedup version of the SparseAdagrad fusion.
ghstack-source-id: 111037723

Test Plan:
```
buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_sum_gradient \(caffe2\.caffe2\.fb\.net_transforms\.tests\.fuse_sparse_ops_test\.TestFuseSparseOps\)' --print-passing-details
```

https://our.intern.facebook.com/intern/testinfra/testrun/5629499566042000

```
buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_mean_gradient \(caffe2\.caffe2\.fb\.net_transforms\.tests\.fuse_sparse_ops_test\.TestFuseSparseOps\)' --print-passing-details
```

https://our.intern.facebook.com/intern/testinfra/testrun/1125900076333177

Reviewed By: xianjiec

Differential Revision: D22893851

fbshipit-source-id: 81c7a7fe4b0d2de0e6b4fc965c5d23210213c46c
2020-08-31 20:35:22 -07:00
f17d7a5556 Fix exception chaining in torch/ (#43836)
Summary:
## Motivation
Fixes https://github.com/pytorch/pytorch/issues/43770.

## Description of the change
This PR fixes exception chaining only in files under `torch/` where appropriate.
To fix exception chaining, I used either:
1. `raise new_exception from old_exception` where `new_exception` itself seems not descriptive enough to debug or `old_exception` delivers valuable information.
2. `raise new_exception from None` where raising both of `new_exception` and `old_exception` seems a bit noisy and redundant.
I subjectively chose which one to use from the above options.
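
For illustration, a minimal sketch of the two styles (a generic example, not one of the torch/ call sites listed below; `some_optional_dep` is a hypothetical module):

```python
# Style 1: keep the original exception as context, since it aids debugging.
try:
    import some_optional_dep  # hypothetical optional dependency
except ImportError as e:
    raise RuntimeError("feature X requires some_optional_dep") from e

# Style 2: suppress the original when chaining it would just be noise.
try:
    value = int("not-a-number")
except ValueError:
    raise TypeError("expected a numeric string") from None
```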

## List of lines containing raise in except clause:
I wrote [this simple script](https://gist.github.com/akihironitta/4223c1b32404b36c1b349d70c4c93b4d) using [ast](https://docs.python.org/3.8/library/ast.html#module-ast) to list lines where `raise`ing in `except` clause.

- [x] 000739c31a/torch/jit/annotations.py (L35)
- [x] 000739c31a/torch/jit/annotations.py (L150)
- [x] 000739c31a/torch/jit/annotations.py (L158)
- [x] 000739c31a/torch/jit/annotations.py (L231)
- [x] 000739c31a/torch/jit/_trace.py (L432)
- [x] 000739c31a/torch/nn/utils/prune.py (L192)
- [x] 000739c31a/torch/cuda/nvtx.py (L7)
- [x] 000739c31a/torch/utils/cpp_extension.py (L1537)
- [x] 000739c31a/torch/utils/tensorboard/_pytorch_graph.py (L292)
- [x] 000739c31a/torch/utils/data/dataloader.py (L835)
- [x] 000739c31a/torch/utils/data/dataloader.py (L849)
- [x] 000739c31a/torch/utils/data/dataloader.py (L856)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L186)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L189)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L424)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1279)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1283)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1356)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1388)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1391)
- [ ] 000739c31a/torch/testing/_internal/common_utils.py (L1412)
- [x] 000739c31a/torch/testing/_internal/codegen/random_topo_test.py (L310)
- [x] 000739c31a/torch/testing/_internal/codegen/random_topo_test.py (L329)
- [x] 000739c31a/torch/testing/_internal/codegen/random_topo_test.py (L332)
- [x] 000739c31a/torch/testing/_internal/jit_utils.py (L183)
- [x] 000739c31a/torch/testing/_internal/common_nn.py (L4789)
- [x] 000739c31a/torch/onnx/utils.py (L367)
- [x] 000739c31a/torch/onnx/utils.py (L659)
- [x] 000739c31a/torch/onnx/utils.py (L892)
- [x] 000739c31a/torch/onnx/utils.py (L897)
- [x] 000739c31a/torch/serialization.py (L108)
- [x] 000739c31a/torch/serialization.py (L754)
- [x] 000739c31a/torch/distributed/rpc/_testing/faulty_agent_backend_registry.py (L76)
- [x] 000739c31a/torch/distributed/rpc/backend_registry.py (L260)
- [x] 000739c31a/torch/distributed/distributed_c10d.py (L184)
- [x] 000739c31a/torch/_utils_internal.py (L57)
- [x] 000739c31a/torch/hub.py (L494)
- [x] 000739c31a/torch/contrib/_tensorboard_vis.py (L16)
- [x] 000739c31a/torch/distributions/lowrank_multivariate_normal.py (L100)
- [x] 000739c31a/torch/distributions/constraint_registry.py (L142)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43836

Reviewed By: ailzhang

Differential Revision: D23431212

Pulled By: malfet

fbshipit-source-id: 5f7f41b391164a5ad0efc06e55cd58c23408a921
2020-08-31 20:26:23 -07:00
da32bf4cc6 Move type annotations for remaining torch.utils stub files inline (#43406)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43406

Reviewed By: mruberry

Differential Revision: D23319736

Pulled By: malfet

fbshipit-source-id: e25fbb49f27aa4893590b022441303d6d98263a9
2020-08-31 18:44:09 -07:00
602209751e [quant][graphmode][fix] Fix insert quant dequant for observers without qparams (#43606)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43606

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23335106

fbshipit-source-id: 84af2884d52118c069fc43a9f166dc336a8a87c8
2020-08-31 18:27:53 -07:00
7db7da7151 [reland][quant][graphmode][fx] Add top level APIs (#43581) (#43901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43901

Add APIs similar to those of eager mode and graph mode on TorchScript:
- fuse_fx
- quantize_fx (for both post training static and qat)
- quantize_dynamic_fx (for post training dynamic)
- prepare_fx (for both post training static and qat)
- prepare_dynamic_fx (for post training dynamic)
- convert_fx (for all modes)

Test Plan:
Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23432430

fbshipit-source-id: fc99eb75cbecd6ee7a3aa6c8ec71cd499ff7e3c1
2020-08-31 18:24:26 -07:00
deb5fde51c [TensorExpr] Make KernelSumMultipleAxes much faster (#43905)
Summary:
Reduce input size, skip the dtype conversion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43905

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.KernelSum*

Reviewed By: ailzhang

Differential Revision: D23433398

Pulled By: asuhan

fbshipit-source-id: 0d95ced3c1382f10595a9e5745bf4bef007cc913
2020-08-31 17:58:43 -07:00
ee53a335c0 [ONNX] Floordiv (#43022)
Summary:
Add export of floordiv op

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43022

Reviewed By: houseroad

Differential Revision: D23398493

Pulled By: bzinodev

fbshipit-source-id: f929a88b3bc0c3867e8fbc4e50afdf0c0c71553d
2020-08-31 17:54:40 -07:00
f73ba88946 Avoid resizing in MinMaxObserver (#43789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43789

Since it's a single element, resizing is unnecessary; in some cases we may not be able to resize the
buffers anyway.

Test Plan: unit tests

Reviewed By: supriyar

Differential Revision: D23393108

fbshipit-source-id: 46cd7f73ed42a05093662213978a01ee726433eb
2020-08-31 17:41:39 -07:00
98b846cd1d [JIT] Remove loop peeling from the profiling executor pipeline. (#43847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43847

It seems to slow down two fastRNN benchmarks and does not speed up others.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23416197

Pulled By: ZolotukhinM

fbshipit-source-id: 598144561979e84bcf6bccf9b0ca786f5af18383
2020-08-31 17:26:55 -07:00
d69d603061 [JIT] Specialize autograd zero: actually remove the original graph after we created its versioned copy. (#43900)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43900

The original code assumed that the versioning if was inserted at the
beginning of the graph, while in fact it was inserted at the end. We're
now also not removing `profile_optional` nodes, relying on DCE to clean
them up later (the reason we're not removing them is that deletion could
invalidate the insertion point being used).

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23432175

Pulled By: ZolotukhinM

fbshipit-source-id: 1bf55affaa3f17af1bf71bad3ef64edf71a3e3fb
2020-08-31 17:26:51 -07:00
f150f924d3 [JIT] Specialize autograd zero: fix the guarding condition. (#43846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43846

We are looking for tensors that are expected to be undefined (according
to the profile info) and should be checking for them to satisfy the
following condition: "not(have any non-zero)", which is equivalent to
"tensor is all zeros". The issue was that we've been checking tensors
that were expected *not* to be undefined.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23416198

Pulled By: ZolotukhinM

fbshipit-source-id: 71e22f552680f68f2af29f427b7355df9b1a4278
2020-08-31 17:25:50 -07:00
9b820fe904 Fix ImportError in the OSS land. (#43912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43912

Fixed the ImportError: cannot import name 'compute_ulp_error' from 'caffe2.python.oss.fakelowp.test_utils'

Test Plan: test_op_nnpi_fp16.py

Reviewed By: hyuen

Differential Revision: D23435218

fbshipit-source-id: be0b240ee62090d06fdc8efac85fb1c32803da0d
2020-08-31 16:48:54 -07:00
7137327646 log message at per-test level for perfpipe_pytorch_test_times (#43752)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43752

Test Plan:
{F315930458}

{F315930459}

Reviewed By: walterddr, malfet

Differential Revision: D23387998

Pulled By: dhuang29

fbshipit-source-id: 2da8b607c049a6f8f21d98dbb25e664ea6229f27
2020-08-31 16:22:44 -07:00
4c19a1e350 Move torch/autograd/grad_mode.pyi stubs inline (#43415)
Summary:
- Add `torch._C` bindings from `torch/csrc/autograd/init.cpp`
- Renamed `torch._C.set_grad_enabled` to `torch._C._set_grad_enabled`
  so it doesn't conflict with torch.set_grad_enabled anymore

This is a continuation of gh-38201. All I did was resolve merge conflicts and finish the annotation of `_DecoratorContextManager.__call__` that ezyang started in the first commit.

~Reverts commit b5cd3a80bbc, which was only motivated by not having `typing_extensions` available.~ (JIT can't be made to understand `Literal[False]`, so keep as is).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43415

Reviewed By: ngimel

Differential Revision: D23301168

Pulled By: malfet

fbshipit-source-id: cb5290f2e556b4036592655b9fe54564cbb036f6
2020-08-31 16:14:41 -07:00
e941a462a3 Enable gcc coverage in OSS (#43883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43883

Check the result of GCC coverage in OSS is reasonable and ready to ship.

The number of executable lines is not the same between `gcc` and `clang` for the following reasons:
* The following lines are counted in `clang` but not in `gcc`:
1. empty lines, or lines with only "{" or "}"
2. some comments
3. `#define ...` -- not supported by gcc according to the official documentation

* Besides, a statement that spans more than one line is counted as only one executable line in gcc, but as several lines in clang

## Advantage of `gcc` coverage
1. Much faster
- the code coverage tool's runtime is only **4 min** (*amazing!*) with `gcc`, compared to **3 hours!!** with `clang`, to analyze all the tests' artifacts
2. Uses less disk
- `Clang`'s artifacts take up as much as 170G, but `GCC`'s take 980M

Besides, also update `README.md`.

Test Plan:
Compare the result in OSS `clang` and OSS `gcc` with the same command:
```
python oss_coverage.py --run-only atest test_nn.py --interested-folder=aten
```

----

## GCC
**Summary**
> time: 0:15:45
summary percentage: 44.85%

**Report and Log**
[File Coverage Report](P140825162)
[Line Coverage Report](P140825196)
[Log](P140825385)

------

## CLANG

**Summary**
> time: 0:21:35
summary percentage: 44.08%

**Report and Log**
[File Coverage Report](P140825845)
[Line Coverage Report](P140825923)
[Log](P140825950)

----------

# Run all tests
```
# run all tests and get coverage over Pytorch
python oss_coverage.py
```
**Summary**
> time: 1:27:20. ( time to run tests:  1:23:33)
summary percentage: 56.62%

**Report and Log**
[File Coverage Report](P140837175)
[Log](P140837121)

Reviewed By: malfet

Differential Revision: D23416772

fbshipit-source-id: a6810fa4d8199690f10bd0a4f58a42ab2a22182b
2020-08-31 16:11:33 -07:00
da0e93a8c3 Move fbcode related coverage code to fb/ folder and add TARGETS (#43800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43800

1. Move fbcode-related coverage code to the fb/ folder and add TARGETS so that we can use buck run to run the tool, solving the import problem.

2. Write `README.md` to give users guidance about the tool

Test Plan:
On devserver:
```
buck run //caffe2/fb/code_coverage/tool:coverage -- //caffe2/c10:
```

More examples in README.md

Reviewed By: malfet

Differential Revision: D23404988

fbshipit-source-id: 4942cd0e0fb7bd28a5e884d9835b93f00adb7b92
2020-08-31 16:10:33 -07:00
3682df77db Implementing NumPy-like function torch.heaviside() (#42523)
Summary:
- Related with https://github.com/pytorch/pytorch/issues/38349
- Implementing the NumPy-like function `torch.heaviside()`.
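
A quick usage sketch: the second argument supplies the output value where the input is exactly zero, matching NumPy's semantics:

```python
import torch

x = torch.tensor([-1.5, 0.0, 2.0])
torch.heaviside(x, torch.tensor(0.5))
# tensor([0.0000, 0.5000, 1.0000]) -- 0 below zero, `values` at zero, 1 above
```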

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42523

Reviewed By: ngimel

Differential Revision: D23416743

Pulled By: mruberry

fbshipit-source-id: 9975bd9c9fa73bd0958fe9879f79a692aeb722d5
2020-08-31 15:54:56 -07:00
7680d87a76 Let linspace support bfloat16 and complex dtypes (#43578)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43578
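
A hedged usage sketch of the newly supported dtypes (real endpoints are used here, with `dtype` selecting the bfloat16/complex result type):

```python
import torch

torch.linspace(0, 1, steps=5, dtype=torch.bfloat16)
torch.linspace(0, 1, steps=5, dtype=torch.cfloat)  # complex result type
```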

Reviewed By: malfet

Differential Revision: D23413690

Pulled By: mruberry

fbshipit-source-id: 8c24f7b054269e1317fe53d26d523fea4decb164
2020-08-31 14:54:22 -07:00
3278beff44 Skip target determination for codecov test (#43899)
Summary:
Python code coverage tests should not rely on target determination as it will negatively impact the coverage score

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43899

Reviewed By: seemethere

Differential Revision: D23432069

Pulled By: malfet

fbshipit-source-id: 341fcadafaab6bd96d33d23973e01f7d421a6593
2020-08-31 14:43:12 -07:00
ffca81e38b [pytorch][bot] update mobile op deps (#43871)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43871

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23422523

Pulled By: ljk53

fbshipit-source-id: 95f2a1b6a2d25b13618c65944a2b919922083fb8
2020-08-31 14:42:12 -07:00
4e4626a23d Join-based API to support DDP uneven inputs (#42577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42577

Closes https://github.com/pytorch/pytorch/issues/38174. Implements a join-based API to support training with the DDP module in the scenario where different processes have different no. of inputs. The implementation follows the description in https://github.com/pytorch/pytorch/issues/38174. Details are available in the RFC, but as a summary, we make the following changes:

#### Approach
1) Add a context manager `torch.nn.parallel.distributed.join`
2) In the forward pass, we schedule a "present" allreduce where non-joined processes contribute 1 and joined processes contribute 0. This lets us keep track of joined processes and know when all procs are joined.
3) When a process depletes its input and exits the context manager, it enters "joining" mode and attempts to "shadow" the collective comm. calls made in the model's forward and backward pass. For example, we schedule the same allreduces in the same order as the backward pass, but with zeros.
4) We adjust the allreduce division logic to divide by the effective world size (no. of non-joined procs) rather than the absolute world size to maintain correctness.
5) At the end of training, the last joined process is selected to be the "authoritative" model copy

We also make some misc. changes such as adding a `rank` argument to `_distributed_broadcast_coalesced` and exposing some getters/setters on `Reducer` to support the above changes.
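
As a hedged sketch (not code from this PR), usage under the new context manager might look like the following; `net`, `rank`, and `local_batches` are hypothetical placeholders set up elsewhere, and the manager is shown as a method on the DDP instance:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes init_process_group has already run on every rank.
model = DDP(net.to(rank), device_ids=[rank])
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

with model.join():                      # joined ranks shadow collective calls
    for inputs in local_batches:        # per-rank batch counts may differ
        optimizer.zero_grad()
        model(inputs).sum().backward()  # allreduce divides by unjoined ranks
        optimizer.step()
```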

#### How is it tested?
We have tests covering the following models/scenarios:
- [x] Simple linear model
- [x] Large convolutional model
- [x] Large model with module buffers that are broadcast in the forward pass (resnet). We verify this with a helper function `will_sync_module_buffers` and ensure this is true for ResNet (due to batchnorm)
- [x] Scenario where a rank calls join() without iterating at all, so without rebuilding buckets (which requires collective comm)
- [x] Model with unused params (with find unused parameters=True)
- [x] Scenarios where different processes iterate for a varying number of different iterations.
- [x] Test consistency in tie-breaking when multiple ranks are the last ones to join
- [x] Test that we divide by the effective world_size (no. of unjoined processes)

#### Performance implications

###### Trunk vs PR patched, 32 GPUs, batch size = 32
P50, forward + backward + optimizer batch latency & total QPS: 0.121 264/s vs 0.121 264/s
P50 backwards only batch latency & total QPS: 0.087 369/s vs 0.087 368/s

###### join(enable=True) vs without join, 32 GPUs, batch size = 32, even inputs
P50, forward + backward + optimizer batch latency & total QPS: 0.120 265/s vs 0.121 264/s
P50 backwards only batch latency & total QPS: 0.088 364/s vs 0.087 368/s

###### join(enable=False) vs without join, 32 GPUs, batch size = 32, even inputs
P50 forward + backward + optimizer batch latency & total QPS: 0.121 264/s vs 0.121 264/s
P50 backwards only batch latency & total QPS: 0.087 368/s vs 0.087 368/s

###### join(enable=True) with uneven inputs (offset = 2000), 32 GPUs, batch size = 32
P50 forward + backward + optimizer batch latency & total QPS: 0.183 174/s vs 0.121 264/s
P50 backwards only batch latency & total QPS: 0.150 213/s vs 0.087 368/s

###### join(enable=True) with uneven inputs ((offset = 2000)), 8 GPUs, batch size = 32
P50 forward + backward + optimizer batch latency & total QPS: 0.104 308/s vs 0.104 308/s
P50 backwards only batch latency & total QPS: 0.070 454/s vs 0.070 459/s

The uneven-inputs benchmark above was conducted with 32 GPUs, with 4 GPUs immediately depleting their inputs and entering "join" mode (i.e. not iterating at all) while the other 28 iterated as normal. It looks like there is a pretty significant perf hit for this case when there are uneven inputs and multi-node training. Strangely, this does not reproduce on a single node (8 GPUs).

#### Limitations
1) This is only implemented for MPSD, not SPMD. Per a discussion with mrshenli we want to encourage the use of MPSD over SPMD for DDP.
2) This does not currently work with SyncBN or custom collective calls made in the model's forward pass. This is because the `join` class only shadows the `broadcast` for buffers in the forward pass, the gradient allreduces in the bwd pass, unused parameters reduction, and (optionally) the rebuild buckets broadcasting in the backwards pass. Supporting this will require additional design thought.
3) Has not been tested with the [DDP comm. hook](https://github.com/pytorch/pytorch/issues/39272) as this feature is still being finalized/in progress. We will add support for this in follow up PRs.
ghstack-source-id: 111033819

Reviewed By: mrshenli

Differential Revision: D22893859

fbshipit-source-id: dd02a7aac6c6cd968db882c62892ee1c48817fbe
2020-08-31 13:29:03 -07:00
2f52748515 Publish all_gather_object and gather_object docs (#43772)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43772
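
A minimal sketch of the newly documented collectives, which gather arbitrary picklable Python objects rather than tensors (assumes a process group is already initialized):

```python
import torch.distributed as dist

# Every rank contributes one object; every rank receives all of them.
gathered = [None] * dist.get_world_size()
dist.all_gather_object(gathered, {"rank": dist.get_rank()})
# `gathered` now holds one dict per rank on every process.
```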

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23398495

Pulled By: rohan-varma

fbshipit-source-id: 032e1d628c0c0f2dec297226167471698c56b605
2020-08-31 13:28:00 -07:00
f7bae5b6b1 Revert D23385091: [quant][graphmode][fx] Add top level APIs
Test Plan: revert-hammer

Differential Revision:
D23385091 (eb4199b0a7)

Original commit changeset: b789e54e1a0f

fbshipit-source-id: dc3dd9169d34beab92488d78d42d7e7d05e771d1
2020-08-31 12:18:29 -07:00
68304c527a Revert D23385090: [quant][graphmode][fx] Add support for weight prepack folding
Test Plan: revert-hammer

Differential Revision:
D23385090 (ef08f92076)

Original commit changeset: 11341f0af525

fbshipit-source-id: fe2bcdc16106923a2cee99eb5cc0a1e9c14ad2c5
2020-08-31 12:17:28 -07:00
0394c5a283 [fix] torch.multinomial : fix for 0 size dim (#43775)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43768

TO-DO:
* [x] Add test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43775

Reviewed By: ZolotukhinM

Differential Revision: D23421979

Pulled By: ngimel

fbshipit-source-id: 949fcdd30f18d17ae1c372fa6ca6a0b8d0d538ce
2020-08-31 11:57:42 -07:00
3c8b1d73c9 Update aliasing in tensorexpr fuser (#43743)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43743

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D23385205

Pulled By: eellison

fbshipit-source-id: 097a15d5bcf216453e1dd144d6117108b3deae4d
2020-08-31 11:52:26 -07:00
5da8a7bf2d use types in the IR instead of vmap (#43742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43742

We can remove all prim::profiles, update the values to their specialized profiled types, and then later guard the input graphs based on the input types of the fusion group. After that we remove specialized tensor types from the graph. This gets rid of having to update the vmap and removes all of the profile nodes in fusing.

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D23385206

Pulled By: eellison

fbshipit-source-id: 2c84bd1d1c38df0d7585e523c30f7bd28f399d7c
2020-08-31 11:52:23 -07:00
259e5b7d71 Add passes to profiling executor pipeline (#43636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43636

We weren't running inlining in the forward graph of differentiable subgraphs, and we weren't getting rid of all profiles as part of optimization.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23358804

Pulled By: eellison

fbshipit-source-id: 05ede5fa356a15ca385f899006cb5b35484ef620
2020-08-31 11:52:20 -07:00
a7e7981c0b Use prim::TensorExprGroup interned symbol (#43635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43635

Intern the symbol, no functional changes. Aliasing need to be looked at but this should be done in a separate PR; this PR is just changing the symbol.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23358806

Pulled By: eellison

fbshipit-source-id: f18bcd142a0daf514136f019ae607e4c3f45d9f8
2020-08-31 11:52:16 -07:00
1c0faa759e Update requires grad property (#43634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43634

Because differentiable graphs detach the gradients of input Tensors, creating and inlining differentiable graphs changes the requires_grad property of tensors in the graph. In the legacy executor this was not a problem, as the Fuser would simply ignore the gradient property: it was an invariant that the LegacyExecutor only passed tensors with grad = False. This is not the case with the profiler, as the Fuser does its own guarding.

Updating the type also helps with other typechecks, e.g. the ones specializing the backward, and with debugging the graph.

Other possibilities considered were:
- Fuser/Specialize AutogradZero always guards against requires_grad=False regardless of the profiled type
- Re-profile forward execution of differentiable graph

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23358803

Pulled By: eellison

fbshipit-source-id: b106998accd5d0f718527bc00177de9af5bad5fc
2020-08-31 11:51:06 -07:00
2bede78a05 add qr_backward functionality for wide case (#42216)
Summary:
Unblocks implementation of https://github.com/pytorch/pytorch/issues/27036. Note that this PR ***does not*** fix #27036.
Currently QR decomposition only has support for the square and tall (a.k.a. skinny) cases.
This PR adds functionality for wide A matrices/tensors, includes 3 unit tests for the new case,
and restructures the `qr_backward` method to use the same Walther method as a helper.

cc albanD t-vi

I don't have a GPU machine so I haven't tested on CUDA, but everything passes on my local machine on CPU.

The basic idea of the PR is noted in the comments in the `Functions.cpp` file, but I'll note it here too for clarity:

let $A_{m,n}$ be a matrix with $m < n$, and partition it as $A_{m,n} = [\, X_{m,m} \mid Y_{m,n-m} \,]$. Take the QR of $X$, writing $X = QU$; this $Q$ is the same as the $Q$ from QR on the entire $A$ matrix. Then transform $Y$ with the rotation $Q$ obtained from $X$ to get $V = Q^{T}Y$, so that $R = [\, U \mid V \,]$, and similarly for the grads of each piece: if $\bar{A}$ is `grad_A`, then $\bar{A} = [\, \bar{X} \mid \bar{Y} \,]$ and $\bar{R} = [\, \bar{U} \mid \bar{V} \,]$, with $\bar{Y} = Q\bar{V}$, where $\bar{V}$ is the `narrow()` of `grad_R`.
$\bar{X}$ is calculated very similarly to the original Walther formula (exactly the same in the tall and square cases) but is slightly modified here for wide matrices.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42216

Reviewed By: glaringlee

Differential Revision: D23373118

Pulled By: albanD

fbshipit-source-id: 3702ba7e7e23923868c02cdb7e10a96036052344
2020-08-31 11:46:45 -07:00
69dd0bab90 [RPC profiling] Add test to ensure using record_function works for RPC (#43657)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43657

We didn't have a test ensuring that functions run over RPC while being profiled can use `with record_function()` to profile specific blocks in the function execution. This is useful, for example, if the user wants information about specific blocks in a function run over RPC that is composed of many torch ops and some custom logic.

Currently, this will not work if the function is TorchScripted since `with record_function()` is not torchscriptable yet. We can add support for this in future PRs so that torchscript RPC functions can also be profiled like this.
ghstack-source-id: 111033981

Reviewed By: mrshenli

Differential Revision: D23355215

fbshipit-source-id: 318d92e285afebfeeb2a7896b4959412c5c241d4
2020-08-31 11:43:09 -07:00
4ef12be900 Add __complex__ (#43844)
Summary:
fixes https://github.com/pytorch/pytorch/issues/43833
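
Based on the linked issue, a sketch of what the new dunder enables (calling Python's built-in `complex()` on a 0-dim tensor, analogous to `float(torch.tensor(1.0))`):

```python
import torch

t = torch.tensor(1 + 2j)
z = complex(t)  # -> (1+2j), via the newly added Tensor.__complex__
```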

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43844

Reviewed By: ZolotukhinM

Differential Revision: D23422000

Pulled By: ngimel

fbshipit-source-id: ebc6a27a9b04c77c3977e6c184cefce9e817cc2f
2020-08-31 11:39:41 -07:00
c5d0f091b2 addmm/addmv should accept complex alpha and beta (#43827)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43827
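
A usage sketch (hedged): with complex inputs, `out = beta * input + alpha * (mat1 @ mat2)` now accepts complex scaling factors:

```python
import torch

M = torch.randn(2, 2, dtype=torch.cfloat)
a = torch.randn(2, 3, dtype=torch.cfloat)
b = torch.randn(3, 2, dtype=torch.cfloat)
out = torch.addmm(M, a, b, beta=1 + 1j, alpha=2j)  # complex alpha and beta
```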

Reviewed By: malfet

Differential Revision: D23415869

Pulled By: ngimel

fbshipit-source-id: a47b76df5fb751f76d36697f5fd95c69dd3a6efe
2020-08-31 11:35:58 -07:00
89452a67de [fx] GraphModule.src -> GraphModule.code (#43655)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43655

Pure, unadulterated bikeshed. The good stuff.

This makes things more consistent with ScriptModule.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23401528

Pulled By: suo

fbshipit-source-id: 7dd8396365f118abcd045434acd9348545314f44
2020-08-31 11:26:05 -07:00
1390cad2d8 [NNC] Hook up registerizer to Cuda codegen [2/x] (#42878)
Summary:
Insert the registerizer into the Cuda Codegen pass list, to enable scalar replacement and close the gap in simple reduction performance.

First up the good stuff, benchmark before:
```
          Column sum          Caffe2             NNC          Simple          Better
           (10, 100)          5.7917          9.7037          6.9386          6.0448
          (100, 100)          5.9338          14.972          7.1139          6.3254
        (100, 10000)          21.453          741.54          145.74          12.555
        (1000, 1000)          8.0678          122.75          22.833          9.0778

             Row sum          Caffe2             NNC          Simple          Better
           (10, 100)          5.4502          7.9661          6.1469          5.5587
          (100, 100)          5.7613          13.897           21.49          5.5808
        (100, 10000)          21.702          82.398          75.462          22.793
        (1000, 1000)          22.527             129          176.51          22.517

```

After:
```
          Column sum          Caffe2             NNC          Simple          Better
           (10, 100)          6.0458          9.4966          7.1094           6.056
          (100, 100)          5.9299          9.1482          7.1693           6.593
        (100, 10000)          21.739          121.97          162.63          14.376
        (1000, 1000)          9.2374           29.01          26.883          10.127

             Row sum          Caffe2             NNC          Simple          Better
           (10, 100)          5.9773          8.1792          7.2307          5.8941
          (100, 100)          6.1456          9.3155          24.563          5.8163
        (100, 10000)          25.384          30.212          88.531          27.185
        (1000, 1000)          26.517          32.702          209.31          26.537
```

Speedup is about 3-8x depending on the size of the data (increasing with bigger inputs).

The gap between NNC and simple is closed or eliminated - remaining issue appears to be kernel launch overhead. Next up is getting us closer to the _Better_ kernel.

It required a lot of refactoring and bug fixes on the way:
* Refactored flattening of parallelized loops out of the CudaPrinter and into its own stage, so we can transform the graph in the stage between flattening and printing (where registerization occurs).
* Made AtomicAddFuser less pessimistic: it will now recognize that if an Add to a buffer depends on all used Block and Thread vars, it has no overlap and does not need to be atomic. This allows registerization to apply to these stores.
* Fixed PrioritizeLoad mutator so that it does not attempt to separate the Store and Load to the same buffer (i.e. reduction case).
* Moved CudaAnalysis earlier in the process, allowing later stages to use the analyzed bufs.
* Fixed a bug in the Registerizer where when adding a default initializer statement it would use the dtype of the underlying var (which is always kHandle) instead of the dtype of the Buf.
* Fixed a bug in the IRMutator where the logic for replacing Allocate statements was inverted, so they were replaced only if they did not change.
* Added simplification of simple Division patterns to the IRSimplifier.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42878

Reviewed By: glaringlee

Differential Revision: D23382499

Pulled By: nickgg

fbshipit-source-id: 3640a98fd843723abad9f54e67070d48c96fe949
2020-08-31 10:39:46 -07:00
63dbef3038 Better msg (#43848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43848

Missing space in logging.

Test Plan: build

Reviewed By: hl475

Differential Revision: D23416698

fbshipit-source-id: bf7c494f33836601f5f380c03a0910f419c2e62b
2020-08-31 10:36:59 -07:00
ef08f92076 [quant][graphmode][fx] Add support for weight prepack folding (#43728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43728

Trace back from the weight node until we hit getattr, reconstruct the graph module with the traced nodes,
and run the graph module to pack the weight. Then replace the original chain of ops with the packed weight.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23385090

fbshipit-source-id: 11341f0af525a02ecec36f163a9cd35dee3744a1
2020-08-31 10:35:11 -07:00
eb4199b0a7 [quant][graphmode][fx] Add top level APIs (#43581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43581

Add APIs similar to those of eager mode and graph mode on TorchScript:
- fuse_fx
- quantize_fx (for both post training static and qat)
- quantize_dynamic_fx (for post training dynamic)
- prepare_fx (for both post training static and qat)
- prepare_dynamic_fx (for post training dynamic)
- convert_fx (for all modes)

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23385091

fbshipit-source-id: b789e54e1a0f3af6b026fd568281984e253e0433
2020-08-31 10:12:55 -07:00
42c895de4d Properly check that reduction strings are valid for l1_loss, smoothl1_loss, and mse_loss. (#43527)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43527
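
A sketch of what the stricter check buys (hedged): a misspelled reduction string now fails loudly instead of being silently accepted:

```python
import torch
import torch.nn.functional as F

a, b = torch.randn(3), torch.randn(3)
F.mse_loss(a, b, reduction="mean")      # valid: "none" | "mean" | "sum"
# F.mse_loss(a, b, reduction="meann")   # now raises a ValueError
```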

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23306786

Pulled By: gchanan

fbshipit-source-id: f3b7c9c02ae02813da116cb6b247a95727c47587
2020-08-31 09:53:56 -07:00
b8d34547ee [quant][graphmode][fx][fix] enable per channel quantization for functional ops (#43534)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43534

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23310857

fbshipit-source-id: ff7a681ee55bcc51f564e9de78319249b989366c
2020-08-31 09:35:25 -07:00
6ea89166bd Rewrite of ATen code generator (#42629)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42629

How to approach reviewing this diff:

- The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and then the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen.
- The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`.
- All of the inputs to the old codegen are deleted.
- Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI.
- LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D23183978

Pulled By: ezyang

fbshipit-source-id: 6073ba432ad182c7284a97147b05f0574a02f763
2020-08-31 09:00:22 -07:00
576880febf Print all traceback for nested backwards in detect_anomaly (#43626)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43405.

This pull request adds the ability to print all tracebacks when `detect_anomaly` detects `nan` in nested backward operations.
The way I did it is by assigning a node as a parent to all nodes it produces during its backward calculation. Then, if one of the children produces `nan`, it will print the traceback from the parent and grandparents (if any).

The parent is assigned in `parent_node_` member in `Node` class which is accessible in C++ by function `node->parent()` and in Python by `node.parent_function`.
A node has a parent iff:

1. it is created from a backward operation, and
2. created when anomaly mode and grad mode are both enabled.

An example of this feature:

    import torch

    def example():
        x = torch.tensor(1.0, requires_grad=True)
        y = torch.tensor(1e-8, requires_grad=True)  # small to induce nan in n-th backward
        a = x * y
        b = x * y
        z1 = a / b  # can produce nan in n-th backward as long as https://github.com/pytorch/pytorch/issues/43414 is unsolved
        z = z1 * z1
        gy , = torch.autograd.grad( z , (y,), create_graph=True)
        gy2, = torch.autograd.grad(gy , (y,), create_graph=True)
        gy3, = torch.autograd.grad(gy2, (y,), create_graph=True)
        gy4, = torch.autograd.grad(gy3, (y,), create_graph=True)
        return gy4

    with torch.autograd.detect_anomaly():
        gy4 = example()

with output:

    example.py:16: UserWarning: Anomaly Detection has been enabled. This mode will increase the runtime and should only be enabled for debugging.
      with torch.autograd.detect_anomaly():
    /home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py:190: UserWarning: Error detected in DivBackward0. Traceback of forward call that caused the error:
      File "example.py", line 17, in <module>
        gy4 = example()
      File "example.py", line 12, in example
        gy3, = torch.autograd.grad(gy2, (y,), create_graph=True)
      File "/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 190, in grad
        return Variable._execution_engine.run_backward(
     (Triggered internally at  ../torch/csrc/autograd/python_anomaly_mode.cpp:61.)
      return Variable._execution_engine.run_backward(
    /home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py:190: UserWarning:

    Traceback of forward call that induces the previous calculation:
      File "example.py", line 17, in <module>
        gy4 = example()
      File "example.py", line 11, in example
        gy2, = torch.autograd.grad(gy , (y,), create_graph=True)
      File "/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 190, in grad
        return Variable._execution_engine.run_backward(
     (Triggered internally at  ../torch/csrc/autograd/python_anomaly_mode.cpp:65.)
      return Variable._execution_engine.run_backward(
    /home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py:190: UserWarning:

    Traceback of forward call that induces the previous calculation:
      File "example.py", line 17, in <module>
        gy4 = example()
      File "example.py", line 8, in example
        z1 = a / b  # can produce nan in n-th backward as long as https://github.com/pytorch/pytorch/issues/43414 is unsolved
     (Triggered internally at  ../torch/csrc/autograd/python_anomaly_mode.cpp:65.)
      return Variable._execution_engine.run_backward(
    Traceback (most recent call last):
      File "example.py", line 17, in <module>
        gy4 = example()
      File "example.py", line 13, in example
        gy4, = torch.autograd.grad(gy3, (y,), create_graph=True)
      File "/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 190, in grad
        return Variable._execution_engine.run_backward(
    RuntimeError: Function 'DivBackward0' returned nan values in its 1th output.

cc & thanks to albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43626

Reviewed By: malfet

Differential Revision: D23397499

Pulled By: albanD

fbshipit-source-id: aa7435ec2a7f0d23a7a02ab7db751c198faf3b7d
2020-08-31 08:23:07 -07:00
1cdb9d2ab5 Test runner for batched gradient computation with vmap (#43664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43664

This PR implements the test runner for batched gradient computation with
vmap. It also implements the batching rule for sigmoid_backward and
tests that one can compute batched gradients with sigmoid (and batched
2nd gradients).
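
A hedged sketch of the pattern being tested, using the prototype `torch.vmap` to batch `autograd.grad` over basis vectors (which recovers the Jacobian of an elementwise op like sigmoid):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = torch.sigmoid(x)

def vjp(v):
    # One vector-Jacobian product; vmap batches this over rows of `v`.
    return torch.autograd.grad(y, x, v, create_graph=True)[0]

jacobian = torch.vmap(vjp)(torch.eye(3))  # diagonal, since sigmoid is elementwise
```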

Test Plan: - New tests: `python test/test_vmap.py -v`

Reviewed By: ezyang

Differential Revision: D23358555

Pulled By: zou3519

fbshipit-source-id: 7bb05b845a41b638b7cca45a5eff1fbfb542a51f
2020-08-31 08:21:41 -07:00
1dcc4fb6b7 Kill unused _pointwise_loss function. (#43523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43523

The code is also wrong, see https://github.com/pytorch/pytorch/issues/43228.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23305461

Pulled By: gchanan

fbshipit-source-id: 9fe516d87a4243d5ce3c29e8822417709a1d6346
2020-08-31 07:58:04 -07:00
a860be898e [resubmit] Add amax/amin (#43819)
Summary:
Resubmit for landing next week.
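
For context, a brief usage sketch: unlike `torch.max`/`torch.min` with a `dim` argument, `amax`/`amin` return only the values, never the indices:

```python
import torch

x = torch.tensor([[1., 3.],
                  [2., 0.]])
torch.amax(x, dim=0)  # tensor([2., 3.])
torch.amin(x, dim=1)  # tensor([1., 0.])
```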

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43819

Reviewed By: ngimel

Differential Revision: D23421906

Pulled By: mruberry

fbshipit-source-id: 23dd60d1e365bb1197d660c3bfad7ee07ba3e97f
2020-08-31 04:54:48 -07:00
8fb7c50250 Enable complex blas for ROCm. (#43744)
Summary:
Revert "Skips some complex tests on ROCm (https://github.com/pytorch/pytorch/issues/42759)".  This reverts commit 55b1706775726418ddc5dd3b7756ea0388c0817c.

Use new cuda_to_hip_mappings.py from https://github.com/pytorch/pytorch/issues/43004.

Fixes https://github.com/pytorch/pytorch/pull/42383#issuecomment-670771922

CC sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43744

Reviewed By: glaringlee

Differential Revision: D23391263

Pulled By: ngimel

fbshipit-source-id: ddf734cea3ba69c24f0d79cf1b87c05cdb45ec3d
2020-08-30 22:43:54 -07:00
08126c9153 [ONNX] Utilize ONNX shape inference for ONNX exporter (#40628)
Summary:
It is often the case that the conversion from a torch operator to an onnx operator requires the input rank/dtype/shape to be known. Previously, the conversion depended on the tracer to provide this info, leaving a gap in the conversion of scripted modules.

We are extending the export with support from onnx shape inference. If enabled, onnx shape inference will be called whenever an onnx node is created. This is the first PR introducing the initial look of the feature. More and more cases will be supported following this PR.

* Added pass to run onnx shape inference on a given node. The node has to have namespace `onnx`.
* Moved helper functions from `export.cpp` to a common place for re-use.
* This feature is currently experimental, and can be turned on through flag `onnx_shape_inference` in internal api `torch.onnx._export`.
* Currently skipping ONNX Sequence ops, If/Loop and ConstantOfShape due to limitations. Support will be added in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40628

Reviewed By: mrshenli

Differential Revision: D22709746

Pulled By: bzinodev

fbshipit-source-id: b52aeeae00667e66e0b0c1144022f7af9a8b2948
2020-08-30 18:35:46 -07:00
3aeb70db0b Documents sub properly, adds subtract alias (#43850)
Summary:
`torch.sub` was undocumented, so this PR adds its documentation, analogous to `torch.add`'s documentation, and adds the alias `torch.subtract` for `torch.sub`, too. This alias comes from NumPy (see https://numpy.org/doc/stable/reference/generated/numpy.subtract.html?highlight=subtract#numpy.subtract)
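
A quick sketch of the alias in action (identical semantics to `torch.sub`, including the `alpha` scaling factor):

```python
import torch

a, b = torch.tensor([3., 5.]), torch.tensor([1., 2.])
assert torch.equal(torch.subtract(a, b, alpha=2), torch.sub(a, b, alpha=2))
```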

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43850

Reviewed By: ngimel

Differential Revision: D23416908

Pulled By: mruberry

fbshipit-source-id: 6c4d2ebaf6ecae91f3a6efe484ce6c4dad96f016
2020-08-30 15:44:56 -07:00
3dc9645430 Disable RocM CircleCI jobs (#42630)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42630

Reviewed By: seemethere

Differential Revision: D22957640

Pulled By: malfet

fbshipit-source-id: 9f7d633310c653fcd14e66755168c0e559307b69
2020-08-30 11:41:40 -07:00
7b835eb887 Update CUDA11 docker container (#42200)
Summary:
- no more `-rc`
- add magma

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42200

Reviewed By: ZolotukhinM, mruberry

Differential Revision: D23411686

Pulled By: malfet

fbshipit-source-id: 04532bc1cc65b3e14ddf29e8bf61a7a3b4c706ad
2020-08-30 11:39:20 -07:00
5021ec826b Fix docs for kwargs, f-p (#43586)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43586

Reviewed By: glaringlee

Differential Revision: D23390667

Pulled By: mruberry

fbshipit-source-id: dd51a4a48ff4e2fc10675ec817a206041957982f
2020-08-30 10:13:36 -07:00
1830e4f08c Remove unnamed namespace in headers (#43689)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43689

Test Plan: Imported from OSS

Reviewed By: eellison, asuhan

Differential Revision: D23367636

Pulled By: bertmaher

fbshipit-source-id: ddb6d34d2f7cadff3a591c3650e1dd1b401c3d2d
2020-08-29 22:45:53 -07:00
ab3ea95e90 #include <string> in loopnest.h (#43835)
Summary:
This file is causing a compilation failure on my gcc-10.1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43835

Reviewed By: bhosmer

Differential Revision: D23416417

Pulled By: ZolotukhinM

fbshipit-source-id: d0c2998347438fb729212574d52ce20dd6faae85
2020-08-29 19:06:44 -07:00
628db9699f Vulkan command buffer and pool. (#42930)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42930

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23252333

Pulled By: AshkanAliabadi

fbshipit-source-id: 738385e0058edf3d3b34173e1b1011356adb7b3c
2020-08-29 17:48:19 -07:00
d1df098956 Vulkan resource cache. (#42709)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42709

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23252339

Pulled By: AshkanAliabadi

fbshipit-source-id: 977ab3fdedfe98789a48dd263127529d8be0ed37
2020-08-29 17:48:17 -07:00
87e8f50aae Vulkan descriptor and descriptor layout cache. (#42642)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42642

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23252337

Pulled By: AshkanAliabadi

fbshipit-source-id: 075acc8c093e639bb24a0d4653d5c922b36a1128
2020-08-29 17:48:14 -07:00
15aaeb8867 Vulkan pipeline and pipeline layout cache. (#42395)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42395

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23252334

Pulled By: AshkanAliabadi

fbshipit-source-id: 6b4e88f9794a7879d47a1cdb671076d50f1944d9
2020-08-29 17:48:12 -07:00
387dc24c92 Vulkan memory allocator. (#42786)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42786

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23252332

Pulled By: AshkanAliabadi

fbshipit-source-id: 14e848ad81b4ba1367e8cf719343a51995457827
2020-08-29 17:48:10 -07:00
287fb273cd Vulkan (source and binary) shader and shader layout cache. (#42325)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42325

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23252336

Pulled By: AshkanAliabadi

fbshipit-source-id: f3f26c78366be45c90a370db9194d88defbf08d8
2020-08-29 17:48:08 -07:00
6373063a98 Generic Vulkan object cache. (#42394)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42394

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23252340

Pulled By: AshkanAliabadi

fbshipit-source-id: 34e753964b94153ed6ed1fcaa7f3b4a7c6b5f340
2020-08-29 17:48:06 -07:00
4e39c310eb Move torch/csrc/utils/hash.h to c10/util/hash.h. (#42503)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42503

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23252331

Pulled By: AshkanAliabadi

fbshipit-source-id: 3c4c0e27b9a7eec8560e374c2a3ba5f1c65dae48
2020-08-29 17:47:00 -07:00
7f967c08b8 Document the beta=0 behavior of BLAS functions (#43823)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43823
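
A sketch of the documented guarantee (hedged paraphrase: when `beta == 0`, the `input` tensor is ignored entirely, so `nan`/`inf` in it do not propagate):

```python
import torch

bad = torch.full((2, 2), float('nan'))
out = torch.addmm(bad, torch.eye(2), torch.eye(2), beta=0)
assert torch.isfinite(out).all()  # nan in `bad` is ignored when beta == 0
```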

Reviewed By: mruberry

Differential Revision: D23413899

Pulled By: ngimel

fbshipit-source-id: d3c4e5631db729a3f3d5eb9290c76cb1aa529f74
2020-08-29 13:03:16 -07:00
cc52386096 Revert D19987020: [pytorch][PR] Add the sls tensor train op
Test Plan: revert-hammer

Differential Revision:
D19987020 (f31b111a35)

Original commit changeset: e3ca7b00a374

fbshipit-source-id: a600c747a45dfb51e0882196e382a21ccaa7b989
2020-08-29 12:46:11 -07:00
45ba836876 Revert "Revert D23252335: Refactor Vulkan context into its own files. Use RAII." (#43628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43628

This reverts commit 6c772515ed1a87ec676382492ff3c019c6d194c3.

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23356714

Pulled By: AshkanAliabadi

fbshipit-source-id: a44af3b3c7b00a097eae1b0c9a00fdabc7ab6f86
2020-08-29 12:39:22 -07:00
f31b111a35 Add the sls tensor train op (#33525)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33525

Reviewed By: wx1988

Differential Revision: D19987020

Pulled By: lly-zero-one

fbshipit-source-id: e3ca7b00a374a75ee42716c4e6236bf168ebebf1
2020-08-29 12:16:44 -07:00
550fb2fd52 Expand the coverage of test_blas_empty (#43822)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43822

Reviewed By: mruberry

Differential Revision: D23413359

Pulled By: ngimel

fbshipit-source-id: fcdb337e32ed2d1c791fa0762d5233b346b26d14
2020-08-29 12:13:15 -07:00
60ad7e9c04 [TensorExpr] Make sum available from Python (#43730)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43730

Test Plan:
python test/test_jit_fuser_te.py -k TestTEFuser.test_sum
test_tensorexpr --gtest_filter=TensorExprTest.KernelSum*

Reviewed By: ZolotukhinM

Differential Revision: D23407600

Pulled By: asuhan

fbshipit-source-id: e6da4690ae6d802f9be012e39e61b7467aa5285c
2020-08-29 10:38:21 -07:00
8a41fa4718 [Selective Build] Move register_prim_ops and register_special_ops to app level (#43539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43539

Move the two source files out of the base internal mobile library to the app level, making them ready for app-based selective build. The open-source build should not be affected; the file list change in build_variables.bzl affects the internal build only.

ghstack-source-id: 111006135

Test Plan: CI

Reviewed By: ljk53

Differential Revision: D23287661

fbshipit-source-id: 9b2d688544e79e0fca9c84730ef0259952cd8abe
2020-08-29 03:12:28 -07:00
d10056652b Enable torch.half for lt and masked_select (#43704)
Summary:
Enable testing of those options in `TestTorchDeviceTypeCPU.test_logical_cpu` and `TestTorchDeviceTypeCPU.test_masked_select_cpu_float16`
Add `view_as_real` testing for `torch.complex32` type

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43704

Reviewed By: albanD

Differential Revision: D23373070

Pulled By: malfet

fbshipit-source-id: 00f17f23b48513379a414227aea91e2d3c0dd5f9
2020-08-29 02:37:26 -07:00
931b8b4ac8 Use ivalue::Future in autograd engine and DistEngine. (#43676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43676

This is one part of https://github.com/pytorch/pytorch/issues/41574 to
ensure we consolidate everything around ivalue::Future.

I've removed the use of torch/csrc/utils/future.h from the autograd engines and
used ivalue::Future instead.
ghstack-source-id: 110895545

Test Plan: waitforbuildbot.

Reviewed By: albanD

Differential Revision: D23362415

fbshipit-source-id: aa109b3f8acf0814d59fc5264a85a8c27ef4bdb6
2020-08-29 02:15:26 -07:00
000739c31a Function calls for fallback paths (#43274)
Summary:
This PR adds an API to package unoptimized/fallback blocks as function calls. It's mainly meant to be used by the TensorExpressionsFuser and SpecializeAutogradZero passes: both specialize the original graph but would also like to provide a fallback path in case the assumptions under which the graph was specialized do not hold for some inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43274

Reviewed By: malfet

Differential Revision: D23406961

Pulled By: Krovatkin

fbshipit-source-id: ef21fc9ad886953461b09418d02c75c58375490c
2020-08-28 23:31:02 -07:00
8538a79bfe [jit][static] Basic executor (#43647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43647

Nothing fancy, just a basic implementation of the graph executor without using stack machine.

Reviewed By: bwasti

Differential Revision: D23208413

fbshipit-source-id: e483bb6ad7ba8591bbe1767e669654d82f42c356
2020-08-28 23:20:07 -07:00
6aaae3b08b [ONNX] Addition of diagnostic tool API (#43020)
Summary:
Added initial diagnostic tool API

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43020

Reviewed By: malfet

Differential Revision: D23398459

Pulled By: bzinodev

fbshipit-source-id: 7a6d9164a19e3ba51676fbcf645c4d358825eb42
2020-08-28 23:04:59 -07:00
58148c85f4 Use template OperatorGenerator for prim and special operator registration (#43481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43481

Apply OperatorGenerator for prim and special operator registration. It does not affect the existing build by default. However, if a whitelist of operators exists, only the operators in the whitelist will be registered. This has the potential to save up to 200 KB of binary size, depending on usage.

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D23287251

Pulled By: iseeyuan

fbshipit-source-id: 3ca39fbba645bad8d69e69195f3680e4f6d633c5
2020-08-28 21:18:00 -07:00
8997a4b56b [typing] Enable typing in torch.quantization.fuse_modules typechecks … (#43786)
Summary:
Enable typing in torch.quantization.fuse_modules typechecks during CI.

Fixes #42971

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43786

Reviewed By: malfet

Differential Revision: D23403258

Pulled By: yizhouyu

fbshipit-source-id: 4cd24a4fcf1408341a210fa50f574887b6db5e0e
2020-08-28 20:42:23 -07:00
eae92b7187 Updated README.md by correcting grammatical errors (#43779)
Summary:
Fixed grammatical errors and punctuation so that it can be more understandable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43779

Reviewed By: ZolotukhinM

Differential Revision: D23407849

Pulled By: malfet

fbshipit-source-id: 09c064ce68d0f37f8023c2ecae8775fc00541a2c
2020-08-28 20:30:03 -07:00
13c7c6227e Python/C++ API Parity: TransformerDecoder (#42886)
Summary:
Fixes [#37756](https://github.com/pytorch/pytorch/issues/37756)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42886

Reviewed By: zhangguanheng66

Differential Revision: D23385631

Pulled By: glaringlee

fbshipit-source-id: 610a2fabb4c25b2dfd37b33287215bb8872d653d
2020-08-28 20:13:53 -07:00
64906497cd Revert D23391941: [pytorch][PR] Implementing NumPy-like function torch.heaviside()
Test Plan: revert-hammer

Differential Revision:
D23391941 (a1eae6d158)

Original commit changeset: 7b942321a625

fbshipit-source-id: c2a7418a1fedaa9493300945c30e2392fc0d08ee
2020-08-28 19:16:58 -07:00
47e489b135 Make ExtraFilesMap return bytes instead of str (#43241)
Summary:
This is useful in case we want to store binary files using the `ScriptModule.save(..., _extra_files=...)` functionality. With Python 3 we can just use bytes and not bother with conversions.

I had to copy-paste from pybind sources; maybe we should upstream it, but it'd mean adding a bunch of template arguments to `bind_map`, which is a bit untidy.

Let me know if there's a better place to park this function (it seems to be the only invocation of `bind_map` so I put it in the same file)
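
For illustration, a minimal round-trip through `_extra_files` (file name and payload made up), sketching the behavior after this change:

```python
import torch

m = torch.jit.script(torch.nn.Linear(2, 2))
# Store an arbitrary binary payload alongside the serialized module.
m.save("model.pt", _extra_files={"meta.bin": b"\x00\x01\x02"})

# On load, the dict values are populated in place; with this change they
# come back as bytes rather than str.
files = {"meta.bin": ""}
torch.jit.load("model.pt", _extra_files=files)
print(files["meta.bin"])
```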

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43241

Reviewed By: zdevito

Differential Revision: D23205244

Pulled By: dzhulgakov

fbshipit-source-id: 8f291eb4294945fe1c581c620d48ba2e81b3dd9c
2020-08-28 19:11:33 -07:00
1a79d7bb28 DDP communication hook examples (#43310)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43310

In this diff, we prepared some example DDP communication hooks [#40848](https://github.com/pytorch/pytorch/pull/40848):

1\. `allreduce_hook`: This DDP communication hook just calls ``allreduce`` using ``GradBucket`` tensors. Once gradient tensors are aggregated across all workers, its ``then`` callback takes the mean and returns the result. If a user registers this hook, DDP results are expected to be the same as in the case where no hook was registered; hence, it won't change the behavior of DDP, and users can use it as a reference or modify it to log useful information or for other purposes without affecting DDP behavior.

2\. `allgather_then_aggregate_hook`: Similar to ``allreduce_hook``, this hook first gathers ``GradBucket`` tensors, and its ``then`` callback aggregates the gathered gradient tensors and takes the mean. Instead of ``allreduce``, this hook uses ``allgather``. Note that with W workers, both the computation and communication time scale as O(W) for allgather compared to O(logW) for allreduce. Therefore, this hook is expected to be much slower than ``allreduce_hook``, although both essentially do the same thing with the gradients.

3\. `fp16_compress_hook`: This DDP communication hook implements a simple gradient compression approach that converts ``GradBucket`` tensors whose type is assumed to be ``torch.float32`` to half-precision floating point format (``torch.float16``) and allreduces those ``float16`` gradient tensors. Once the compressed gradient tensors are allreduced, its ``then`` callback, ``decompress``, converts the aggregated result back to ``float32`` and takes the mean (a minimal sketch of this idea follows the list).

4\. `quantization_pertensor_hook` does per-tensor quantization, using the idea in https://pytorch.org/docs/master/generated/torch.quantize_per_tensor.html. Note that we separately send scale and zero_point (two floats per rank) before the quantized tensors.

5\. `quantization_perchannel_hook` does per-channel quantization, similar to https://pytorch.org/docs/master/generated/torch.quantize_per_channel.html. The main motivation is that, after the initial QSGD study diff, we realized that for considerably large gradient tensors (e.g. one containing 6 million floats), dividing the tensor into smaller channels (512-float chunks) and quantizing each independently may significantly increase the resolution and yield lower error.
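
A minimal sketch of the fp16 compression idea from (3), written against plain `torch.distributed` collectives rather than the actual `GradBucket`-based hook signature (the flattened-gradient input and explicit `world_size` are simplifications for illustration):

```python
import torch
import torch.distributed as dist

def fp16_compress(flat_grad: torch.Tensor, world_size: int) -> torch.Tensor:
    # Compress float32 gradients to float16 before communicating.
    compressed = flat_grad.to(torch.float16)
    # Sum the compressed gradients across all workers.
    dist.all_reduce(compressed)
    # "decompress": convert back to float32 and take the mean.
    return compressed.to(torch.float32).div_(world_size)
```
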
ghstack-source-id: 110923269

Test Plan:
python torch/distributed/algorithms/ddp_comm_hooks/test_ddp_hooks.py
Couldn't download test skip set, leaving all tests enabled...
.....
----------------------------------------------------------------------
Ran 4 tests in 26.724s

OK

Internal testing:
```
buck run mode/dev-nosan //caffe2/test/distributed/algorithms/ddp_comm_hooks:test_ddp_hooks
```

Reviewed By: malfet

Differential Revision: D22937999

fbshipit-source-id: 274452e7932414570999cb978ae77a97eb3fb0ec
2020-08-28 18:59:14 -07:00
68b9daa9bf Add torch.linalg.norm (#42749)
Summary:
Adds `torch.linalg.norm` function that matches the behavior of `numpy.linalg.norm`.

Additional changes:
* Add support for dimension wrapping in `frobenius_norm` and `nuclear_norm`
* Fix `out` argument behavior for `nuclear_norm`
* Fix issue where `frobenius_norm` allowed duplicates in `dim` argument
* Add `_norm_matrix`

Closes https://github.com/pytorch/pytorch/issues/24802
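
A few representative calls, mirroring `numpy.linalg.norm` semantics:

```python
import torch

v = torch.randn(5)
A = torch.randn(3, 3)

torch.linalg.norm(v)         # vector 2-norm
torch.linalg.norm(A)         # Frobenius norm (default for matrices)
torch.linalg.norm(A, ord=2)  # spectral norm (largest singular value)
torch.linalg.norm(A, dim=1)  # per-row vector norms
```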

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42749

Reviewed By: ngimel

Differential Revision: D23336234

Pulled By: mruberry

fbshipit-source-id: f0aba3089a3a0bf856aa9c4215e673ff34228fac
2020-08-28 18:28:33 -07:00
cd0bab8d8d [ONNX] Where op (#41544)
Summary:
Extending where op export

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41544

Reviewed By: malfet

Differential Revision: D23279515

Pulled By: bzinodev

fbshipit-source-id: 4627c95ba18c8a5ac8d06839c343e06e71c46aa7
2020-08-28 18:15:01 -07:00
a1eae6d158 Implementing NumPy-like function torch.heaviside() (#42523)
Summary:
- Related with https://github.com/pytorch/pytorch/issues/38349
- Implementing the NumPy-like function `torch.heaviside()`.
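
A quick illustration of the semantics (as in `numpy.heaviside`, the second argument supplies the value used where the input is zero):

```python
import torch

x = torch.tensor([-1.5, 0.0, 2.0])
values = torch.tensor([0.5])   # result where x == 0
torch.heaviside(x, values)     # -> tensor([0.0000, 0.5000, 1.0000])
```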

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42523

Reviewed By: glaringlee

Differential Revision: D23391941

Pulled By: mruberry

fbshipit-source-id: 7b942321a62567a5fc0a3679a289f4c4c19e6134
2020-08-28 18:11:20 -07:00
633d239409 [torch.fx] Pass placeholders through delegate too (#43432)
Summary:
It's useful if we add additional attributes to nodes in the graph - it's easier to set the attribute on all nodes, even if the value happens to be None.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43432

Reviewed By: jamesr66a

Differential Revision: D23276433

Pulled By: dzhulgakov

fbshipit-source-id: c69e7cb723bbbb4dba3b508a3d6c0e456fe610df
2020-08-28 18:07:52 -07:00
3f0120edb4 Revert D23360705: [pytorch][PR] Add amax/amin
Test Plan: revert-hammer

Differential Revision:
D23360705 (bcec8cc3f9)

Original commit changeset: 5bdeb08a2465

fbshipit-source-id: 76a9e199823c7585e55328bad0778bcd8cd49381
2020-08-28 18:01:25 -07:00
7d517cf96f [NCCL] Dedicated stream to run all FutureNCCL callbacks. (#43447)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43447

Two main better-engineering motivations to run all FutureNCCL callbacks on a dedicated stream:
1. Each time a then callback was called, we would get a stream from the pool and run the callback on that stream. If we observe the stream traces using that approach, we would see a lot of streams and debugging would become more complicated. If we have a dedicated stream to run all then callback operations, the trace results will be much cleaner and easier to follow.
2. getStreamFromPool may eventually return the default stream or a stream that is used for other operations. This can cause slowdowns.

Unless the ``then`` callback takes longer than the preceding allreduce, this approach will be as performant as the previous one.
ghstack-source-id: 110909401

Test Plan:
Perf trace runs to validate the desired behavior:
See the dedicated stream 152 is running the then callback operations:

{F299759342}

I ran pytorch.benchmark.main.workflow using resnet50 and 32 GPUs, registering allreduce with a ``then`` hook.
See f213777896 [traces](https://www.internalfb.com/intern/perfdoctor/results?run_id=26197585)

After updates, same observation: see f214890101

Reviewed By: malfet

Differential Revision: D23277575

fbshipit-source-id: 67a89900ed7b70f3daa92505f75049c547d6b4d9
2020-08-28 17:26:23 -07:00
3f5ea2367e Adding a version serialization type to ConvPackedParam (#43086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43086

This PR changes the format of `ConvPackedParam` in a nearly backwards-compatible way:
* a new format is introduced which has more flexibility and a lower on-disk size
* custom pickle functions are added to `ConvPackedParams` which know how to load the old format
* the custom pickle functions are **not** BC because the output type of `__getstate__` has changed.  We expect this to be acceptable as no user flows are actually broken (loading a v1 model with v2 code works), which is why we whitelist the failure.
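
A generic sketch of the versioned `__getstate__`/`__setstate__` pattern described above, in plain Python for illustration (the actual implementation lives in the quantized conv code; names here are made up):

```python
class PackedParams:
    _VERSION = 2

    def __getstate__(self):
        # v2 prefixes the payload with a version tag; v1 stored the payload alone.
        return (self._VERSION, self.payload)

    def __setstate__(self, state):
        if isinstance(state, tuple) and state[0] == 2:
            self.payload = state[1]   # new v2 format
        else:
            self.payload = state      # legacy v1 format still loads
```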

Test plan (TODO finalize):

```
// adhoc testing of saving v1 and loading in v2: https://gist.github.com/vkuzo/f3616c5de1b3109cb2a1f504feed69be

// test that loading models with v1 conv params format works and leads to the same numerics
python test/test_quantization.py TestSerialization.test_conv2d_graph
python test/test_quantization.py TestSerialization.test_conv2d_nobias_graph

// test that saving and loading models with v2 conv params format works and leads to same numerics
python test/test_quantization.py TestSerialization.test_conv2d_graph_v2
python test/test_quantization.py TestSerialization.test_conv2d_nobias_graph_v2

// TODO before land:
// test numerics for a real model
// test legacy ONNX path
```

Note: this is a newer copy of https://github.com/pytorch/pytorch/pull/40003

Test Plan: Imported from OSS

Reviewed By: dreiss

Differential Revision: D23347832

Pulled By: vkuzo

fbshipit-source-id: 06bbe4666421ebad25dc54004c3b49a481d3cc92
2020-08-28 15:41:30 -07:00
af4ecb3c11 quantized conv: add support for graph mode BC testing, and increase coverage (#43524)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43524

1. adds support for testing BC on data format and numerics for graph mode
quantized modules
2. using the above, adds coverage for quantized conv2d on graph mode

Test Plan:
```
python test/test_quantization.py TestSerialization.test_conv2d_nobias
python test/test_quantization.py TestSerialization.test_conv2d_graph
python test/test_quantization.py TestSerialization.test_conv2d_nobias_graph
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D23335222

fbshipit-source-id: 0c9e93a940bbf6c676c2576eb62fcc725247588b
2020-08-28 15:40:22 -07:00
4cb8d306e6 Add _foreach_add_(TensorList tensors, Scalar scalar) API (#42531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42531

[First PR: Add private API to support tensor lists: _foreach_add(TensorList tensors, Scalar scalar)](https://github.com/pytorch/pytorch/pull/41554).

**Motivation**
[GitHub issue](https://github.com/pytorch/pytorch/issues/38655)
Current PyTorch optimizer implementations are not efficient in cases where we work with a lot of small feature tensors: launching a lot of kernels slows down the whole process. We need to reduce the number of kernels that we launch.
As an example, we should be looking at [NVIDIA's Apex](https://github.com/NVIDIA/apex).
In order to track progress, we will pick PyTorch's DCGAN model with the Adam optimizer and, once the optimizer is reimplemented with tensor lists, benchmark the model's performance against the original model version, Apex's version with the original Adam optimizer, and its FusedAdam optimizer.

**Current API restrictions**
- List can't be empty (will be fixed in upcoming PRs).
- All tensors in the list must have the same dtype, device and size.

**Broadcasting**
At this point we don't support broadcasting.

**What is 'Fast' and 'Slow' route**
In particular cases, we can't process an op with a fast list CUDA kernel. Still, we can fall back to a regular for-loop where the op is applied to each tensor individually through the dispatch mechanisms. There are a few checks that decide whether the op will be performed via a 'fast' or 'slow' path.
To go the fast route,
- All tensors must have strided layout
- All tensors must be dense and not have overlapping memory
- The resulting tensor type must be the same.

---------------
**In this PR**
- Adding a `std::vector<Tensor> _foreach_add_(TensorList tensors, Scalar scalar)` API
- Resolving some additional comments from previous [PR](https://github.com/pytorch/pytorch/pull/41554).

**Tests**
Tested via unit tests

**TODO**
1. Properly handle empty lists

**Plan for the next PRs**
1. APIs
- Binary Ops for list with Scalar
- Binary Ops for list with list
- Unary Ops for list
- Pointwise Ops

2. Complete tasks from TODO
3. Rewrite PyTorch optimizers to use for-each operators for performance gains.
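
A minimal example of the in-place API added in this PR (a private, underscore-prefixed entry point):

```python
import torch

tensors = [torch.ones(2, 2) for _ in range(10)]
# Adds the scalar to every tensor in the list in place; on CUDA this can
# use a single fused kernel instead of one launch per tensor.
torch._foreach_add_(tensors, 1.0)
```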

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23331892

Pulled By: izdeby

fbshipit-source-id: c585b72e1e87f6f273f904f75445618915665c4c
2020-08-28 14:34:46 -07:00
20abfc21e4 Adds arctanh, arcsinh aliases, simplifies arc* alias dispatch (#43762)
Summary:
Adds two more "missing" NumPy aliases: arctanh and arcsinh, and simplifies the dispatch of other arc* aliases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43762

Reviewed By: ngimel

Differential Revision: D23396370

Pulled By: mruberry

fbshipit-source-id: 43eb0c62536615fed221d460c1dec289526fb23c
2020-08-28 13:59:19 -07:00
0564d7a652 Land code coverage tool for OSS (#43778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43778

Move code_coverage_tool from experimental folder to caffe2/tools folder.

Delete `TODO` and fb-related code.

Test Plan: Test locally

Reviewed By: malfet

Differential Revision: D23399983

fbshipit-source-id: 92316fd3cc88409d087d2dc6ed0be674155b3762
2020-08-28 13:56:15 -07:00
89e2a3591e Add 1% threshold to codecov (#43783)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43783

Reviewed By: seemethere

Differential Revision: D23402196

Pulled By: malfet

fbshipit-source-id: bd11d6edc6d1f15bd227636a549b9ea7b3aca256
2020-08-28 13:51:23 -07:00
b23e9cdd64 .circleci: Add slash to end of s3 cp (#43792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43792

This fixes the issue we had with the nightlies not being uploaded
properly, basically what was happening was that `aws s3 cp` doesn't
automatically distinguish between prefixes that are already
"directories" vs a single file with the same name.

This means that if you'd like to upload a file to a "directory" in S3
you need to suffix your destination with a slash.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D23402074

Pulled By: seemethere

fbshipit-source-id: 6085595283fcbbbab0836ccdfe0f8aa2a6abd7c8
2020-08-28 13:37:25 -07:00
776c2d495f [JIT] IRParser: store list attributes as generic ivalue lists. (#43785)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43785

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23400565

Pulled By: ZolotukhinM

fbshipit-source-id: e248eb1854c4ec40da9455d4279ea6e47b1f2a16
2020-08-28 13:27:28 -07:00
bcec8cc3f9 Add amax/amin (#43092)
Summary:
Add max/min operators that only return values.

## Some important decisions to discuss
| **Question**                          | **Current State** |
|---------------------------------------|-------------------|
| Expose torch.max_values to python?    | No                |
| Remove max_values and only keep amax? | Yes               |
| Should amax support named tensors?    | Not in this PR    |

## Numpy compatibility

Reference: https://numpy.org/doc/stable/reference/generated/numpy.amax.html

| Parameter                                                                                                                                                                                                                                              | PyTorch Behavior                                                                  |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|
| `axis`:  None or int or tuple of ints, optional. Axis or axes along which to operate. By default, flattened input is used. If this is a tuple of ints, the maximum is selected over multiple axes, instead of a single axis or all the axes as before. | Named `dim`, behavior same as `torch.sum` (https://github.com/pytorch/pytorch/issues/29137)                                |
| `out`: ndarray, optional. Alternative output array in which to place the result. Must be of the same shape and buffer length as the expected output.                                                                                                   | Same                                                                              |
| `keepdims`: bool, optional. If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.                                      | implemented as `keepdim`                                                          |
| `initial`: scalar, optional. The minimum value of an output element. Must be present to allow computation on empty slice.                                                                                                                              | Not implemented in this PR. Better to implement for all reductions in the future. |
| `where`: array_like of bool, optional. Elements to compare for the maximum.                                                                                                                                                                            | Not implemented in this PR. Better to implement for all reductions in the future. |

**Note from numpy:**
> NaN values are propagated, that is if at least one item is NaN, the corresponding max value will be NaN as well. To ignore NaN values (MATLAB behavior), please use nanmax.

PyTorch has the same behavior
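
Typical usage of the values-only reduction:

```python
import torch

t = torch.arange(12.0).reshape(3, 4)
torch.amax(t)                            # global max: tensor(11.)
torch.amax(t, dim=1)                     # row maxima, no indices returned
torch.amax(t, dim=(0, 1), keepdim=True)  # multi-dim reduction, NumPy-style
```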

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43092

Reviewed By: ngimel

Differential Revision: D23360705

Pulled By: mruberry

fbshipit-source-id: 5bdeb08a2465836764a5a6fc1a6cc370ae1ec09d
2020-08-28 12:51:03 -07:00
f4695203c2 Fixes fft function calls for C++ API (#43749)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43732.

Requires importing the fft namespace in the C++ API, just like the Python API does, to avoid clobbering torch::fft the function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43749

Reviewed By: glaringlee

Differential Revision: D23391544

Pulled By: mruberry

fbshipit-source-id: d477d0b6d9a689d5c154ad6c31213a7d96fdf271
2020-08-28 12:41:30 -07:00
dc5d365514 Fix bug in caching allocator. (#43719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43719

Accidentally this slipped through: with guard did not update the current
context

Test Plan: cpu_caching_allocator_test

Reviewed By: linbinyu

Differential Revision: D23374453

fbshipit-source-id: 1d3ef21cc390d0a8bde98fb1b5c2175b40ab571b
2020-08-28 11:56:23 -07:00
be3ec6ab3e [caffe2][torch] correctly re-raise Manifold StorageException
Summary:
1) Manifold raises StorageException when it sees an error: https://fburl.com/diffusion/kit3me8a
2) torch re-raises the exception: https://fburl.com/diffusion/zbw9wmpu
The issue is that StorageException's first argument is a bool (canRetry), while the re-raise passes a str as the first argument, as in all Python exceptions.

Test Plan:
Existing tests should pass. +
```
In [1]: from manifold.clients.python import StorageException
In [2]: getattr(StorageException, "message", None)
Out[2]: <attribute 'message' of 'manifold.blobstore.blobstore.types.StorageException' objects>
In [3]: getattr(Exception, "message", None) is None
Out[3]: True

Reviewed By: haijunz

Differential Revision: D23195514

fbshipit-source-id: baa1667dbba4086db6ec93f009e400611ac9b938
2020-08-28 11:41:10 -07:00
b72da0cf28 OneDNN: report error for dilation max_pooling and replace AT_ERROR with TORCH_CHECK in oneDNN codes (#43538)
Summary:
Fix https://github.com/pytorch/pytorch/issues/43514.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43538

Reviewed By: agolynski

Differential Revision: D23364302

Pulled By: ngimel

fbshipit-source-id: 8d17752cf33dcacd34504e32b5e523e607cfb497
2020-08-28 10:57:19 -07:00
1f7434d1ea Fix 'module' to 'model' in quantize_dynamic doc (#43693)
Summary:
Fixes issue https://github.com/pytorch/pytorch/issues/43503

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43693

Reviewed By: malfet

Differential Revision: D23397641

Pulled By: mrshenli

fbshipit-source-id: bc216cea4f0a30c035e84a6cfebabd3755ef1305
2020-08-28 10:44:43 -07:00
a76184fe1e grammatical error fix (#43697)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43697

Reviewed By: malfet

Differential Revision: D23397655

Pulled By: mrshenli

fbshipit-source-id: fb447dcde4f83bc6650f0faa0728a1867cfa5213
2020-08-28 10:38:46 -07:00
b630c1870d Add stateful XNNPack deconvolution2d operator to torch. (#43233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43233

XNNPack is already being used for the convolution2d operation. Add the
ability for it to be used with transpose convolution.

Test Plan: buck run caffe2/test:xnnpack_integration

Reviewed By: kimishpatel

Differential Revision: D23184249

fbshipit-source-id: 3fa728ce1eaca154d24e60f800d5e946d768c8b7
2020-08-28 10:31:36 -07:00
58a7e73a95 [TensorExpr] Block Codegen (#40054)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40054

Reviewed By: ZolotukhinM

Differential Revision: D22061350

Pulled By: protonu

fbshipit-source-id: 004f7c316629b16610ecdbb97e43036c72c65067
2020-08-28 09:53:42 -07:00
9063bcee04 Don't proceed into setup.py too far if Python version is unsupported (#42870)
Summary:
This prevents confusing errors when the interpreter encounters some
syntax errors in the middle.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42870

Reviewed By: albanD

Differential Revision: D23269265

Pulled By: ezyang

fbshipit-source-id: 61f62cbe294078ad4a909fa87aa93abd08c26344
2020-08-28 09:04:55 -07:00
c177d25edf TensorIterator: Check for memory overlap in all nullary_ops (#43421)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43421

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23298654

Pulled By: zou3519

fbshipit-source-id: 71b401f6ea1e3b50b830fef650927cc5b3fb940f
2020-08-28 08:40:25 -07:00
dc0722e9b7 TensorIterator: Check for memory overlap in all compare_ops (#43420)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43420

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23298650

Pulled By: zou3519

fbshipit-source-id: 171cd17a3012880a5d248ffd0ea6942fbfb6606f
2020-08-28 08:40:22 -07:00
065ebdb92f TensorIterator: Check for memory overlap in all binary_ops (#43419)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43419

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23298655

Pulled By: zou3519

fbshipit-source-id: 82e0ff308a6a7e46b4342d57ddb4c1d73745411a
2020-08-28 08:40:19 -07:00
bdee8e02c0 TensorIterator: Check memory overlap in all unary_ops (#43418)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43418

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23298651

Pulled By: zou3519

fbshipit-source-id: 84be498f5375813fd10cf30b8beabbd2d15210a3
2020-08-28 08:39:13 -07:00
0ab83f7f9f Fixed undefined behavior in BatchedFallback (#43705)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43705

This was causing fb-internal flakiness. I'm surprised that the ASAN
builds don't catch this behavior.

The problem is that dereferencing the end() pointer of a vector is
undefined behavior. This PR fixes one callsite where BatchedFallback
dereferences the end() pointer and adds an assert to make sure another
callsite doesn't do that.

Test Plan:
- Make sure all tests pass (`pytest test/test_vmap.py -v`)
- It's hard to write a new test for this because most of the time this
doesn't cause a crash. It really depends on what lives at the end()
pointer.

Reviewed By: ezyang

Differential Revision: D23373352

Pulled By: zou3519

fbshipit-source-id: 61ea0be80dc006f6d4e73f2c5badd75096f63e56
2020-08-28 08:09:17 -07:00
8e507ad00e Update the div formula for numerical stability (#43627)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43414

See the issue for numerical improvements and a quick benchmark.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43627

Reviewed By: agolynski

Differential Revision: D23350124

Pulled By: albanD

fbshipit-source-id: 19d51640b3f200db37c32d2233a4244480e5a15b
2020-08-28 07:49:35 -07:00
b29375840a Revert D23379383: Land code_coverage_tool to caffe2/tools folder
Test Plan: revert-hammer

Differential Revision:
D23379383 (f06d3904f2)

Original commit changeset: f6782389ebb1

fbshipit-source-id: 33a26761deb58dfe81314ea912bf485c5fc962b7
2020-08-28 07:19:12 -07:00
c7787f7fbf [numpy compatibility]Fix argmin/argmax when multiple max/min values (#42004)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41998
Fixes https://github.com/pytorch/pytorch/issues/22853

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42004

Reviewed By: ngimel

Differential Revision: D23049003

Pulled By: mruberry

fbshipit-source-id: a6fddbadfec4b8696730550859395ce4f0cf50d6
2020-08-28 06:42:42 -07:00
26161e8ab6 [Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D23393950

fbshipit-source-id: 6a31b7ab6961cba88014f41b3ed1eda108edebab
2020-08-28 05:38:13 -07:00
f06d3904f2 Land code_coverage_tool to caffe2/tools folder
Summary:
Move `code_coverage_tool` from `experimental` folder to `caffe2/tools` folder.

Not sure whether the fb-related code is something we don't want to share with OSS. Can reviewers please help me check `fbcode_coverage.py` and the files in the `fbcode/` folder?

Test Plan: Test locally

Reviewed By: malfet

Differential Revision: D23379383

fbshipit-source-id: f6782389ebb1b147eaf6d3664b5955db79d24ff3
2020-08-27 18:44:40 -07:00
654ab209c6 [JIT] Disable broken tests (#43750)
Summary:
These started failing after **https://github.com/pytorch/pytorch/pull/43633** for indecipherable reasons; temporarily disable them. The errors on the PRs were
```
Downloading workspace layers
  workflows/workspaces/3ca9ca71-7449-4ae1-bb7b-b7612629cc62/0/8607ba99-5ced-473b-b60a-0025b48739a6/0/105.tar.gz - 8.4 MB
Applying workspace layers
  8607ba99-5ced-473b-b60a-0025b48739a6
```
which is not too helpful...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43750

Reviewed By: ZolotukhinM

Differential Revision: D23388060

Pulled By: eellison

fbshipit-source-id: 96afa0160ec948049f3e194787a0a7ddbeb5124a
2020-08-27 18:12:57 -07:00
1a21c92364 [ONNX] Update in scatter ONNX export when scalar src has different type (#43440)
Summary:
`torch.scatter` allows `src` to be of a different type when `src` is a scalar. This requires an explicit Cast op to be inserted in the ONNX graph because ONNX `ScatterElements` does not allow different types. This PR updates the export of `torch.scatter` with this logic (see the example below).
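
The scalar-`src` form in question (an int scalar scattered into a float tensor):

```python
import torch

x = torch.zeros(3, 5)                    # float32 destination
index = torch.tensor([[0, 1, 2, 0, 1]])
x.scatter_(0, index, 1)                  # scalar src of a different type
```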

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43440

Reviewed By: hl475

Differential Revision: D23352317

Pulled By: houseroad

fbshipit-source-id: c9eeddeebb67fc3c40ad01def134799ef2b4dea6
2020-08-27 16:45:37 -07:00
87d7c362b1 [JIT] Add JIT support for torch.no_grad (#41371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41371

**Summary**
This commit enables the use of `torch.no_grad()` in a with item of a
with statement within JIT. Note that the use of this context manager as
a decorator is not supported.
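
For example, a function like the following now scripts and runs (while `@torch.no_grad()` as a decorator remains unsupported):

```python
import torch

@torch.jit.script
def fn(x: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():  # supported as a with item after this change
        y = x * 2
    return y
```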

**Test Plan**
This commit adds a test case to the existing with statements tests for
`torch.no_grad()`.

**Fixes**
This commit fixes #40259.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D22649519

Pulled By: SplitInfinity

fbshipit-source-id: 7fa675d04835377666dfd0ca4e6bc393dc541ab9
2020-08-27 15:32:57 -07:00
8032dbc117 Add Rowwise Prune PyTorch op (#42708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42708

Add rowwise prune pytorch op.

This operator introduces sparsity to the 'weights' matrix with the help
of the importance indicator 'mask'.

A row is considered important and not pruned if the mask value for that
particular row is 1(True) and not important otherwise.
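
A plain-Python sketch of the semantics only (not the op's actual signature): rows whose mask entry is True survive.

```python
import torch

weights = torch.randn(4, 3)
mask = torch.tensor([True, False, True, True])
pruned = weights[mask]   # keeps rows 0, 2 and 3; row 1 is pruned away
```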

Test Plan:
buck test caffe2/torch/fb/sparsenn:test -- rowwise_prune
buck test caffe2/test:pruning

Reviewed By: supriyar

Differential Revision: D22849432

fbshipit-source-id: 456f4f77c04158cdc3830b2e69de541c7272a46d
2020-08-27 15:16:23 -07:00
3a0e35c9f2 [pytorch] deprecate static dispatch (#43564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564

Static dispatch was originally introduced for mobile selective build.

Since we have added selective build support for dynamic dispatch and
tested it in FB production for months, we can deprecate static dispatch
to reduce the complexity of the codebase.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23324452

Pulled By: ljk53

fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7
2020-08-27 14:52:48 -07:00
3afd24d62c [pytorch] check in default generated op dependency graph (#43570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43570

Add the default op dependency graph to the source tree - use it if user runs
custom build in dynamic dispatch mode without providing the graph.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23326988

Pulled By: ljk53

fbshipit-source-id: 5fefe90ca08bb0ca20284e87b70fe1dba8c66084
2020-08-27 14:51:44 -07:00
9a2d4d550e update build flags for benchmark binaries
Summary:
As suggested by Shoaib Meenai, we should use mode/ndk_libcxx to replace mode/gnustl.

This diff updates all build flags for caffe2 and pytorch in aibench. For easy management, I created two mode files in xplat/caffe2/mode and deleted buckconfig.ptmobile.pep.

Test Plan:
caffe2
```
buck run aibench:run_bench -- -b aibench/specifications/models/caffe2/squeezenet/squeezenet.json --remote --devices s9f
```
https://our.intern.facebook.com/intern/aibench/details/433604719423848

full jit
```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/fbnet/fbnet_mobile_inference.json --platform android/full_jit --framework pytorch --remote --devices SM-G960F-8.0.0-26
```
https://our.intern.facebook.com/intern/aibench/details/189359776958060

lite interpreter
```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/fbnet/fbnet_mobile_inference.json --platform android --framework pytorch --remote --devices s9f
```
https://our.intern.facebook.com/intern/aibench/details/568178969092066

Reviewed By: smeenai

Differential Revision: D23338089

fbshipit-source-id: 62f4ae2beb004ceaab1f73f4de8ff9e0c152d5ee
2020-08-27 14:40:01 -07:00
01f974eb1e Specialize optionals for grad_sum_to_size (#43633)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43633

In the backward graph, _grad_sum_to_size is inserted whenever a possibly broadcasting op is called:
`"aten::_grad_sum_to_size(Tensor(a) self, int[]? size) -> Tensor(a)"`
If a broadcast occurred, a sum is called; otherwise the second input is None and it is a no-op. Most of the time it's a no-op (> 90% of the time in the fast RNNs benchmark).

We can get rid of this op by profiling the optionality of the second input. I added `prim::profile_optional` to do this, which counts the number of times it saw a None value and the number of times it saw a value present. When specializing the backward graph, we insert checks for values we profiled as None, and in the optimized block can remove the grad_sum_to_size calls that use those values.

In the future we may revisit this when NNC supports reductions and we want to replace grad_sum_to_size with sums as well, but I think this is worth landing now.

Test Plan: Imported from OSS

Reviewed By: bwasti, ZolotukhinM

Differential Revision: D23358809

Pulled By: eellison

fbshipit-source-id: a30a148ca581370789d57ba082d23cbf7ef2cd4d
2020-08-27 14:35:37 -07:00
a19fd3a388 Add undefined specializations in backward (#43632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43632

Specialize the backward graph by guarding on the undefinedness of the input tensors. The graph will look like:
```
ty1, ty2, succesful_checks = prim::TypeCheck(...)
if (succesful_checks)
-> optimized graph
else:
-> fallback graph
```

Specializing on the undefinedness of tensors allows us to clean up the
```
if any_defined(inputs):
 outputs = <original_computation>
else:
 outputs = autograd zero tensors
```
blocks that make up the backward graph, so that we can fuse the original_computation nodes together.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23358808

Pulled By: eellison

fbshipit-source-id: f5bb28f78a4a3082ecc688a8fe0345a8a098c091
2020-08-27 14:35:35 -07:00
a4cf4c2437 refactor tests (#43631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43631

I added a new test for just profiler stuff - I don't think the test should go in test_jit.py. Maybe this should just go in test_tensorexpr_fuser, but I'm not really testing tensorexpr stuff either... LMK

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23358810

Pulled By: eellison

fbshipit-source-id: 074238e1b60e4c4a919a052b7a5312b790ad5d82
2020-08-27 14:35:33 -07:00
e189ef5577 Refactor pass to class (#43630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43630

No functional changes here - just refactoring specialize autograd zero to a class, and standardizing its API to take in a shared_ptr<Graph>

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23358805

Pulled By: eellison

fbshipit-source-id: 42e19ef2e14df66b44592252497a47d03cb07a7f
2020-08-27 14:35:30 -07:00
d1c4d75c14 Add API for unexecuted op (#43629)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43629

We have a few places where we count the size of a block/subgraph - it's nice to have a shared API to ignore operators that are not executed in the optimized graph (will be used when I add a new profiling node in PR ^^)

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23358807

Pulled By: eellison

fbshipit-source-id: 62c745d9025de94bdafd9f748f7c5a8574cace3f
2020-08-27 14:34:05 -07:00
5da97a38d1 Check if input is ChannelsLast or ChannelsLast3d for quantized AdaptivePool3d. (#42780)
Summary:
cc z-a-f, vkuzo. This serves as a very simple first step to the issue mentioned in https://github.com/pytorch/pytorch/issues/42779.

# Description
Since `ChannelsLast` and `ChannelsLast3d` are not equivalent [(MemoryFormat.h)](4e93844ab1/c10/core/MemoryFormat.h (L27)), the "fast" path for `NDHWC` is ignored.

This PR would produce the expected behaviour for 4 (5 if including batch) dimensional tensors.

# Benchmarks
## Notes
- For channels `< 8`, it is actually slower than before.
- For `qint32`, it is actually `2x` slower than before.
- For channels `> 8`, the execution time decreases up to `9-10` times in the benchmarks.
- While execution time does improve, it remains slower than the `contiguous` variant when channels `> 64`.

## C++
<img width="1667" alt="before_after_py" src="https://user-images.githubusercontent.com/37529096/89711911-5da22d80-d9e1-11ea-9b30-0c23d46c2c93.png">

## Python
<img width="1523" alt="before_after_cpp" src="https://user-images.githubusercontent.com/37529096/89711906-58dd7980-d9e1-11ea-9696-1963f394198a.png">

## Reproduce
See https://github.com/pytorch/pytorch/issues/42779.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42780

Reviewed By: smessmer

Differential Revision: D23035424

Pulled By: z-a-f

fbshipit-source-id: 15594846f66b73c22d2371eb8e47c472324d6139
2020-08-27 14:23:57 -07:00
cdc3e232e9 Add __str__ and __repr__ bindings to SourceRange (#43601)
Summary:
Added the bindings for `__str__` and `__repr__` methods for SourceRange

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43601

Test Plan:
`python test/test_jit.py`

cc gmagogsfm

Reviewed By: agolynski

Differential Revision: D23366500

Pulled By: gmagogsfm

fbshipit-source-id: ab4be6e8f9ad5f67a323554437878198483f4320
2020-08-27 12:30:47 -07:00
04ccd3ed77 Fix bazel dependencies (#43688)
Summary:
Add `header_template_rule` to `substitution.bzl`
Use it in BUILD.bazel to specify dependencies on autogenerated headers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43688

Test Plan: bazel build --sandbox_writable_path=$HOME/.ccache -c dbg :caffe2

Reviewed By: seemethere

Differential Revision: D23374702

Pulled By: malfet

fbshipit-source-id: 180dd996d1382df86258bb6abab9f2c7e964152e
2020-08-27 12:11:34 -07:00
bff741a849 Improve save_for_mobile cxx binary (#43721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43721

We can combine the optimization pass and save_for_mobile to reduce friction. Since a lite interpreter model can also be used in full JIT, I don't think we need the option to save it as a full JIT model.

Also
- improved usage message
- print op list before and after optimization pass

Test Plan:
```
buck run //xplat/caffe2:optimize_for_mobile -- --model=/home/linbin/sparkspot.pt

Building: finished in 12.4 sec (100%) 2597/2597 jobs, 2 updated
  Total time: 12.5 sec

pt_operator_library(
        name = "old_op_library",
        ops = [
                "aten::_convolution",
                "aten::adaptive_avg_pool2d",
                "aten::add_.Tensor",
                "aten::batch_norm",
                "aten::mul.Tensor",
                "aten::relu_",
                "aten::softplus",
                "aten::sub.Tensor",
        ],
)

pt_operator_library(
        name = "new_op_library",
        ops = [
                "aten::adaptive_avg_pool2d",
                "aten::add_.Tensor",
                "aten::batch_norm",
                "aten::mul.Tensor",
                "aten::relu_",
                "aten::softplus",
                "aten::sub.Tensor",
                "prepacked::conv2d_clamp_run",
        ],
)

The optimized model for lite interpreter was saved to /home/linbin/sparkspot_mobile_optimized.bc
```

```
buck run //xplat/caffe2:optimize_for_mobile -- --model=/home/linbin/sparkspot.pt --backend=vulkan
```

Reviewed By: kimishpatel

Differential Revision: D23363533

fbshipit-source-id: f7fd61aaeda5944de5bf198e7f93cacf8368babd
2020-08-27 11:01:12 -07:00
3830998ac3 [fx] When generating names, avoid shadowing builtins (#43653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43653

When nodes are created without an explicit name, a name is generated for
it based on the target. In these cases, we need to avoid shadowing
builtin names. Otherwise, code like:
```
a.foo.bar
```
results in pretty-printed code like:
```
getattr = a.foo
getattr_1 = getattr.bar
```

While this is technically allowed in Python, it's probably a bad idea,
and more importantly is not supported by TorchScript (where `getattr` is
hardcoded).

This PR changes the name generation logic to avoid shadowing all
builtins and language keywords. We already do this for PyTorch
built-ins, so just extend that logic. So now the generated code will
look like:

```
getattr_1 = a.foo
getattr_2 = getattr_1.bar
```
Fixes #43522

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23357420

Pulled By: suo

fbshipit-source-id: 91e9974adc22987eca6007a2af4fb4fe67f192a8
2020-08-27 10:43:56 -07:00
5a1aa0e21e [reland][quant][graphmode][fx] Add e2e test on torchvision (#43587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43587

Add tests for graph mode quantization on torchvision and make sure it matches
current eager mode quantization

Test Plan:
Imported from OSS

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23331253

fbshipit-source-id: 0445a44145d99837a2c975684cd0a0b7d965c8f9
2020-08-27 10:12:07 -07:00
73dcfc5e78 Update RNN op registration format (#43599)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43599

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D23350223

Pulled By: iseeyuan

fbshipit-source-id: 94c528799e31b2ffb02cff675604e7cce639687f
2020-08-27 07:27:14 -07:00
288a2effa0 Operator generator based on templated selective build. (#43456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43456

Introduce the template OperatorGenerator, which returns an optional Operator; it's null if the templated bool value is false.

RegisterOperators() is updated to take the optional Operator. A null will not be registered.

With this update the selective operator registration can be done at compile time. Tests are added to show an operator can be registered if it's in a whitelist and it will not be registered if it's not in the whitelist.

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D23283563

Pulled By: iseeyuan

fbshipit-source-id: 456e0c72b2f335256be800aeabb797bd83bcf0b3
2020-08-27 07:26:07 -07:00
c25d0015f0 Autograd code clean up (#43167)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43167

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23222358

Pulled By: anjali411

fbshipit-source-id: b738c63b294bcee7d680fa64c6300007d988d218
2020-08-27 07:07:52 -07:00
de84db2a9d [TensorExpr] Add aten::sum lowering to the kernel (#43585)
Summary:
Handles all dimensions and selected dimensions, per PyTorch semantics.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43585

Test Plan: test_tensorexpr

Reviewed By: bertmaher

Differential Revision: D23362382

Pulled By: asuhan

fbshipit-source-id: e8d8f1197a026be0b46603b0807d996a0de5d58c
2020-08-27 02:46:47 -07:00
48e08f884e C++ APIs TransformerEncoder (#43187)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43187

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23182770

Pulled By: glaringlee

fbshipit-source-id: 968846138d4b1c391a74277216111dba8b72d683
2020-08-27 01:31:46 -07:00
f63d06a57b Fix docs for kwargs, a-e (#43583)
Summary:
To reduce the chance of conflicts, not all ops are fixed. Ops starting with letter `f` will be fixed in separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43583

Reviewed By: ZolotukhinM

Differential Revision: D23330347

Pulled By: mruberry

fbshipit-source-id: 3387cb1e495faebd16fb183039197c6d90972ad4
2020-08-27 00:14:05 -07:00
a070c619b9 [FX] Native callables in FX lowering (#43426)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43426

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D23273427

Pulled By: jamesr66a

fbshipit-source-id: 3a9d04486c72933d8afd9c181578fe98c3d825b0
2020-08-27 00:00:03 -07:00
79e6aaeb4c pull empty() out of use_c10_dispatcher: full (#43572)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43572

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D23326019

Pulled By: bhosmer

fbshipit-source-id: 10a4d7ffe33b4be4ae45396725456c6097ce1757
2020-08-26 22:51:06 -07:00
01b5c06254 [fix] handle empty args in chain_matmul (#43553)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41817

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43553

Reviewed By: agolynski

Differential Revision: D23342586

Pulled By: mruberry

fbshipit-source-id: c6349f8fa9fcefcf03681d92c085a21265d1e690
2020-08-26 18:54:46 -07:00
28be3ef2f2 Fix hipify script for pytorch extensions (#43528)
Summary:
PyTorch extensions can have .cpp or .h files which contain CUDA code that needs to be hipified. The current hipify script logic has overly strict conditions to determine which files get considered for hipification: https://github.com/pytorch/pytorch/blob/master/torch/utils/hipify/hipify_python.py#L146

These conditions might apply well to pytorch/caffe2 source code, but are overconstrained for third-party extensions.
`is_pytorch_file` conditions: https://github.com/pytorch/pytorch/blob/master/torch/utils/hipify/hipify_python.py#L549
`is_caffe2_gpu_file` conditions: https://github.com/pytorch/pytorch/blob/master/torch/utils/hipify/hipify_python.py#L561

This PR relaxes these conditions if we're hipifying a pytorch extension (specified by `is_pytorch_extension=True`) and considers all the file extensions specified using the `extensions` parameter: https://github.com/pytorch/pytorch/blob/master/torch/utils/hipify/hipify_python.py#L820
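
Roughly how an extension build might invoke it; the exact `hipify` signature should be checked against hipify_python.py, and apart from `is_pytorch_extension` and `extensions` (named in this PR) the arguments below are assumptions:

```python
from torch.utils.hipify import hipify_python

hipify_python.hipify(
    project_directory="my_ext",                # assumed layout
    output_directory="my_ext",
    extensions=(".cu", ".cuh", ".h", ".cpp"),  # all of these are considered
    is_pytorch_extension=True,                 # relax pytorch/caffe2-only checks
)
```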

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43528

Reviewed By: mruberry

Differential Revision: D23328272

Pulled By: ngimel

fbshipit-source-id: 1e9c3a54ae2da65ac596a7ecd5539f3e14eeed88
2020-08-26 18:41:48 -07:00
c4e5ab6ff2 [TensorExpr] Disable a flaky test. (#43678)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43678

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D23363651

Pulled By: ZolotukhinM

fbshipit-source-id: 9557fbfda28633cea169836b02d034e9c950bc71
2020-08-26 18:35:24 -07:00
00c1501bc0 [JIT] Cast return values of functions returning Any (#42259)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42259

**Summary**
This commit modifies IR generation to insert explicit cast that cast
each return value to `Any` when a function is annotated as returning `Any`.
This precludes the failure in type unification (see below) that caused
this issue.

Issue #41962 reported that the use of an `Any` return type in
combination with different code paths returning values of different
types causes a segmentation fault. This is because the exit transform
pass tries to unify the different return types, fails, but silently sets
the type of the if node to c10::nullopt. This causes problems later in
shape analysis when that type object is dereferenced.

**Test Plan**
This commit adds a unit test that checks that a function similar to the
one in #41962 can be scripted and executed.

**Fixes**
This commit fixes #41962.

Differential Revision: D22883244

Test Plan: Imported from OSS

Reviewed By: eellison, yf225

Pulled By: SplitInfinity

fbshipit-source-id: 523d002d846239df0222cd07f0d519956e521c5f
2020-08-26 18:24:11 -07:00
f73e32cd04 Reduce amount of work done within a global lock within ParallelLoadOp (#43508)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43508

Differential Revision: D22952007

fbshipit-source-id: 11e28d20175271e6068edce8cb36f9fcf867a02a
2020-08-26 18:19:40 -07:00
0bf27d64f4 Fix NaN propagation in fuser's min/max implementation (#43590)
Summary:
fmax/fmin propagate the number if one argument is NaN, which doesn't match the eager mode behavior.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43590

Reviewed By: mruberry

Differential Revision: D23338664

Pulled By: bertmaher

fbshipit-source-id: b0316a6f01fcf8946ba77621efa18f339379b2d0
2020-08-26 17:31:06 -07:00
033b7ae3ef implement NumPy-like functionality maximum, minimum (#42579)
Summary:
Related to https://github.com/pytorch/pytorch/issues/38349

Implement NumPy-like functions `maximum` and `minimum`.
The `maximum` and `minimum` functions compare input tensors element-wise, returning a new tensor with the element-wise maxima/minima.

If one of the elements being compared is NaN, then that element is returned. Neither `maximum` nor `minimum` supports complex inputs.

This PR also promotes the overloaded versions of torch.max and torch.min, by re-dispatching binary `torch.max` and `torch.min` to `torch.maximum` and `torch.minimum`.
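
Quick example, including the NaN-propagation behavior noted above:

```python
import torch

a = torch.tensor([1.0, float("nan"), 3.0])
b = torch.tensor([2.0, 0.0, 2.0])
torch.maximum(a, b)   # -> tensor([2., nan, 3.]); NaN is propagated
torch.minimum(a, b)   # -> tensor([1., nan, 2.])
```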

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42579

Reviewed By: mrshenli

Differential Revision: D23153081

Pulled By: mruberry

fbshipit-source-id: 803506c912440326d06faa1b71964ec06775eac1
2020-08-26 16:56:12 -07:00
9ca338a9d4 [ONNX] Modified slice node in inplace ops pass (#43275)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42292

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43275

Reviewed By: hl475

Differential Revision: D23352540

Pulled By: houseroad

fbshipit-source-id: 7fce3087c333efe3db4b03e9b678d0bee418e93a
2020-08-26 16:51:20 -07:00
1bda5e480c Add Python code coverage (#43600)
Summary:
Replace  `test` with  `coverage_test` stage for `pytorch-linux-bionic-py3.8-gcc9` configuration
Add `coverage.xml` to the list of ignored files
Add `codecov.yml` that maps installed pytorch folders back to original locations
Cleanup coverage option utilization in `run_test.py` and adapt it towards combining coverage reports across the runs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43600

Reviewed By: seemethere

Differential Revision: D23351877

Pulled By: malfet

fbshipit-source-id: acf78ae4c8f3e23920a76cce1d50f2821b83eb06
2020-08-26 16:16:03 -07:00
88e35fb8bd Skip SVD tests when no lapack (#43566)
Summary:
These tests are failing on one of my system that does not have lapack

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43566

Reviewed By: ZolotukhinM

Differential Revision: D23325378

Pulled By: mruberry

fbshipit-source-id: 5d795e460df0a2a06b37182d3d4084d8c5c8e751
2020-08-26 15:58:31 -07:00
cf26050e29 [pytorch] Move TensorIteratorConfig method implementation to cpp file (#43554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43554

Move function implementations in the TensorIteratorConfig Class from TensorIterator.h to TensorIterator.cpp to avoid this issue: https://github.com/pytorch/pytorch/issues/43300

Reviewed By: malfet

Differential Revision: D23319007

fbshipit-source-id: 6cc3474994ea3094a294f795ac6998c572d6fb9b
2020-08-26 15:18:37 -07:00
6c28df7ceb [fx] add test for args/kwargs handling (#43640)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43640

+ added a `self.checkGraphModule` utility function to wrap the common
test assert pattern.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23356262

Pulled By: suo

fbshipit-source-id: a50626dcb01246d0dbd442204a8db5958cae23ab
2020-08-26 14:39:25 -07:00
5a15f56668 match batchmatmul on 1.0.0.6 (#43559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43559

- remove the MKL strided gemm since it was acting weird in some cases; use the plain for-loop gemm for now. It will have performance implications, but this closes the gap for the ctr_instagram_5x model
- reproduced the failure scenario of batchmatmul on ctr_instagram_5x by increasing the dimensions of the inputs
- added an option in netrunner to skip bmm if needed

Test Plan:
- net runner passes with ctr_instagram 5x
- bmm unit test repros the discrepancy fixed

Reviewed By: amylittleyang

Differential Revision: D23320857

fbshipit-source-id: 7d5cfb23c1b0d684e1ef766f1c1cd47bb86c9757
2020-08-26 14:35:31 -07:00
769b9381fc DDP Communication hook: Fix the way we pass future result to buckets. (#43307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43307

I identified a bug with the DDP communication hook while running accuracy benchmarks: I was getting `loss=nan`.

Looks like when we re-`initialize_bucketviews` with the value of `future_work`, `Reducer::mark_variable_ready_dense`'s `bucket_view.copy_(grad)` wasn't copying the grads back to the contents, since `bucket_view` no longer had any relationship with `contents` after being re-initialized with something else. Across multiple iterations, this was causing problems.
I solved this by adding two states for `bucket_view`:
```
    // bucket_views_in[i].copy_(grad) and
    // grad.copy_(bucket_views_out[i])
    // provide convenient ways to move grad data in/out of contents.
    std::vector<at::Tensor> bucket_views_in;
    std::vector<at::Tensor> bucket_views_out;
```

I included two additional unit tests where we run multiple iterations for better test coverage:
1) `test_accumulate_gradients_no_sync_allreduce_hook`
2) `test_accumulate_gradients_no_sync_allreduce_with_then_hook`.

ghstack-source-id: 110728299

Test Plan:
Run `python test/distributed/test_c10d.py`, some perf&accuracy benchmarks.

New tests:
`test_accumulate_gradients_no_sync_allreduce_hook`
`test_accumulate_gradients_no_sync_allreduce_with_then_hook`

Acc benchmark results look okay:
f214188350

Reviewed By: agolynski

Differential Revision: D23229309

fbshipit-source-id: 329470036cbc05ac12049055828495fdb548a082
2020-08-26 14:22:09 -07:00
0521c71241 [D23047144 Duplicate][2/3][lite interpreter] add metadata when saving and loading models for mobile (#43584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43584

1. add `metadata.pkl` to the `.bc` file, which includes the model info that we are interested in
2. load `metadata.pkl` as an attribute `unordered_map<string, string>` in the module
ghstack-source-id: 110730013

Test Plan:
- CI
```buck build //xplat/caffe2:jit_module_saving
```
```buck build //xplat/caffe2:torch_mobile_core
```

Reviewed By: xcheng16

Differential Revision: D23330080

fbshipit-source-id: 5d65bd730b4b566730930d3754fa1bf16aa3957e
2020-08-26 14:07:49 -07:00
306eb3def7 Additional error checking for torch.cuda.nccl APIs. (#43247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43247

`torch.cuda.nccl` APIs didn't throw appropriate errors when called
with inputs/outputs of the wrong type, which resulted in some cryptic
errors instead.

Adding some error checks with explicit error messages for these APIs.
ghstack-source-id: 110683546
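
A hedged sketch of the kind of misuse these checks target; the exact exception type and message are assumptions, not quoted from the PR:

```python
import torch

inputs = torch.randn(4)   # a bare CPU tensor, not a collection of CUDA tensors
try:
    torch.cuda.nccl.all_reduce(inputs)
except TypeError as e:
    print(e)              # now an explicit type-check message
```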

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D23206069

fbshipit-source-id: 8107b39d27f4b7c921aa238ef37c051a9ef4d65b
2020-08-26 13:50:00 -07:00
db1fbc5729 [OACR][NLU] Add aten::str operator (#43573)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43573

We recently updated the Stella NLU model in D23307228, and the App started to crash with `Following ops cannot be found:{aten::str, }`.

Test Plan: Verified by installing the assistant-playground app on Android.

Reviewed By: czlx0701

Differential Revision: D23325409

fbshipit-source-id: d670242868774bb0aef4be5c8212bc3a3f2f667c
2020-08-26 13:27:11 -07:00
6459f0a077 added rocm 3.7 docker image (#43576)
Summary:
Added bionic rocm 3.7 docker image

- jeffdaily

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43576

Reviewed By: malfet

Differential Revision: D23352310

Pulled By: seemethere

fbshipit-source-id: fd544b3825d8c25587f5765332c0a8ed1fa63c6e
2020-08-26 12:39:46 -07:00
a91e1cedc5 Reduce number of hypothesis tests in CI (#43591)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43591

Going from 100 randomized inputs to 50 doesn't change the coverage balance that much, but it speeds up test runtime
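
As an aside, a minimal hypothesis sketch of capping randomized inputs; `settings(max_examples=...)` is real hypothesis API, while the test body is illustrative:

```python
from hypothesis import given, settings, strategies as st

@settings(max_examples=50)      # cap the number of randomized inputs
@given(st.integers())
def test_int_str_roundtrip(x):
    assert int(str(x)) == x
```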

Test Plan: CI

Reviewed By: orionr, seemethere

Differential Revision: D23332393

fbshipit-source-id: 7a8ff9127ee3e045a83658a7a670a844f3862987
2020-08-26 11:54:49 -07:00
2a4d312027 Allow GPU skip decorators to report the right number of GPUs required in (#43468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43468

Closes https://github.com/pytorch/pytorch/issues/41378.
https://github.com/pytorch/pytorch/pull/41973 enhanced the skip decorators to
report the right number of GPUs required, but this information was not passed to
the main process where the message is actually displayed. This PR uses a
`multiprocessing.Manager()` so that the dictionary modification is reflected
correctly in the main process.
ghstack-source-id: 110684228
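
A minimal sketch of why `multiprocessing.Manager` helps here: a managed dict proxies writes back to the parent process, unlike a plain dict:

```python
import multiprocessing as mp

def child(d):
    d["skip_reason"] = "Need at least 4 CUDA devices"

if __name__ == "__main__":
    with mp.Manager() as manager:
        d = manager.dict()                 # proxy dict shared with children
        p = mp.Process(target=child, args=(d,))
        p.start()
        p.join()
        print(d["skip_reason"])            # the child's write is visible here
```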

Test Plan:
With this diff, we can run a test in such as in https://github.com/pytorch/pytorch/pull/42577 that requires 4 GPUs on a 2 GPU machine, and we get the expected message:

```
test_ddp_uneven_inputs_replicated_error (test_distributed.TestDistBackend) ... skipped 'Need at least 4 CUDA devices'
```

Reviewed By: mrshenli

Differential Revision: D23285790

fbshipit-source-id: ac32456ef3d0b1d8f1337a24dba9f342c736ca18
2020-08-26 11:44:13 -07:00
25dcc28cd6 [jit][static] Replace deepcopy with copy (#43182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43182

We should avoid using `deepcopy` on the module because it involves copying the weights.

Comparing the implementation of `c10::ivalue::Object::copy()` vs `c10::ivalue::Object::deepcopy()`, the only difference is `deepcopy` copies the attributes (slots) while `copy` does not.
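
A plain-Python sketch of the same distinction (not the `c10::ivalue::Object` code itself):

```python
import copy
import torch

class Module:
    def __init__(self):
        self.weight = torch.randn(3)   # stands in for a module slot

m = Module()
shallow = copy.copy(m)        # like copy(): shares the weight tensor
deep = copy.deepcopy(m)       # like deepcopy(): duplicates the weight data

shallow.weight.add_(1.0)
print(torch.equal(m.weight, shallow.weight))  # True: same storage
print(torch.equal(m.weight, deep.weight))     # False: independent copy
```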

Reviewed By: bwasti

Differential Revision: D23171770

fbshipit-source-id: 3cd711c6a2a19ea31d1ac1ab2703a0248b5a4ef3
2020-08-26 11:15:49 -07:00
51861cc9b1 .circleci: Add CUDA 11 to nightly binary builds (#43366)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43366

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D23348556

Pulled By: seemethere

fbshipit-source-id: 0cd129c5c27ffceec80636384762c3ff7bf74fdc
2020-08-26 10:11:01 -07:00
42f6c3b1f4 Raise error on device mismatch in addmm (#43505)
Summary:
Fixes gh-42282

This adds a device-mismatch check to `addmm` on CPU and CUDA. It seems like the dispatcher always selects the CUDA version here if any of the inputs are on GPU, so in theory the CPU check is unnecessary, but it's probably better to err on the side of caution.
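
A hedged sketch of the failure mode being guarded (reproducing it requires a CUDA machine):

```python
import torch

cpu = torch.randn(2, 2)
gpu = torch.randn(2, 2, device="cuda")
try:
    torch.addmm(cpu, gpu, gpu)     # mixed CPU/CUDA inputs
except RuntimeError as e:
    print(e)                       # explicit device-mismatch error
```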

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43505

Reviewed By: mruberry

Differential Revision: D23331651

Pulled By: ngimel

fbshipit-source-id: 8eb2f64f13d87e3ca816bacec9d91fe285d83ea0
2020-08-26 09:37:57 -07:00
7beeef2c69 .jenkins: Remove openssh installs (#43597)
Summary:
openssh should be installed by either the CircleCI machines or the
Jenkins workers, so we shouldn't need to install it ourselves to
get ssh functionality.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43597

Reviewed By: ezyang

Differential Revision: D23333479

Pulled By: seemethere

fbshipit-source-id: 17a1ad0200a9df7d4818ab1ed44c8488ec8888fb
2020-08-26 09:36:53 -07:00
573940f8d7 Fix type annotation errors in torch.functional (#43446)
Summary:
Closes gh-42968

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43446

Reviewed By: albanD

Differential Revision: D23280962

Pulled By: malfet

fbshipit-source-id: de5386a95a20ecc814c39cbec3e4252112340b3a
2020-08-26 08:27:59 -07:00
2b70f82737 fix typo in test_dataloader test_multiprocessing_contexts (take 2) (#43588)
Summary:
2nd attempt to land https://github.com/pytorch/pytorch/pull/43343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43588

Reviewed By: seemethere

Differential Revision: D23332284

Pulled By: malfet

fbshipit-source-id: d78faf468c56af2f176dbdd2ce4bd51f0b5df6fd
2020-08-25 21:11:53 -07:00
c1553ff94b Benchmarks: temporarily disable profiling-te configuration. (#43603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43603

We are in the midst of landing a big rework of the profiling executor, and
benchmarks are expected to fail while we are in the transitional state.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23334818

Pulled By: ZolotukhinM

fbshipit-source-id: 99ff17c6f8ee18d003f6ee76ff0e719cea68c170
2020-08-25 21:00:10 -07:00
3ec24f02af [TensorExpr] Start using typecheck in the fuser. (#43173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43173

With this change the fuser starts to generate typechecks for the inputs of
a fusion group. For each fusion group we generate a typecheck and an if
node: the true block contains the fused subgraph, the false block
contains the unoptimized original subgraph.

Differential Revision: D23178230

Test Plan: Imported from OSS

Reviewed By: eellison

Pulled By: ZolotukhinM

fbshipit-source-id: f56e9529613263fb3e6575869fdb49973c7a520b
2020-08-25 18:13:32 -07:00
b763666f9f [JIT] Subgraph utils: add an optional vmap argument to the API to allow retrieving value mappings. (#43235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43235

This functionality is needed when we want to not lose track of
nodes/values as we merge and unmerge them into other nodes. For
instance, if we have a side data structure with some meta information
about values or nodes, this new functionality would allow us to keep that
metadata up to date after merging and unmerging nodes.

Differential Revision: D23202648

Test Plan: Imported from OSS

Reviewed By: eellison

Pulled By: ZolotukhinM

fbshipit-source-id: 350d21a5d462454166f8a61b51d833551c49fcc9
2020-08-25 18:13:29 -07:00
d18566c617 [TensorExpr] Fuser: disallow aten::slice nodes. (#43365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43365

We don't have shape inference for them yet.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23253418

Pulled By: ZolotukhinM

fbshipit-source-id: 9c38778b8a616e70f6b2cb5aab03d3c2013b34b0
2020-08-25 18:13:27 -07:00
8dc4b415eb [TensorExpr] Fuser: only require input shapes to be known (output shapes can be inferred). (#43171)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43171

Differential Revision: D23178228

Test Plan: Imported from OSS

Reviewed By: eellison

Pulled By: ZolotukhinM

fbshipit-source-id: e3465066e0cc4274d28db655de274a51c67594c4
2020-08-25 18:13:25 -07:00
f6b7c6da19 [TensorExpr] Fuser: move canHandle and some other auxiliary functions into TensorExprFuser class. (#43170)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43170

Differential Revision: D23178227

Test Plan: Imported from OSS

Reviewed By: eellison

Pulled By: ZolotukhinM

fbshipit-source-id: 3c3a0215344fb5942c4f3078023fef32ad062fe9
2020-08-25 18:12:01 -07:00
f35e069622 Back out "Make grad point to bucket buffer in DDP to save memory usage" (#43557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43557

Back out the diff that caused some errors in PyText distributed training

Test Plan: Tested by rayhou who verified reverting the diff works

Differential Revision: D23320238

fbshipit-source-id: caa0fe74404059e336cd95fdb41373f58ecf486e
2020-08-25 18:04:39 -07:00
58666982fb check in intel nnpi 1007 into fbcode/tp2
Summary: As title

Test Plan:
* Details of conducted tests can be found in https://fb.workplace.com/groups/527892364588452/permalink/615694119141609/
* Sandcastle

Reviewed By: arunm-git

Differential Revision: D23198458

fbshipit-source-id: dd8d34a985dced66a5624a21e5d4a7e9a499ce39
2020-08-25 17:59:11 -07:00
b3f8834033 Batching rule for torch.pow, torch.result_type (#43515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43515

This PR adds a batching rule for torch.pow. This required adding a
batching rule for torch.result_type.
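
A hedged usage sketch, assuming the prototype `torch.vmap` API of this era:

```python
import torch

# torch.vmap was an experimental prototype at this point and may warn
x = torch.randn(3, 5)
y = torch.randn(3, 5)
out = torch.vmap(torch.pow)(x, y)     # torch.pow batched over the leading dim
print(out.shape)                      # torch.Size([3, 5])

# torch.result_type is what the batching rule consults for the output dtype
print(torch.result_type(x, torch.tensor(2.0, dtype=torch.float64)))  # float64
```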

Test Plan: - added new tests: `pytest test/test_vmap.py -v`

Reviewed By: cpuhrsch

Differential Revision: D23302737

Pulled By: zou3519

fbshipit-source-id: 2cade358750f6cc3abf45f81f2394900600927cc
2020-08-25 17:55:53 -07:00
c9f125bf70 Black to Block for various files (#42913)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41735 #41736 https://github.com/pytorch/pytorch/issues/41737 #41738: all areas where "black" is mentioned are replaced with "block"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42913

Reviewed By: houseroad

Differential Revision: D23112873

Pulled By: malfet

fbshipit-source-id: a515b56dc2ed20aa75741c577988d95f750b364c
2020-08-25 17:43:31 -07:00
348e78b086 Evenly distribute output grad into all matching inputs for min/max/median (#43519)
Summary:
cc: ngimel mruberry
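
The summary is terse; the title carries the change. A hedged sketch of the new backward semantics for a tied max:

```python
import torch

x = torch.tensor([1.0, 3.0, 3.0], requires_grad=True)
x.max().backward()
print(x.grad)   # tensor([0.0000, 0.5000, 0.5000]): grad split across the ties
```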

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43519

Reviewed By: albanD

Differential Revision: D23312235

Pulled By: ngimel

fbshipit-source-id: 678bda54996df7f29acf96add928bb7042fc2069
2020-08-25 16:36:33 -07:00
be637fd5f6 Revert D23306683: [quant][graphmode][fx] Testing torchvision
Test Plan: revert-hammer

Differential Revision:
D23306683 (62dcd253e3)

Original commit changeset: 30d27e225d45

fbshipit-source-id: e661334d187d3d6756facd36f2ebdb3ab2cd2e26
2020-08-25 15:24:02 -07:00
05f27b18fb Back out D23047144 "[2/3][lite interpreter] add metadata when saving and loading models for mobile"
Summary:
Original commit changeset: f368d00f7bae

Back out "[2/3][lite interpreter] add metadata when saving and loading models for mobile"

D23047144 (e37f871e87)

Pull Request: https://github.com/pytorch/pytorch/pull/43516

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: xcheng16

Differential Revision: D23304639

fbshipit-source-id: 970ca3438c1858f8656cbcf831ffee2c4a551110
2020-08-25 14:58:38 -07:00
5ca6cbbd93 Remove unnecessary copies in ProcessGroupGloo for multiple inputs allreduce (#43543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43543

Closes https://github.com/pytorch/pytorch/issues/14691. The extra copy is not needed in the multiple-outputs case, because gloo allreduce
will broadcast the result tensor to all the outputs. See
https://github.com/facebookincubator/gloo/issues/152 and commit
9cabb5aaa4
for more details. Came across this when debugging https://github.com/pytorch/pytorch/pull/42577.

This effectively reverts https://github.com/pytorch/pytorch/pull/14688 while still keeping the tests.

Tested by ensuring `test_allreduce_basics` in `test_c10d.py` still works as expected.
ghstack-source-id: 110636498

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D23173945

fbshipit-source-id: d1ae08f84b4ac9919c53080949b8fffcb2fe63a8
2020-08-25 14:01:26 -07:00
9b05fbd92e Correct the windows docs (#43479)
Summary:
Fixes https://discuss.pytorch.org/t/i-cannot-use-the-pytorch-that-was-built-successfully-from-source-dll-initialization-routine-failed-error-loading-caffe2-detectron-ops-gpu-dll/93243/5?u=peterjc123.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43479

Reviewed By: mrshenli, ngimel

Differential Revision: D23294211

Pulled By: ezyang

fbshipit-source-id: d67df7d0355c2783153d780c94f959758b246d36
2020-08-25 13:41:24 -07:00
3df398a3a8 Update the QR documentation to include a warning about when the QR.backward is well-defined. (#43547)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43547

Reviewed By: mruberry

Differential Revision: D23318829

Pulled By: albanD

fbshipit-source-id: 4764ebe1ad440e881b1c4c88b16fb569ef8eb0fa
2020-08-25 13:19:25 -07:00
62dcd253e3 [quant][graphmode][fx] Testing torchvision (#43526)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43526

Add tests for graph mode quantization on torchvision and make sure it matches
current eager mode quantization

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23306683

fbshipit-source-id: 30d27e225d4557bfc1d9aa462086e416aa9a9c0e
2020-08-25 13:02:14 -07:00
9420c773d0 Revert D23299452: [pytorch][PR] fix typo in test_dataloader test_multiprocessing_contexts
Test Plan: revert-hammer

Differential Revision:
D23299452 (6a2d7a05c4)

Original commit changeset: 9489c48b83bc

fbshipit-source-id: e8c15d338dd89d8e92f3710e9cf149149bd2e763
2020-08-25 12:34:49 -07:00
ebc0fc4dfc Polish the nightly.py docs in CONTRIBUTING a little (#43494)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43494

Reviewed By: mruberry

Differential Revision: D23296032

Pulled By: ngimel

fbshipit-source-id: c85a6d4c39cbb60644f79136a6f21fd49c813b61
2020-08-25 12:13:27 -07:00
3dcfe84861 Grammatical corrections (#43473)
Summary:
**A few documentation corrections.**

1. [...] If there is hard-to-debug error in one of your TorchScript **models**, you can use this flag [...]
2. [...] Since TorchScript (scripting and tracing) **is** disabled with this flag [...]

**Before corrections (as of now):**
![before-fix](https://user-images.githubusercontent.com/45713346/90977203-d8bc2580-e543-11ea-9609-fbdf5689dcb9.jpg)

**After corrections:**
![after-fix](https://user-images.githubusercontent.com/45713346/90977209-dbb71600-e543-11ea-8259-011618efd95b.jpg)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43473

Reviewed By: mruberry

Differential Revision: D23296167

Pulled By: ngimel

fbshipit-source-id: 932c9b25cc79d6e266e5ddb3744573b0bd63d925
2020-08-25 12:09:14 -07:00
f32ca57c5e Fix typo in LSTMCell document (#43395)
Summary:
Fixes typo in document

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43395

Reviewed By: mruberry

Differential Revision: D23312561

Pulled By: ngimel

fbshipit-source-id: 28340c96faf52c17acfe9f6b1dd94b71ea4d60ce
2020-08-25 12:04:59 -07:00
f8e9e7ad4a Allocating warp to an input index in compute_cuda_kernel (#43354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43354

Instead of assigning a thread to an input index for repeating that index, we assign a warp to an index. This helps us avoid the costly uncoalesced memory accesses and branch divergence that occur when each thread repeats the index.

Test Plan: Run trainer to test

Reviewed By: ngimel

Differential Revision: D23230917

fbshipit-source-id: 731e912c844f1d859b0384fcaebafe69cb4ab56a
2020-08-25 10:47:50 -07:00
76894062dc move wholearchive to link option (#43485)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43216

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43485

Reviewed By: glaringlee

Differential Revision: D23318735

Pulled By: malfet

fbshipit-source-id: 90c316d3d5ed51afcff356e6d9219950f119a902
2020-08-25 10:36:10 -07:00
1089ff404c Refactored the duplicate code into a function in _ConvNd (#43525)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43525

Reviewed By: ngimel

Differential Revision: D23306593

Pulled By: jerryzh168

fbshipit-source-id: 3427cd2b9132a203858477b6c858d59b00e1282e
2020-08-25 10:00:07 -07:00
8ecfa9d9a2 [cmake] End support for python3.5 for pytorch (#43105)
Summary:
PyTorch uses f-strings in its Python code.
Python support for f-strings started with version 3.6.
Using Python version 3.5 or older fails the build with the latest release/master.
This patch checks the version of the Python used for the build and mandates that it be 3.6 or higher.
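
A minimal Python sketch of the equivalent guard (the actual check lives in CMake):

```python
import sys

# f-strings require Python 3.6; fail early with a clear message
if sys.version_info < (3, 6):
    raise RuntimeError("PyTorch requires Python 3.6 or newer to build")
```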

Signed-off-by: Parichay Kapoor <kparichay@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43105

Reviewed By: glaringlee

Differential Revision: D23301481

Pulled By: malfet

fbshipit-source-id: e9b4f7bffce7384c8ade3b7d131b10cf58f5e8a0
2020-08-25 09:42:42 -07:00
6a2d7a05c4 fix typo in test_dataloader test_multiprocessing_contexts (#43343)
Summary:
https://github.com/pytorch/pytorch/issues/22990 added a multiprocessing_context argument to DataLoader, but a typo in the test causes the wrong DataLoader class to be used.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43343

Reviewed By: glaringlee

Differential Revision: D23299452

Pulled By: malfet

fbshipit-source-id: 9489c48b83bce36f46d350cad902f7ad96e1eec4
2020-08-25 09:36:56 -07:00
b430347a60 Address JIT/Mypy issue with torch._VF (#43454)
Summary:
- `torch._VF` is a hack to work around the lack of support for `torch.functional` in the JIT
- that hack hides `torch._VF` functions from Mypy
- could be worked around by re-introducing a stub file for `torch.functional`, but that's undesirable
- so instead try to make both happy at the same time: the type ignore comments are needed for Mypy, and don't seem to affect the JIT after excluding them from the `get_type_line()` logic

Encountered this issue while trying to make `mypy` run on `torch/functional.py` in gh-43446.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43454

Reviewed By: glaringlee

Differential Revision: D23305579

Pulled By: malfet

fbshipit-source-id: 50e490693c1e53054927b57fd9acc7dca57e88ca
2020-08-25 09:23:54 -07:00
f02753fabb Support AMP in nn.parallel (#43102)
Summary:
Take care of the state of autocast in `parallel_apply`, so there is no need to decorate model implementations.
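
A hedged usage sketch (requires CUDA; `torch.cuda.amp.autocast` is the real API):

```python
import torch
from torch import nn
from torch.cuda.amp import autocast

model = nn.DataParallel(nn.Linear(8, 8).cuda())
with autocast():
    # autocast state now propagates into parallel_apply's worker threads,
    # so the wrapped module needs no autocast decoration of its own
    out = model(torch.randn(4, 8, device="cuda"))
```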

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43102

Reviewed By: ngimel

Differential Revision: D23294610

Pulled By: mrshenli

fbshipit-source-id: 0fbe0c79de976c88cadf2ceb3f2de99d9342d762
2020-08-25 08:38:49 -07:00
cbdaa20c88 [serialize] Expose zip file alignment calculation functions (#43531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43531

It's useful for building some tooling out of tree to manipulate zip files in a PyTorch-y way

Test Plan: contbuild

Reviewed By: houseroad

Differential Revision: D23277361

fbshipit-source-id: e15fad20e792d1e41018d32fd48295cfe74bea8c
2020-08-25 02:32:58 -07:00
d1d32003bb force pytorch tensors to contiguous before calling c2 ops
Summary: Per title; this makes the c2 wrappers safer, as contiguity of torch inputs is not guaranteed

Test Plan: covered by existing tests

Reviewed By: dzhulgakov

Differential Revision: D23310137

fbshipit-source-id: 3fe12abc7e394b8762098d032200778018e5b591
2020-08-24 23:04:13 -07:00
675f3f0482 Fix "save binary size" steps (#43529)
Summary:
The `pip3` alias might not be available, so call `python3 -mpip` to be on the safe side.
Should fix failures like this one:
https://app.circleci.com/pipelines/github/pytorch/pytorch/203448/workflows/3837b2d6-b089-4a19-b797-38bdf989c82e/jobs/6913032/parallel-runs/0/steps/0-109

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43529

Reviewed By: seemethere

Differential Revision: D23307306

Pulled By: malfet

fbshipit-source-id: b55e6782b29f1a1f56787902cbb85b3c3d20370c
2020-08-24 19:25:33 -07:00
f80b695a75 Properly format db.h and db.cc (#43027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43027

Format db.h and db.cc using the default formatter.

This change was split off of D22705434.

Test Plan: Wait for sandcastle.

Reviewed By: rohithmenon, marksantaniello

Differential Revision: D23113765

fbshipit-source-id: 3f02d55bfb055bda0fcba5122336fa001562d42e
2020-08-24 18:29:45 -07:00
7b243a4d46 [quant][graphmode][fx][test][refactor] Refactor tests for graph mode quantization on fx (#43445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43445

changed the interface for checkGraphModule to make the arguments more explicit
as requested in https://github.com/pytorch/pytorch/pull/43437

Test Plan:
TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23280586

fbshipit-source-id: 5b5859e326d149a5aacb1d15cbeee69667cc9109
2020-08-24 17:58:55 -07:00
87905b5856 [pytorch] add option to include autograd for code analyzer (#43155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43155

Update the code_analyzer build.sh script to be able to take additional build flags in the mobile build/analysis

Test Plan:
Checkout associated PR or copy contents of build.sh into PyTorch repo (must be run from root of PyTorch repo)

To run with inclusion of autograd dependencies (note BUILD_MOBILE_AUTOGRAD is still an experimental build flag): `ANALYZE_TORCH=1 DEPLOY=1 BASE_OPS_FILE=/path/to/baseopsfile MOBILE_BUILD_FLAGS='-DBUILD_MOBILE_AUTOGRAD=ON' tools/code_analyzer/build.sh`

Reviewed By: ljk53

Differential Revision: D23065754

fbshipit-source-id: d83a7ad62ad366a84725430ed020adf4d56687bd
2020-08-24 15:04:43 -07:00
284ff04792 [quant] Support set API for EmbeddingBag quantization (#43433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43433

Add support for torch.quint8 dtype

Test Plan: Imported from OSS

Reviewed By: radkris-git

Differential Revision: D23277002

fbshipit-source-id: 4204bc62f124b4fd481aaa6aa47b9437978c43ee
2020-08-24 14:33:35 -07:00
e37f871e87 [2/3][lite interpreter] add metadata when saving and loading models for mobile
Summary:
1. add `metadata.pkl` to the `.bc` file, which includes the model info that we are interested in
2. load `metadata.pkl` as an attribute `unordered_map<string, string>` in the module

Test Plan:
- CI
```buck build //xplat/caffe2:jit_module_saving
```
```buck build //xplat/caffe2:torch_mobile_core
```

Reviewed By: xcheng16

Differential Revision: D23047144

fbshipit-source-id: f368d00f7baef2d3d15f89473cdb146467aa1e0b
2020-08-24 13:40:52 -07:00
ed8b08a3ba Update quantize_jit to handle new upsample overloads (#43407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43407

ghstack-source-id: 110404846

Test Plan:
test_general_value_ops passes with D21209991 applied.
(Without this diff D21209991 breaks that test.)

Reviewed By: jerryzh168

Differential Revision: D23256503

fbshipit-source-id: 0f75e50a9f7fccb5b4325604319a5f76b42dfe5e
2020-08-24 13:33:47 -07:00
e08e93f946 Reland of benchmark code (#43428)
Summary:
Reland of the benchmark code that broke the slow tests because the GPUs were running out of memory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43428

Reviewed By: ngimel

Differential Revision: D23296136

Pulled By: albanD

fbshipit-source-id: 0002ae23dc82f401604e33d0905d6b9eedebc851
2020-08-24 13:27:26 -07:00
4cfac34075 [ROCm] allow .jenkins/pytorch/test.sh to run on centos (#42197)
Summary:
This doesn't fix any reported issue. We validate ROCm PyTorch on Ubuntu and CentOS. For CentOS, we must modify the test.sh script to let it run there.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42197

Reviewed By: ezyang, ngimel

Differential Revision: D23175669

Pulled By: malfet

fbshipit-source-id: 0da435de6fb17d2ca48e924bec90ef61ebbb5042
2020-08-24 13:12:49 -07:00
35a36c1280 Implement JIT Enum type serialization and deserialization (#43460)
Summary:
[Re-review tips: nothing changed other than a type in python_ir.cpp to fix a windows build failure]

* Adds code printing for enum type
* Enhances enum type to include all contained enum names and values
* Adds code parsing for enum type in deserialization
* Enabled serialization/deserialization tests in most TestCases. (With a few dangling issues to be addressed in later PRs to keep this PR from growing too large)
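
A hedged sketch of the round-trip this stack builds toward; `Color`, `pick`, and the file name are illustrative, and API details may differ in this era:

```python
import torch
from enum import Enum

class Color(Enum):       # Color/pick/"pick.pt" are illustrative names
    RED = 1
    GREEN = 2

@torch.jit.script
def pick(c: Color) -> int:
    if c == Color.RED:
        return 1
    return 2

torch.jit.save(pick, "pick.pt")      # printing serializes the enum type
loaded = torch.jit.load("pick.pt")   # parsing restores it
print(loaded(Color.GREEN))           # 2
```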

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43460

Reviewed By: albanD

Differential Revision: D23284929

Pulled By: gmagogsfm

fbshipit-source-id: e3e81d6106f18b7337ac3ff5cd1eeaff854904f3
2020-08-24 12:04:31 -07:00
0fa99d50bc Enable torch.cuda.memory typechecking (#43444)
Summary:
Add a number of function prototypes defined in torch/csrc/cuda/Module.cpp to `__init__.pyi.in`

Fixes https://github.com/pytorch/pytorch/issues/43442

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43444

Reviewed By: ezyang

Differential Revision: D23280221

Pulled By: malfet

fbshipit-source-id: 7d67dff7b24c8d7b7e72c919e6e7b847f242ef83
2020-08-24 11:46:04 -07:00
7024ce8a2c [quant] Add benchmarks for quantized embeddingbag module (#43296)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43296

Use common config for float and quantized embedding_bag modules

Test Plan:
```
python -m pt.qembeddingbag_test

 Benchmarking PyTorch: qEmbeddingBag
 Mode: Eager
 Name: qEmbeddingBag_embeddingbags10_dim4_modesum_input_size8_offset0_sparseTrue_include_last_offsetTrue_cpu
 Input: embeddingbags: 10, dim: 4, mode: sum, input_size: 8, offset: 0, sparse: True, include_last_offset: True, device: cpu
Forward Execution Time (us) : 35.738

 Benchmarking PyTorch: qEmbeddingBag
 Mode: Eager
 Name: qEmbeddingBag_embeddingbags10_dim4_modesum_input_size8_offset0_sparseTrue_include_last_offsetFalse_cpu
 Input: embeddingbags: 10, dim: 4, mode: sum, input_size: 8, offset: 0, sparse: True, include_last_offset: False, device: cpu
Forward Execution Time (us) : 62.708

python -m pt.embeddingbag_test

 Benchmarking PyTorch: embeddingbag
 Mode: Eager
 Name: embeddingbag_embeddingbags10_dim4_modesum_input_size8_offset0_sparseTrue_include_last_offsetTrue_cpu
 Input: embeddingbags: 10, dim: 4, mode: sum, input_size: 8, offset: 0, sparse: True, include_last_offset: True, device: cpu
Forward Execution Time (us) : 46.878

 Benchmarking PyTorch: embeddingbag
 Mode: Eager
 Name: embeddingbag_embeddingbags10_dim4_modesum_input_size8_offset0_sparseTrue_include_last_offsetFalse_cpu
 Input: embeddingbags: 10, dim: 4, mode: sum, input_size: 8, offset: 0, sparse: True, include_last_offset: False, device: cpu
Forward Execution Time (us) : 103.904

```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23245531

fbshipit-source-id: 81b44fde522238d3eef469434e93dd7f94b528a8
2020-08-24 09:51:03 -07:00
7cc1efec13 Add lite SequentialSampler to torch mobile (#43299)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43299

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D23228415

Pulled By: ann-ss

fbshipit-source-id: eebe54353a128783f039c7dac0e2dd765a61940d
2020-08-24 09:45:24 -07:00
c972e6232a Implement batching rules for basic arithmetic ops (#43362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43362

Batching rules implemented for: addition, subtraction, division, and
multiplication.

I refactored the original `mul_batching_rule` into a templated function
so that one can insert arbitrary binary operations into it.

add, sub, rsub, mul, and div all work the same way. However, other
binary operations work slightly differently (I'm still figuring out the
differences and why they're different) so those may need a different
implementation.

Test Plan: - "pytest test/test_vmap.py -v": new tests

Reviewed By: ezyang

Differential Revision: D23252317

Pulled By: zou3519

fbshipit-source-id: 6d36cd837a006a2fd31474469323463c1bd797fc
2020-08-24 08:43:36 -07:00
db78c07ced Enable torch.cuda.nvtx typechecking (#43443)
Summary:
Add pyi file covering torch._C.nvtx submodule

Fixes https://github.com/pytorch/pytorch/issues/43436

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43443

Reviewed By: ezyang

Differential Revision: D23280188

Pulled By: malfet

fbshipit-source-id: 882860cce9feb0b5307c8b7c887f4a2f2c1548a2
2020-08-24 08:20:12 -07:00
2f9c9796f1 [Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D23290730

fbshipit-source-id: ee3ffbd6f9c0fade4586d8f4f8c8dd3d310d1f33
2020-08-24 05:36:38 -07:00
c4e841654d Add alias torch.negative to torch.neg. (#43400)
Summary:
xref https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43400

Reviewed By: albanD

Differential Revision: D23266011

Pulled By: mruberry

fbshipit-source-id: ca20b30d99206a255cf26438b09c3ca1f99445c6
2020-08-24 01:15:04 -07:00
1f0cfbaaad [fx] add type annotations (#43083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43083

This adds type annotations to all classes, arguments, and returns
for fx. This should make it easier to understand the code, and
encourage users of the library to also write typed code.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23145853

Pulled By: zdevito

fbshipit-source-id: 648d91df3f9620578c1c51408003cd5152e34514
2020-08-23 15:38:33 -07:00
b349f58c21 [fx] enabling typechecking of fx files (#43082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43082

Fixes all present errors in mypy. Does not try to add annotations everywhere.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23145854

Pulled By: zdevito

fbshipit-source-id: 18e483ed605e89ed8125971e84da1a83128765b7
2020-08-23 15:37:29 -07:00
a97ca93c0e remove prim::profile and special-casing (#43160)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43160

Reviewed By: ZolotukhinM

Differential Revision: D23284421

Pulled By: Krovatkin

fbshipit-source-id: 35e97aad299509a682ae7e95d7cef53301625309
2020-08-22 23:52:36 -07:00
d70b263e3a [DPER3] Separate user embeddings and ad embeddings in blob reorder
Summary:
Separate user embeddings and ad embeddings in blobsOrder. New order:
1. meta_net_def
2. preload_blobs
3. user_embeddings (embeddings in remote request only net)
4. ad_embeddings (embeddings in remote other net)

Add a field requestOnlyEmbeddings in meta_net_def to record user_embeddings.

This is for flash verification.

Test Plan:
buck test dper3/dper3_backend/delivery/tests:blob_reorder_test

Run a flow with canary package f211282476
Check the net: n326826, request_only_embeddings are recorded as expected

Reviewed By: ipiszy

Differential Revision: D23008305

fbshipit-source-id: 9360ba3d078f205832821005e8f151b8314f0cf2
2020-08-22 23:40:04 -07:00
4dc8f3be8c Creates test_tensor_creation_ops.py test suite (#43104)
Summary:
As part of our continued refactoring of test_torch.py, this takes tests for tensor creation ops like torch.eye, torch.randint, and torch.ones_like and puts them in test_tensor_creation_ops.py. There are three test classes in the new test suite: TestTensorCreation, TestRandomTensorCreation, TestLikeTensorCreation. TestViewOps and tests for construction of tensors from NumPy arrays have been left in test_torch.py. These might be refactored separately into test_view_ops.py and test_numpy_interop.py in the future.

Most of the tests ported from test_torch.py were left as is or received a signature change to make them nominally "device generic." Future work will need to review test coverage and update the tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43104

Reviewed By: ngimel

Differential Revision: D23280358

Pulled By: mruberry

fbshipit-source-id: 469325dd1a734509dd478cc7fe0413e276ffb192
2020-08-22 23:18:54 -07:00
35351ff409 Fix ToC Link (#43427)
Summary:
CC ezyang - no code here

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43427

Reviewed By: albanD

Differential Revision: D23273866

Pulled By: mrshenli

fbshipit-source-id: ca07d286410f367cc78549828e517510a86d63ec
2020-08-22 19:51:24 -07:00
e4af45f3aa Fix bugs in vec256_float_neon.h (#43321)
Summary:
Fixing NEON vector conversion problems.

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43321

Reviewed By: pbelevich

Differential Revision: D23241536

Pulled By: kimishpatel

fbshipit-source-id: 37a4e10989c9342ae5e8c78f6875b7aad785dd76
2020-08-22 17:27:18 -07:00
b003f2cc28 Enable input pointer caching in XNNPACK integration. (#42840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42840

By caching input/output pointers and input parameters, we enable the use
of the caching allocator and check whether we get the same input/output pointers.
If so, we skip the setup steps.

Test Plan:
python test/test_xnnpack_integration.py

Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D23044585

fbshipit-source-id: ac676cff77f264d8ccfd792d1a540c76816d5359
2020-08-22 16:50:17 -07:00
b52e6d00f9 Change quantizer to account for input tensor's memory format. (#42178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42178

This otherwise introduces unnecessary calls to `contiguous` in the rest of
the network, where certain ops want channels-last format.

Test Plan:
Quantization tests.

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D22796479

fbshipit-source-id: f1ada1c2eeed84991b9b195120699b943ef6e421
2020-08-22 16:48:50 -07:00
b1d31428e7 Reduce number of prim::profile (#43147)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43147

Reviewed By: colesbury

Differential Revision: D23190137

Pulled By: Krovatkin

fbshipit-source-id: bf5f29a76e5ebfb5b9d3b6adee424e213c25891b
2020-08-22 16:06:30 -07:00
8efa898349 [ONNX] Export split_to_sequence as slice when output number is static (#42744)
Summary:
Optimize the exported graph to emit slice nodes for aten::split when the number of split outputs is fixed. Previously, in some cases these were exported as onnx::SplitToSequence, which is dynamic in tensor output count.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42744

Reviewed By: houseroad

Differential Revision: D23172465

Pulled By: bzinodev

fbshipit-source-id: 11e432b4ac1351f17e48356c16dc46f877fdf7da
2020-08-22 09:11:25 -07:00
ec9e6e07bc [quant][graphmode][fx] Add support for general value ops (#43439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43439

Porting op tests from test_quantize_jit.py

Test Plan:
TestQuantizeFxOps

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23278585

fbshipit-source-id: ad29f39482cf4909068ce29555470ef430ea17f6
2020-08-22 08:52:28 -07:00
47e1b7a8f1 Set CONSTEXPR_EXCEPT_WIN_CUDA as const while it is not constexpr (#43380)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42467

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43380

Reviewed By: albanD

Differential Revision: D23278930

Pulled By: pbelevich

fbshipit-source-id: 6ce0bc9fd73cd0ead46c414fdea5f6fb7e9fec3e
2020-08-22 03:25:37 -07:00
d94b10a832 Revert D23223281: Add Enum TorchScript serialization and deserialization support
Test Plan: revert-hammer

Differential Revision:
D23223281 (f269fb83c1)

Original commit changeset: 716d1866b777

fbshipit-source-id: da1ad8387b7d7aad9ff69e1ebeb5cd0b9394c2df
2020-08-22 02:38:12 -07:00
915fd1c8fc centralize autograd dispatch key set (#43387)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43387

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D23258687

Pulled By: bhosmer

fbshipit-source-id: 3718f74fc7324db027f87eda0b90893a960aa56e
2020-08-22 00:46:02 -07:00
88b564ce39 [quant][graphmode][fx] Add support for general shape ops (#43438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43438

Porting op tests from test_quantize_jit.py

Test Plan:
TestQuantizeFxOps

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23278583

fbshipit-source-id: 34b73390d47c7ce60528444da77c4096432ea2cb
2020-08-21 23:07:20 -07:00
192c4b0050 [quant][graphmode][fx] Add support for clamp (#43437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43437

Porting op tests from test_quantize_jit.py

Test Plan:
TestQuantizeFxOps

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23278584

fbshipit-source-id: 266dc68c9ca30d9160a1dacf28dc7781b3d472c2
2020-08-21 20:21:50 -07:00
40c77f926c Add prim::TypeCheck operation (#43026)
Summary:
TypeCheck is a new operation to check the shapes of tensors against
 expected shapes. TypeCheck is a variadic operation. An example:

 %t0 : Tensor = ...
 %t1 : Tensor = ...
 %2 : FLOAT(20, 20), %3 : FLOAT(30, 30), %1 : bool =
 prim::TypeCheck(%t0, %t1)
 prim::If(%1)

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43026

Reviewed By: ZolotukhinM

Differential Revision: D23115830

Pulled By: bzinodev

fbshipit-source-id: fbf142126002173d2d865cf4b932dea3864466b4
2020-08-21 20:03:24 -07:00
98307a2821 Fix bfloat16 erfinv get incorrect value problem for cpu path (#43399)
Summary:
Fix https://github.com/pytorch/pytorch/issues/43344

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43399

Reviewed By: albanD

Differential Revision: D23264789

Pulled By: pbelevich

fbshipit-source-id: 8b77c0f6ca44346e44599844fb1e172fdbd9df6c
2020-08-21 19:59:37 -07:00
5e04bb2c1c caffe2: expose CPUContext RandSeed for backwards compatibility with external RNG (#43239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43239

This is an incremental step as part of the process to migrate caffe2 random number generator off of std::mt19937 and to instead use at::mt19937+at::CPUGeneratorImpl. The ATen variants are much more performant (10x faster).

This adds a way to get the CPUContext RandSeed for tail use cases that require a std::mt19937 and borrow the CPUContext one.

Test Plan: This isn't used anywhere within the caffe2 codebase. Compile should be sufficient.

Reviewed By: dzhulgakov

Differential Revision: D23203280

fbshipit-source-id: 595c1cb447290604ee3ef61d5b5fc079b61a4e14
2020-08-21 19:36:38 -07:00
fb12992b5d Call qnnpack's conv setup only if input pointer has changed. (#42008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42008

With the caching allocator we have increased the likelihood of getting the
same input pointer. Given that, we can cache the qnnpack operator and input
pointer and check whether the input pointer is the same. If so, we can skip
the setup step.

Test Plan:
Ran one of the quantized models to observe
1. No pagefaults due to indirection buffer reallocation.
2. Much less time spent in indirection buffer population.

Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D22726973

fbshipit-source-id: 2dd2a6a6ecf1b5cfa7dde65e384b36a6eab052d7
2020-08-21 19:10:40 -07:00
04aa42a073 Refactor qconv to reduce allocations. (#42007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42007

The zero buffer and indirection pointers were allocated on every iteration.
With this refactor we create the op once for the qnnpack conv struct and keep
repopulating the indirection pointer as necessary.

For deconv, much of the op creation was moved outside so that we can avoid
creating and destroying ops every time.

Test Plan:
CI quantization tests.
deconvolution-test

Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D22726972

fbshipit-source-id: 07c03a4e90b397c36aae537ef7c0b7d81d4adc1a
2020-08-21 19:10:37 -07:00
2a08566b8f Simple caching allocator for CPU. (#42006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42006

This PR introduces a simple CPU caching allocator. It is specifically
intended for mobile use cases and for inference. Nothing in the
implementation prevents it from serving other use cases;
however, its simplicity may not be suitable everywhere.
It simply tracks allocations by size and relies on deterministic,
repeatable behavior where allocations of the same sizes are made on every
inference.
Thus, after the first allocation, when the pointer is returned to the
allocator, instead of handing it back to the system, the allocator caches it
for subsequent use.
Memory is freed automatically at the end of the process, or it can be
explicitly freed. This is enabled at the moment in DefaultMobileCPUAllocator only.
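
A toy Python sketch of the size-keyed caching idea (not the C++ allocator API):

```python
from collections import defaultdict

class SimpleCachingAllocator:
    """Caches freed blocks by size and reuses them on same-size requests."""

    def __init__(self):
        self._free = defaultdict(list)        # size -> cached blocks

    def allocate(self, size):
        if self._free[size]:
            return self._free[size].pop()     # cache hit: reuse
        return bytearray(size)                # cache miss: ask the system

    def free(self, block):
        self._free[len(block)].append(block)  # keep it instead of releasing

alloc = SimpleCachingAllocator()
a = alloc.allocate(1024)
alloc.free(a)
assert alloc.allocate(1024) is a   # the next inference reuses the block
```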

Test Plan:
android test: cpu_caching_allocator_test

Imported from OSS

Reviewed By: dreiss

Differential Revision: D22726976

fbshipit-source-id: 9a38b1ce34059d5653040a1c3d035bfc97609e6c
2020-08-21 19:09:22 -07:00
abe878ce96 Allow Freezing of Module containing interface attribute (#41860)
Summary:
This patch allows freezing a model that utilizes interfaces. Freezing works
under the user's assumption that the interface module does not alias
any value used in the model.

To enable freezing of such modules, an extra parameter was added:

torch._C._freeze_module(module, ignoreInterfaces = True)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41860

Reviewed By: eellison

Differential Revision: D22670566

Pulled By: bzinodev

fbshipit-source-id: 41197a724bc2dca2e8495a0924c224dc569f62a4
2020-08-21 18:57:13 -07:00
490d41aaa6 [quant][graphmode][fx] Add support for instance_norm (#43377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43377

Porting op tests from test_quantize_jit.py

Test Plan:
TestQuantizeFxOps

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23257045

fbshipit-source-id: 7f4ad5d81f21bf0b8b9d960b054b20dc889e6c3b
2020-08-21 18:32:50 -07:00
a5a6a3e633 add support for optional int list with scalar fill (#43262)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43262

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23212049

Pulled By: bhosmer

fbshipit-source-id: c7ceb2318645c07d36c3f932c981c9ee3c414f82
2020-08-21 18:24:36 -07:00
f269fb83c1 Add Enum TorchScript serialization and deserialization support (#42963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42963

* Adds code printing for enum type
* Enhances enum type to include all contained enum names and values
* Adds code parsing for enum type in deserialization
* Enabled serialization/deserialization tests in most TestCases. (With a few dangling issues to be addressed in later PRs to keep this PR from growing too large)

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23223281

Pulled By: gmagogsfm

fbshipit-source-id: 716d1866b7770dfb7bd8515548cfe7dc4c4585f7
2020-08-21 18:13:27 -07:00
aa53b2d427 Workaround bugs in user side embedding meta info and better msgs (#43355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43355

There seem to be some bugs where we cannot guarantee that blobs in `PARAMETERS_BLOB_TYPE_FULLY_REMOTE_REQUEST_ONLY` and `PARAMETERS_BLOB_TYPE_DISAGG_ACC_REMOTE_OTHER` are disjoint. Hence we need to work around this.

Also makes the msg more informative.

Test Plan:
```
flow-cli test-locally --mode opt dper.workflows.evaluation.eval_workflow --parameters-file=/mnt/shared/yinghai/v0_ctr_mbl_feed_1120_onnx.json
```

Reviewed By: ehsanardestani

Differential Revision: D23141538

fbshipit-source-id: 8e311f8fc0e40eff6eb2c778213f78592e6bf079
2020-08-21 17:18:51 -07:00
aec917a408 [quant][graphmode][fx] Add support for layer_norm (#43376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43376

Porting op tests from test_quantize_jit.py

Test Plan:
TestQuantizeFxOps

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23257048

fbshipit-source-id: 47a04a5221bcaf930d574f879d515e3dff2d1f6d
2020-08-21 16:38:16 -07:00
089bb1a8e4 [quant][graphmode][fx] Add support for elu (#43375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43375

Porting op tests from test_quantize_jit.py

Test Plan:
TestQuantizeFxOps

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23257043

fbshipit-source-id: 22360610d87ef98d25871daff3fdc3dbb3ec5bdb
2020-08-21 16:07:36 -07:00
5a02c6b158 [quant][graphmode][fx] Add support for hardswish (#43374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43374

Porting op tests from test_quantize_jit.py

Test Plan:
TestQuantizeFxOps

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23257044

fbshipit-source-id: 2cdf12e104db6e51ffa0324eb602e68132a646ef
2020-08-21 16:06:32 -07:00
93f1b5c8da Mobile backward compatibility (#42413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42413

When a default argument is added, it does not break backward compatibility (BC) for full-jit, but does break BC for mobile bytecode. For example, https://github.com/pytorch/pytorch/pull/40737. To make bytecode BC in this case, we

1. Introduce kMinSupportedBytecodeVersion. The loaded model version should be between kMinSupportedBytecodeVersion and kProducedBytecodeVersion.
2. If an operator is updated, and we can handle BC, bump the kProducedBytecodeVersion (for example, from 3 to 4).
3. If model version is at the older version of the operator, add an adapter function at loading. For the added default arg, we push this default arg to stack before calling the actual operator function.
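
A hedged sketch of the version gate described in points 1-3; the constant names come from the summary, while their values and the loader logic are illustrative:

```python
kMinSupportedBytecodeVersion = 3   # illustrative values
kProducedBytecodeVersion = 4

def check_bytecode_version(model_version: int) -> None:
    # models between the two bounds load; anything else is rejected
    if not (kMinSupportedBytecodeVersion <= model_version <= kProducedBytecodeVersion):
        raise RuntimeError(f"Unsupported bytecode version: {model_version}")

check_bytecode_version(3)   # ok: an older model served via the adapter path
check_bytecode_version(4)   # ok: a model at the produced version
```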

Test Plan: Imported from OSS

Reviewed By: xcheng16

Differential Revision: D22898314

Pulled By: iseeyuan

fbshipit-source-id: 90d339f8e1365f4bb178db8db7c147390173372b
2020-08-21 15:45:52 -07:00
e96871ea46 [quant][graphmode][fx] Add support for mul and mul relu (#43373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43373

Porting op tests from test_quantize_jit.py

Test Plan:
TestQuantizeFxOps

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23257047

fbshipit-source-id: b7f9fcef965d6368018e05cff09260f0eb6f3b50
2020-08-21 15:31:00 -07:00
6c772515ed Revert D23252335: Refactor Vulkan context into its own files. Use RAII.
Test Plan: revert-hammer

Differential Revision:
D23252335 (054073c60d)

Original commit changeset: 43144446f2f3

fbshipit-source-id: 442b914f47a82efee18cfd84aab893e22d1defdd
2020-08-21 15:10:06 -07:00
8eb3de76ba Fix enum constant printing and add FileCheck to all Enum tests (#42874)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42874

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23222894

Pulled By: gmagogsfm

fbshipit-source-id: 86495a350d388c82276933d24a2ca3c0f59af8da
2020-08-21 14:55:46 -07:00
ff454cc429 [quant][grapphmode][fx][test][refactor] Refactor quantized add test (#43372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43372

So that adding more binary op tests are easier

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23257046

fbshipit-source-id: 661acd4c38abdc892c9db8493b569226b13e0d0d
2020-08-21 14:53:23 -07:00
109ea59afc [quant][graphmode][fx] Add support for batchnorm relu (#43335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43335

Porting op tests from test_quantize_jit.py

Test Plan:
TestQuantizeFxOps

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23243563

fbshipit-source-id: 3c562f519b90e0157761a00c89eca63af8b909f2
2020-08-21 14:32:51 -07:00
9e87a8ddf4 [quant][graphmode][fx] Add support for batchnorm (#43334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43334

Porting op tests from test_quantize_jit.py

Test Plan:
TestQuantizeFxOps

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23243560

fbshipit-source-id: 0a7bc331293bbc3db85616bf43a995d3b112beb6
2020-08-21 14:31:49 -07:00
054073c60d Refactor Vulkan context into its own files. Use RAII. (#42273)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42273

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23252335

Pulled By: AshkanAliabadi

fbshipit-source-id: 43144446f2f3530e6cb2a85706a9afc60771347d
2020-08-21 14:28:38 -07:00
3d76f7065e [quant][graphmode][fx] Add support for cat (#43333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43333

Porting op tests from test_quantize_jit.py

Test Plan:
TestQuantizeFxOps

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23243562

fbshipit-source-id: 5c8eab2af592a9ea4afa713fb884e34e0ffd82b1
2020-08-21 12:54:50 -07:00
26be4dcfa1 [quant][graphmode][fx] Add support for add relu (#43332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43332

Porting op tests from test_quantize_jit.py

Test Plan:
TestQuantizeFxOps

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23243564

fbshipit-source-id: 3cd1786c6356aaa234d31b50f12ad6ddc38d5664
2020-08-21 12:54:41 -07:00
452a473729 [quant][graphmode][fx] Add support for add (#43331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43331

Porting op tests from test_quantize_jit.py

Test Plan:
TestQuantizeFxOps

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23243561

fbshipit-source-id: 5a6399d25cc881728cf298c77570ce2aaf3ca22e
2020-08-21 12:52:37 -07:00
6e48c88e09 .circleci: Prefer using env-file for docker run (#43293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43293

'docker run' has the capability to use a file for environment variables,
we should prefer to use that instead of having it be sourced per command
in the docker container.

Also opens the door for cutting down on the total number of commands we
need to echo into a script to then execute as a 'docker exec' command.

The plus side of this approach is that BASH_ENV is persisted through all
of the steps, so there's no need to do any exports or worry about
environment variables not persisting through jobs.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23227059

Pulled By: seemethere

fbshipit-source-id: be425aa21b420b9c6e96df8b2177f508ee641a20
2020-08-21 12:48:35 -07:00
100649d6a9 Normalize loops with non-zero start. (#43179)
Summary:
This diff normalizes for-loops that have non-zero loop starts so that they always start from 0. Given a for-loop, this normalization changes the loop start to be 0 and adjusts the loop end and all accesses to the index variable within the loop body appropriately.
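
In Python terms, the transformation rewrites a loop with a non-zero start into one starting at 0 (a sketch of the idea, not the TensorExpr API):

```python
def body(i):
    print(i)

start, stop = 5, 10

# before normalization: the loop starts at `start`
for i in range(start, stop):
    body(i)

# after normalization: the loop starts at 0 and every use of the index
# inside the body is shifted by `start`
for j in range(0, stop - start):
    body(j + start)
```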

This diff also adds tests for several cases of normalization and also tests normalization in conjunction with `splitwithTail` transformation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43179

Reviewed By: nickgg

Differential Revision: D23220534

Pulled By: navahgar

fbshipit-source-id: 64be0c72e4dbc76906084f7089dea81ae07d6020
2020-08-21 12:37:27 -07:00
74781ab5b8 Revert D23242101: [pytorch][PR] Implement first draft of autograd benchmark.
Test Plan: revert-hammer

Differential Revision:
D23242101 (c2511bdfa4)

Original commit changeset: a2b92d5a4341

fbshipit-source-id: bda562d15565f074b448022d180ec8f959c6ecc9
2020-08-21 12:22:57 -07:00
650590da0d [quant][graphmode][fx] Add support for conv module + relu (#43287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43287

Porting op tests from test_quantize_jit.py

Test Plan:
TestQuantizeFxOps

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23221735

fbshipit-source-id: 2513892a1928f92c09d7e9a24b2ea12b00de218d
2020-08-21 12:13:02 -07:00
3293fdfa80 [quant] Enable from_float for quantized Embedding_Bag (#43176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43176

Convert floating point nn.EmbeddingBag module to
nn.quantized.dynamic.EmbeddingBag module

Test Plan:
python test/test_quantization.py TestDynamicQuantizedModule.test_embedding_bag_api
python test/test_quantization.py TestPostTrainingDynamic.test_embedding_quantization

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23200196

fbshipit-source-id: 090f47dbf7aceab9c719cbf282fad20fe3e5a983
2020-08-21 11:46:03 -07:00
b354b422ee [quant] Make offsets an optional argument (#43090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43090

To match the floating point module

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23167518

fbshipit-source-id: 29db596e10731be4cfed7efd18f33a0b3dbd0ca7
2020-08-21 11:46:00 -07:00
4db8ca1129 [quant] Create nn.quantized.dynamic.EmbeddingBag (#43088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43088

Create a quantized module that the user can use to perform embedding bag quantization.
The module uses EmbeddingPackedParams to store the weights, which can be serialized/deserialized
using TorchBind custom classes (C++ get/setstate code).
A following PR will add support for `from_float` to convert a float module to a quantized module.

Test Plan:
python test/test_quantization.py TestDynamicQuantizedModule.test_embedding_bag_api

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23167519

fbshipit-source-id: 029d7bb44debf78c4ef08bfebf267580ed94d033
2020-08-21 11:45:02 -07:00
f20a04fa2d [TensorExpr] Simplify conditional select (#43350)
Summary:
Fold conditional select when both sides are constant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43350

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.ConditionalSelectFold*

Reviewed By: pbelevich

Differential Revision: D23256602

Pulled By: asuhan

fbshipit-source-id: ec04b1e4ae64f59fa574047f2d7af55a717a5262
2020-08-21 11:15:48 -07:00
743cff4a1a Fix PackedGemmMatrixFP16 repacking (#43320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43320

Previous impl seem to be buggy although I don't why. New impl is copied from https://fburl.com/diffusion/cing6mxv

Reviewed By: jianyuh

Differential Revision: D23235964

fbshipit-source-id: 780b6e388ef895232e3ba34b125c2492b1cee60c
2020-08-21 10:58:18 -07:00
e57b89c8dc Adds arccos, arcsin, arctan aliases (#43319)
Summary:
These aliases are consistent with NumPy (see, for example, https://numpy.org/doc/stable/reference/generated/numpy.arccos.html?highlight=acos).

Note that PyTorch's existing names are consistent with Python (see https://docs.python.org/3.10/library/math.html?highlight=acos#math.acos) and C++ (see, for example, https://en.cppreference.com/w/cpp/numeric/math/acos).
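
A quick sanity check of the aliasing (names per the NumPy convention cited above):

```python
import torch

x = torch.tensor([0.5])
assert torch.equal(torch.arccos(x), torch.acos(x))   # alias, same result
assert torch.equal(torch.arcsin(x), torch.asin(x))
assert torch.equal(torch.arctan(x), torch.atan(x))
```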

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43319

Reviewed By: pbelevich

Differential Revision: D23260426

Pulled By: mruberry

fbshipit-source-id: 98a6c97f69d1f718a396c2182e938a7a260c0889
2020-08-21 10:53:17 -07:00
3aec1185e0 Enables bfloat16 x [float16, complex64, complex128] type promotion (#43324)
Summary:
Implements bfloat16 type promotion consistent with JAX (see https://jax.readthedocs.io/en/latest/type_promotion.html), addressing issue https://github.com/pytorch/pytorch/issues/43049.

- bfloat16 x float16 -> float32
- bfloat16 x complex64 -> complex64
- bfloat16 x complex128 -> complex128
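
A quick sketch of the rules above:

```python
import torch

bf = torch.ones(2, dtype=torch.bfloat16)
print((bf + torch.ones(2, dtype=torch.float16)).dtype)    # torch.float32
print((bf + torch.ones(2, dtype=torch.complex64)).dtype)  # torch.complex64
print((bf + torch.ones(2, dtype=torch.complex128)).dtype) # torch.complex128
```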

Existing tests, after updates, are sufficient to validate the new behavior.

cc xuhdev

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43324

Reviewed By: albanD

Differential Revision: D23259823

Pulled By: mruberry

fbshipit-source-id: ca9c2c7d0325faced1f884f3c37edf8fa8c8b089
2020-08-21 10:48:04 -07:00
478fb925e6 [jit] PyTorchStreamReader::getAllRecord should omit archive name prefix (#43317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43317

Previous version was returning the path with a prefix so subsequent `getRecord` would fail.

There's only one place in PyTorch codebase that uses this function (introduced in https://github.com/pytorch/pytorch/pull/29339 ) and it's unlikely that anyone else is using it - it's not a public API anyway.

Test Plan: unittest

Reviewed By: houseroad

Differential Revision: D23235241

fbshipit-source-id: 6f7363e6981623aa96320f5e39c54e65d716240b
2020-08-21 10:39:57 -07:00
0bd35de30e Add Enum convert back to Python object support (#43121)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43121

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23222628

Pulled By: gmagogsfm

fbshipit-source-id: 6850c56ced5b52943a47f627b2d1963cc9239408
2020-08-21 10:36:51 -07:00
f4b6ef9c56 Do not define the macro "isnan" (#43242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43242

This causes "std::isnan" to produce confusing error messages (std::std has not been declared).
Instead, simply let isnan be exposed in the global namespace.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23214374

Pulled By: ezyang

fbshipit-source-id: 9615116a980340e36376a20f2e546e4d36839d4b
2020-08-21 10:08:38 -07:00
7b520297dc Remove erroneous trailing backslashes (#43318)
Summary:
They were likely copied from some macro definition, but they do not
belong to macro definitions here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43318

Reviewed By: pbelevich

Differential Revision: D23241526

Pulled By: mrshenli

fbshipit-source-id: e0b5eddfde2c882bb67f56d84ee79281cc5fc941
2020-08-21 08:21:56 -07:00
c2511bdfa4 Implement first draft of autograd benchmark. (#40586)
Summary:
It is quite a lot of code because I pulled some code from torchaudio and torchvision to work around issues getting the latest versions with PyTorch built from source, since I can't build those libs from source (a dependency is missing for torchaudio).

The compare script generates table as follows:
| model | task | speedup | mean (before) | var (before) | mean (after) | var (after) |
| -- | -- | -- | -- | -- | -- | -- |
| resnet18 | vjp | 1.021151844124464 | 1.5627719163894653 | 0.005164200905710459 | 1.5304011106491089 | 0.003979875706136227 |
| resnet18 | vhp | 0.9919114430761606 | 6.8089728355407715 | 0.019538333639502525 | 6.86449670791626 | 0.014775685034692287 |
| resnet18 | jvp | 0.9715963084255123 | 5.720699310302734 | 0.08197150379419327 | 5.887938499450684 | 0.018408503383398056 |
| ppl_simple_reg | vjp | 0.9529183269165618 | 0.000362396240234375 | 7.526952949810095e-10 | 0.00038030146970413625 | 7.726220357939795e-11 |
| ppl_simple_reg | vhp | 0.9317708619586977 | 0.00048058031825348735 | 5.035701855504726e-10 | 0.0005157709238119423 | 3.250243477137538e-11 |
| ppl_simple_reg | jvp | 0.8609755877018406 | 0.00045447348384186625 | 9.646707044286273e-11 | 0.0005278587341308594 | 1.4493808930815533e-10 |
| ppl_simple_reg | hvp | 0.9764100147808232 | 0.0005881547695025802 | 7.618464747949361e-10 | 0.0006023645401000977 | 6.370915461850757e-10 |
| ppl_simple_reg | jacobian | 1.0019173715134297 | 0.0003612995205912739 | 2.2979899233499523e-11 | 0.0003606081008911133 | 1.2609764794835332e-11 |
| ppl_simple_reg | hessian | 1.0358429970264393 | 0.00206911563873291 | 2.590938796842579e-09 | 0.0019975185859948397 | 2.8916853356264482e-09 |
| ppl_robust_reg | vjp | 1.0669910916521521 | 0.0017304659122601151 | 3.1047047155396967e-09 | 0.0016218185191974044 | 4.926861585374809e-09 |
| ppl_robust_reg | vhp | 1.0181130455462972 | 0.0029563189018517733 | 2.6359153082466946e-08 | 0.0029037236236035824 | 1.020585038702393e-08 |
| ppl_robust_reg | jvp | 0.9818360373406179 | 0.0026934861671179533 | 6.981357714153091e-09 | 0.00274331565015018 | 3.589908459389335e-08 |
| ppl_robust_reg | hvp | 1.0270848910527002 | 0.005576515104621649 | 3.2798087801211295e-08 | 0.005429458804428577 | 6.438724398094564e-08 |
| ppl_robust_reg | jacobian | 1.0543611284155785 | 0.00167675013653934 | 2.3236829349571053e-08 | 0.001590299652889371 | 1.2011492245278532e-08 |
| ppl_robust_reg | hessian | 1.0535378727082656 | 0.01643357239663601 | 1.8450685956850066e-06 | 0.015598463825881481 | 2.1876705602608126e-07 |
| wav2letter | vjp | 1.0060408105086573 | 0.3516994118690491 | 1.4463969819189515e-05 | 0.349587619304657 | 9.897866402752697e-05 |
| wav2letter | vhp | 0.9873655295086051 | 1.1196287870407104 | 0.00474404776468873 | 1.133955717086792 | 0.009759620763361454 |
| wav2letter | jvp | 0.9741820317882822 | 0.7888165712356567 | 0.0017476462526246905 | 0.8097219467163086 | 0.0018235758179798722 |
| transfo | vjp | 0.9883954031921641 | 2.8865864276885986 | 0.008410997688770294 | 2.9204773902893066 | 0.006901870481669903 |
| transfo | vhp | 1.0111290842971339 | 8.374398231506348 | 0.014904373325407505 | 8.282224655151367 | 0.04449500888586044 |
| transfo | jvp | 1.0080534543381963 | 6.293097972869873 | 0.03796082362532616 | 6.24282169342041 | 0.010179692879319191 |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40586

Reviewed By: pbelevich

Differential Revision: D23242101

Pulled By: albanD

fbshipit-source-id: a2b92d5a4341fe1472711a685ca425ec257d6384
2020-08-21 07:36:26 -07:00
0cb52cb458 Autograd better error (#43308)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/5025

Thanks for the conversation in the issue thread. Hopefully this fixes it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43308

Reviewed By: ezyang

Differential Revision: D23241918

Pulled By: suraj813

fbshipit-source-id: e1efac13f5ce590196f227149f011c973c2bbdde
2020-08-21 05:50:33 -07:00
da036250cd Add benchmark for performance comparison (#43221)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43221

Test Plan: Example: https://www.internalfb.com/intern/paste/P139226521/

Reviewed By: kimishpatel

Differential Revision: D23197567

Pulled By: kimishpatel

fbshipit-source-id: 7d0f8e653c62f0bee5795618e712d07effbd460a
2020-08-20 23:11:40 -07:00
da70976e66 [ONNX] Add support for operator add between tensor list (#41888)
Summary:
E.g.
```python
outs = []
outs += [torch.randn(3,4)]
outs = outs + [torch.randn(4,5), torch.randn(5,6)]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41888

Reviewed By: houseroad

Differential Revision: D23172880

Pulled By: bzinodev

fbshipit-source-id: 93865106e3de5908a993e0cfa82f626ba94dab7e
2020-08-20 22:38:23 -07:00
c64594f5cc Extends test_unary_ufunc.py with numerics, contiguity, domain tests (#42965)
Summary:
This PR:

- ports the tests in TestTorchMathOps to test_unary_ufuncs.py
- removes duplicative tests for the tested unary ufuncs from test_torch.py
- adds a new test, test_reference_numerics, that validates the behavior of our unary ufuncs vs. reference implementations on empty, scalar, 1D, and 2D tensors that are contiguous, discontiguous, and that contain extremal values, for every dtype the unary ufunc supports
- adds support for skipping tests by regex; this behavior is used to make the test suite pass on Windows, MacOS, and ROCm builds, which have a variety of issues, and on Linux builds (see https://github.com/pytorch/pytorch/issues/42952)
- adds a new OpInfo helper, `supports_dtype`, to facilitate test writing
- extends unary ufunc op info to include reference, domain, and extremal value handling information
- adds OpInfos for `torch.acos` and `torch.sin`

These improvements reveal that our testing has been incomplete on several systems, especially with larger float values and complex values, and several TODOs have been added for follow-up investigations. Luckily when writing tests that cover many ops we can afford to spend additional time crafting the tests and ensuring coverage.

Follow-up PRs will:

- refactor TestTorchMathOps into test_unary_ufuncs.py
- continue porting tests from test_torch.py to test_unary_ufuncs.py (where appropriate)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42965

Reviewed By: pbelevich

Differential Revision: D23238083

Pulled By: mruberry

fbshipit-source-id: c6be317551453aaebae9d144f4ef472f0b3d08eb
2020-08-20 22:02:00 -07:00
e31cd46278 Add alias torch.fix for torch.trunc to be compatible with NumPy. (#43326)
Summary:
xref https://github.com/pytorch/pytorch/issues/42515
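
A minimal check of the alias:

```python
import torch

t = torch.tensor([-1.7, -0.2, 0.2, 1.7])
assert torch.equal(torch.fix(t), torch.trunc(t))  # tensor([-1., -0., 0., 1.])
```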

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43326

Reviewed By: pbelevich

Differential Revision: D23249089

Pulled By: mruberry

fbshipit-source-id: 6afa9eb20493983d084e0676022c6245e7463e05
2020-08-20 21:47:39 -07:00
17f9edda42 Bias Correction Implementation (#41845)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41845

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D22661503

Pulled By: edmundw314

fbshipit-source-id: a88c349c6cc15b1c66aa6dee7593ef3df588eb85
2020-08-20 21:40:33 -07:00
665da61d2b Replace Conv1d with Conv2d (#42867)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42867

Test Plan: Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D23177916

Pulled By: kimishpatel

fbshipit-source-id: 68cc40cf42d03e5b8432dc08f9933a4409c76e25
2020-08-20 21:36:51 -07:00
e8139624f2 Search on system path for Vulkan headers and libraries as a last resort. (#43301)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43301

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23252338

Pulled By: AshkanAliabadi

fbshipit-source-id: 8eefe98eedf9dbeb570565bfb13ab61b1d6bca0e
2020-08-20 21:14:09 -07:00
217ddea93a [quant] Make OP_LIST_TO_FUSER_METHOD public (#43286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43286

We need to use this in graph mode quantization on fx

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23221734

fbshipit-source-id: 7c3c3840ce5bdc185b962e081aff1618f4c58e85
2020-08-20 20:19:13 -07:00
844d469ae7 Remove proprietary notices
Summary:
These were added accidentally (probably by an IDE) during a refactor.
These files have always been Open Source.

Test Plan: CI

Reviewed By: xcheng16

Differential Revision: D23250761

fbshipit-source-id: 4974430c0e28dd3269424d38edb36f4f71508157
2020-08-20 20:14:59 -07:00
9984d33542 [quant][graphmode][fx] Add support for conv module (#43285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43285

Porting op tests from test_quantize_jit.py

(Note: this ignores all push blocking failures!)

Test Plan:
TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23221733

fbshipit-source-id: c1f0f7ae0c82379143aa33fc1af7284d8303174b
2020-08-20 19:53:30 -07:00
7c50c2f79e Reimplement per-operator selective build (#39401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39401

This uses the technique proposed by smessmer in D16451848 to selectively
register operators without codegen.  See the Note inside for more
details.

This PR has feature parity with the old selective build apparatus:
it can whitelist schema def()s, impl()s, and on a per dispatch key
basis.  It has expanded dispatch key whitelisting, whereas previously
manually written registrations were not whitelisted at all.  (This
means we may be dropping dispatch keys where we weren't previously!)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D21905593

Pulled By: ezyang

fbshipit-source-id: d4870f800c66be5ce57ec173c9b6e14a52c4a48b
2020-08-20 19:10:02 -07:00
e32d014f46 remove empty override pretty_print (#43341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43341

This removes the empty pretty_print() override, since it shadows the implementation in the Module base class, which is not the intended behavior here.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D23244616

Pulled By: glaringlee

fbshipit-source-id: 94b8dfd3697dfc450f53b3b4eee6e9c13cafba7b
2020-08-20 18:48:29 -07:00
ad8294d35b [vulkan][ci] Vulkan tests running on linux build via swiftshader (added to docker) (#42614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42614

Vulkan backend linux build (USE_VULKAN=1), with Vulkan tests run against a software Vulkan implementation via [swiftshader](https://github.com/google/swiftshader)

The Vulkan linux build needs the Vulkan SDK, and running the tests needs SwiftShader.
SwiftShader needs to be compiled with the clang toolchain, so both were added to the bionic-clang-9 docker image.

The Vulkan SDK is downloaded from AWS;
SwiftShader is cloned from GitHub, and since it has many submodules, the commit hash is pinned in the install_swiftshader script.

To pass all the tests:
Disabled adaptive_avg_pool2d_2, as it needs at::view, which will land in https://github.com/pytorch/pytorch/pull/42676; it can be re-enabled after that.

Changed the strides, padding, and dilation params in the tests to vectors.

Docker image rebuild:
https://app.circleci.com/pipelines/github/pytorch/pytorch/200251/workflows/465f911f-f170-47e1-954e-b9605d91abd8/jobs/6700311
Vulkan Linux Build:
https://app.circleci.com/pipelines/github/pytorch/pytorch/200251/workflows/465f911f-f170-47e1-954e-b9605d91abd8/jobs/6701604
Vulkan Linux Test:
https://app.circleci.com/pipelines/github/pytorch/pytorch/200251/workflows/465f911f-f170-47e1-954e-b9605d91abd8/jobs/6703026

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D23174038

Pulled By: IvanKobzarev

fbshipit-source-id: 431c72e31743ca0c0b82a497420f6330a311b35b
2020-08-20 18:40:32 -07:00
5cf8592663 Fix backward compatibility test (#43371)
Summary:
Drop `.out` suffix from allow_list pattern added by https://github.com/pytorch/pytorch/issues/43272

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43371

Reviewed By: pbelevich

Differential Revision: D23256914

Pulled By: malfet

fbshipit-source-id: 10168b55b98c24c84ac2676963049d1eca5c182d
2020-08-20 18:29:10 -07:00
9a1f2b3617 .circleci: Use dynamic docker image for android (#43356)
Summary:
We recently upgraded to a dynamic docker image and this android build
job was missed during that transition

Fixes https://github.com/pytorch/pytorch/issues/43338

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43356

Reviewed By: pbelevich

Differential Revision: D23253175

Pulled By: seemethere

fbshipit-source-id: 4831d4fe554a126e202e788444a63516d34b3d72
2020-08-20 17:42:26 -07:00
e10aa47615 Fix at::native::view_as_real() for ComplexHalf Tensors (#43279)
Summary:
Add a ComplexHalf case to toValueType, which fixes how view_as_real and view_as_complex slice a complex tensor into a floating-point one; this path is used to generate tensors of random complex values, see:
018b4d7abb/aten/src/ATen/native/DistributionTemplates.h (L200)
Also add the ability to convert a python complex object to `c10::complex<at::Half>`

Add `torch.half` and `torch.complex32` to the list of `test_randn` dtypes
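
A sketch of what the fix enables (ComplexHalf support was experimental at the time, so dtype availability may vary by build):

```python
import torch

z = torch.randn(3, dtype=torch.complex64).to(torch.complex32)
r = torch.view_as_real(z)
print(r.dtype, r.shape)  # torch.float16 torch.Size([3, 2])
```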

Fixes https://github.com/pytorch/pytorch/issues/43143

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43279

Reviewed By: mrshenli

Differential Revision: D23230296

Pulled By: malfet

fbshipit-source-id: b4bb66c4c81dd867e72ab7c4563d73f6a4d80a44
2020-08-20 17:38:06 -07:00
b0ec336477 [quant][graphmode][fx][test] Add per op test for graph mode quant on fx (#43229)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43229

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D23201692

fbshipit-source-id: 37fa54dcf0a9d5029f1101e11bfd4ca45b422641
2020-08-20 17:32:02 -07:00
2b7108a96f Update hardcoded pytorch_android_gradle_custom_build_single hash (#43340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43340

This doesn't fix https://github.com/pytorch/pytorch/issues/43338 but
it gets us a little more up to date.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D23243933

Pulled By: ezyang

fbshipit-source-id: ce2773c55864d1a6f6628ba60bb9ad6aee4aba14
2020-08-20 15:37:43 -07:00
97d594b9f7 Make grad point to bucket buffer in DDP to save memory usage (#41954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41954
Make both variable.grad() and the grad in the dist autograd context point to the bucket buffer in DDP to save memory.
In this case, grad will be a view of the bucket buffer tensors; to make this compatible with optimizer.zero_grad(), we
made changes in https://github.com/pytorch/pytorch/pull/41283.

Also note that we cannot make variable.grad() point to the bucket buffer at construction time, because we want to
keep grad undefined for unused parameters.
ghstack-source-id: 110260297

Test Plan:
unit tests,

For roberta_base model with ~1GB parameters, peak memory dropped ~1GB (8250MB-7183MB).  Per iteration latency (0.982s ->0.909s), 8% speed up
https://www.internalfb.com/intern/fblearner/details/211713882?tab=operator_details
https://www.internalfb.com/intern/fblearner/details/211772923?tab=operator_details

For resnet model with ~97M parameters, peak memory dropped ~100MB (3089MB -> 2988MB). Per iteration latency has no change (0.122s -> 0.123s)
https://www.internalfb.com/intern/fblearner/details/211713577?tab=operator_details
https://www.internalfb.com/intern/fblearner/details/211712582?tab=operator_details

accuracy benchmark is expected as well
https://www.internalfb.com/intern/fblearner/details/213237067?tab=Outputs

Reviewed By: mrshenli

Differential Revision: D22707857

fbshipit-source-id: b5e767cfb34ccb3d067db2735482a86d59aea7a4
2020-08-20 15:33:44 -07:00
51bab0877d Fix torch.hub for new zipfile format. (#42333)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42239

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42333

Reviewed By: VitalyFedyunin

Differential Revision: D23215210

Pulled By: ailzhang

fbshipit-source-id: 161ead8b457c11655dd2cab5eecfd0edf7ae5c2b
2020-08-20 14:54:02 -07:00
dae2973fae [quant][graphmode][fx] Add graph mode quantization on fx (#43175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43175

This PR added graph mode quantization on fx: https://github.com/pytorch/pytorch/pull/42741
Currently it matches eager mode quantization for torchvision with static/dynamic/qat
ddp/synbn test is still wip

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23178602

fbshipit-source-id: 8e7e0322846fbda2cfa79ad188abd7235326f879
2020-08-20 14:50:09 -07:00
c89d2c6bf2 Replace black_list with block_list (#42088)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41735

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42088

Reviewed By: pbelevich

Differential Revision: D22794582

Pulled By: SplitInfinity

fbshipit-source-id: e256353befefa2630b99f9bcf0b79df3a7a8dcbd
2020-08-20 14:34:02 -07:00
a12fe1a242 Minor RPC doc fixes (#43337)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43337

Test Plan: Imported from OSS

Reviewed By: osalpekar

Differential Revision: D23242698

Pulled By: osalpekar

fbshipit-source-id: 7757fc43824423e3a6efd4da44c69995f64a6015
2020-08-20 14:17:07 -07:00
5006d24302 Make TensorPipe the default backend for RPC (#43246)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43246

Test Plan: Imported from OSS

Reviewed By: osalpekar

Differential Revision: D23206042

Pulled By: osalpekar

fbshipit-source-id: 258481ea9e753cd36c2787183827ca3b81d678e3
2020-08-20 14:17:02 -07:00
d0a6819b0e [ROCm] skip test_rpc in .jenkins/pytorch/test.sh (#43305)
Summary:
https://github.com/pytorch/pytorch/issues/42636 added test_rpc, but this test binary is not built for ROCm.  Skip this test for ROCm builds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43305

Reviewed By: pbelevich

Differential Revision: D23233087

Pulled By: mrshenli

fbshipit-source-id: 29cd81e88a543c922a988e09d5f789becf4b74e4
2020-08-20 14:15:27 -07:00
c66ca7a48d vmap: Fix bug with x * 0.1 (#43218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43218

Previously, `vmap(lambda x: x * 0.1)(torch.ones(3))` would return a
float64 tensor(!!). This is because there is a subtle bug in the
batching rule: the batching rule receives:
- a batched tensor for x
- a scalar tensor: tensor(0.1, dtype=torch.float64)
The batching rule decides to expand the scalar tensor to the same
size as x and then multiplies the two tensors, promoting the output to
a float64 tensor. However, this isn't correct: we should treat the
scalar tensor like a scalar tensor. When adding a FloatTensor to a
double scalar tensor, we don't usually promote the type.

Another example of a bug this PR fixes is the following:
`vmap(torch.mul)(torch.ones(3), torch.ones(3, dtype=torch.float64))`
Multiplying a scalar float tensor with a scalar double tensor produces a
float tensor, but the above produced a float64 tensor before this PR due to
mistakenly type-promoting the tensors.
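
A quick repro sketch of the fixed behavior (assumes a `torch.vmap` entry point; vmap's location has moved between releases, e.g. `torch.func.vmap` in recent builds):

```python
import torch

# After the fix, multiplying a batched float32 tensor by a Python float
# stays float32 instead of being promoted to float64.
out = torch.vmap(lambda x: x * 0.1)(torch.ones(3))
print(out.dtype)  # torch.float32
```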

Test Plan:
- new test: `pytest test/test_vmap.py -v`
- I refactored some tests a bit.

Reviewed By: cpuhrsch

Differential Revision: D23195418

Pulled By: zou3519

fbshipit-source-id: 33b7da841e55b47352405839f1f9445c4e0bc721
2020-08-20 13:44:31 -07:00
0dc41ff465 [pytorch] add flag for autograd ops to mobile builds (#43154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43154

Adds the build flag `BUILD_MOBILE_AUTOGRAD` which toggles whether autograd files should be included for a PyTorch mobile build (default off).
ghstack-source-id: 110369406

Test Plan: CI

Reviewed By: ljk53

Differential Revision: D23061913

fbshipit-source-id: bc3d6683ab17f158990d83e4fae0a011d5adeca1
2020-08-20 12:39:55 -07:00
4fc9e958c4 [quant] Add benchmarks for embedding_bag conversion ops (#43291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43291

Test Float2Fused and Fused2Float conversion operators for embedding_bag byte and 4-bit ops

Test Plan:
```
python -m pt.qembedding_pack_tes
```

Imported from OSS

Reviewed By: radkris-git

Differential Revision: D23231641

fbshipit-source-id: a2afe51bba52980d2e96dfd7dbc183327e9349fd
2020-08-20 11:26:20 -07:00
c8bc298d6c streamline stride propagation logic in TensorIterator (#42922)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41314 among other things.
This PR streamlines layout propagation logic in TensorIterator and removes almost all cases of channels-last hardcoding. The new rules and changes are as follows:
1) behavior of undefined `output` and defined output of the wrong (e.g. 0) size is always the same (before this PR the behavior was divergent)
2) in obvious cases (unary operation on memory-dense tensors, binary operations on memory-dense tensors with the same layout) strides are propagated (before propagation was inconsistent) (see footnote)
3) in other cases the output permutation is obtained as the inverse permutation of sorting the inputs by strides. Sorting is done with a comparator obeying the following rules: strides of broadcasted dimensions are set to 0, and 0 compares equal to anything. Strides of non-broadcasted dimensions (including dimensions of size `1`) participate in sorting. Precedence is given to the first input; in case of a tie in the first input, the corresponding dimensions are considered first, and if that does not indicate that a swap is needed, strides of the same dimension in subsequent inputs are considered. See changes in `reorder_dimensions` and `compute_strides`. Note that first inspecting the dimensions of the first input allows us to better recover its permutation (and we select this behavior because it more reliably propagates channels-last strides), but in some rare cases it could result in a worse traversal order for the second tensor.

These rules are enough to recover previously hard-coded behavior related to channels last, so all existing tests are passing.
In general, these rules will produce intuitive results, and in most cases permutation of the full size input (in case of broadcasted operation) will be recovered, or permutation of the first input (in case of same sized inputs) will be recovered, including cases with trivial (1) dimensions. As an example of the latter, the following tensor
```
x=torch.randn(2,1,3).permute(1,0,2)
```
will produce output with the same stride (3,3,1) in binary operations with 1d tensor. Another example is a tensor of size N1H1 that has strides `H,H,1,1` when contiguous and `H, 1, 1, 1` when channels-last. The output retains these strides in binary operations when another 1d tensor is broadcasted on this one.

Footnote: for ambiguous cases where all inputs are memory dense and have the same physical layout that can nevertheless correspond to different permutations, such as NC11-sized physically contiguous tensors, a regular contiguous tensor is returned, and thus the permutation information of the input is lost (so an NC11 channels-last input had the strides `C, 1, C, C`, but the output will have the strides `C, 1, 1, 1`). This behavior is unchanged from before and consistent with numpy, but it still makes sense to change it. The current blocker for doing so is the performance of `empty_strided`. Once we make it on par with `empty` we should be able to propagate layouts in these cases. For now, to avoid slowing down the common contiguous case, we default to contiguous.
The table below shows how in some cases current behavior loses permutation/stride information, whereas new behavior propagates permutation.
| code                                                                                                                                                                                           | old                                                   | new                                                  |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------|------------------------------------------------------|
| #strided tensors<br>a=torch.randn(2,3,8)[:,:,::2].permute(2,0,1)<br>print(a.stride())<br>print(a.exp().stride())<br>print((a+a).stride())<br>out = torch.empty(0)<br>torch.add(a,a,out=out)<br>print(out.stride()) | (2, 24, 8) <br>(6, 3, 1) <br>(1, 12, 4) <br>(6, 3, 1) | (2, 24, 8)<br>(1, 12, 4)<br>(1, 12, 4)<br>(1, 12, 4) |
| #memory dense tensors<br>a=torch.randn(3,1,1).as_strided((3,1,1), (1,3,3))<br>print(a.stride(), (a+torch.randn(1)).stride())<br>a=torch.randn(2,3,4).permute(2,0,1)<br>print(a.stride())<br>print(a.exp().stride())<br>print((a+a).stride())<br>out = torch.empty(0)<br>torch.add(a,a,out=out)<br>print(out.stride())                                                                                                                                                                                               |  (1, 3, 3) (1, 1, 1)<br>(1, 12, 4)<br>(6, 3, 1)<br>(1, 12, 4)<br>(6, 3, 1)                                                       |  (1, 3, 3) (1, 3, 3)<br>(1, 12, 4)<br>(1, 12, 4)<br>(1, 12, 4)<br>(1, 12, 4) |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42922

Reviewed By: ezyang

Differential Revision: D23148204

Pulled By: ngimel

fbshipit-source-id: 670fb6188c7288e506e5ee488a0e11efc8442d1f
2020-08-20 10:50:35 -07:00
ca9d4401d4 .circleci: Remove manual docker installation (#43277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43277

Docker added native support for GPUs with the release of 19.03 and
CircleCI's infrastructure is all on Docker 19.03 as of now.

This also removes all references to `nvidia-docker` in the `.circleci` fodler.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23217570

Pulled By: seemethere

fbshipit-source-id: af297c7e82bf264252f8ead10d1a154354b24689
2020-08-20 10:36:03 -07:00
66a79bf114 .circleci: Don't quote glob for conda upload (#43297)
Summary:
Globs don't get expanded if you quote them in a bash script...
apparently.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43297

Reviewed By: malfet

Differential Revision: D23227626

Pulled By: seemethere

fbshipit-source-id: d124025cfcaacbfb68167a062ca487c08f7f6bc9
2020-08-20 10:24:27 -07:00
397325a109 Make _compute_linear_combination.out a true out function (#43272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43272

Was missing kwarg-onlyness.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D23215506

Pulled By: ezyang

fbshipit-source-id: 2c282c9a534fa8ea1825c31a24cb2441f0d6b234
2020-08-20 09:00:17 -07:00
f9a766bb39 Increase deadline time for load_save tests (#43205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43205

A number of tests that forward to `TestLoadSaveBase.load_save` are all marked as flaky due to them regularly taking much longer to start up than hypothesis' default timeout of 200ms. This diff fixes the problem by removing the timeout for `load_save`. This is alright as these tests aren't meant to be testing the performance of these operators.

I would set the deadline to 60s if I could; however, it appears that the caffe2 github CI uses a different version of hypothesis that doesn't allow using `dateutil.timedelta`, so instead of trying to figure out an approach that works on both, I've just removed the deadline time.

I've also tagged all existing tasks WRT these failures.

Differential Revision: D23175752

fbshipit-source-id: 324f9ff034df1ac4874797f04f50067149a6ba48
2020-08-20 08:41:24 -07:00
a2ae2d3203 Nightly Pull (#43294)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40829

This addresses remaining issues/improvements in https://github.com/pytorch/pytorch/issues/40829 that were brought up prior to https://github.com/pytorch/pytorch/issues/42635 being merged.  Namely, this changes the name of the script and adds separate `checkout` and `pull` subcommands. I have tested it locally and everything appears to work.  Please let me know if you encounter any issues. I hope that this supports a more natural workflow.

CC ezyang rgommers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43294

Reviewed By: pbelevich

Differential Revision: D23241849

Pulled By: ezyang

fbshipit-source-id: c24556024d7e5d14b9a5006e927819d4ad370dd7
2020-08-20 08:34:18 -07:00
6a09df99e1 Fix ASAN error in QNNPACK's integration of qlinear_dynamic. (#41967)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41967

Test Plan: `buck test fbandroid/mode/asan xplat/assistant/oacr/nlu/tests:nlu_testsAndroid` no longer reports an error.

Reviewed By: kimishpatel, xuwenfang

Differential Revision: D22715307

Pulled By: AshkanAliabadi

fbshipit-source-id: bec7296b345125ec5243ee6e6c484246ecfca3b7
2020-08-20 07:46:34 -07:00
60b524f271 Update torch.Tensor.is_set_to documentation (#43052)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/30350

Preview:

![image](https://user-images.githubusercontent.com/5676233/90250018-69d72200-de09-11ea-8984-7401cfd6c719.png)
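
A quick usage sketch of the documented behavior:

```python
import torch

t = torch.randn(2, 3)
u = torch.empty(0)
u.set_(t)  # u now shares t's storage, size, and stride
print(u.is_set_to(t))                  # True
print(torch.empty(2, 3).is_set_to(t))  # False: different storage
```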

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43052

Reviewed By: mrshenli

Differential Revision: D23173066

Pulled By: suraj813

fbshipit-source-id: d90a11490739068ea448d975548a71e07180bd77
2020-08-20 07:40:00 -07:00
4e964f3b97 Make Windows CUDA-11 tests master only (#43234)
Summary:
According to the correlation analysis, CUDA-10.1 vs CUDA-11 test failures are quite dependent on each other

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43234

Reviewed By: ezyang, seemethere

Differential Revision: D23204289

Pulled By: malfet

fbshipit-source-id: c53c5f87e55f2dabbb6735a0566c314c204ebc69
2020-08-19 21:05:46 -07:00
3eb31325fc refactor torch/cuda/nccl.h to remove direct dependency on NCCL in libtorch_python (#42687)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42687

Reviewed By: malfet

Differential Revision: D23145834

Pulled By: walterddr

fbshipit-source-id: c703a953a54a638852f6e5a1479ca95ae6a10529
2020-08-19 20:16:53 -07:00
6e1127ea3f [NCCL] Changed FutureNCCL's then callback logic for better efficiency. (#42869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42869

We realized that when we invoke a simple callback that divides the tensors by `world_size` after `allreduce`, the performance was almost 50% lower in terms of QPS compared to the case where a simple `allreduce` hook is used with no `then` callback.

The main problem was as we call `work.wait()` before invoking `then` callback, we were synchronizing `work`'s stream with the default PyTorch stream inside [`runHook`](https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/c10d/reducer.cpp#L609) and stalling the backward computation.

In this PR, we ensure that FutureNCCL's `then` callback does not stall the backward computation. Assuming single-process single-device, `FutureNCCL` gets a new stream from the device's pool using `at::cuda::getStreamFromPool` to run `callback`, and before invoking the `callback` inline it synchronizes `WorkNCCL`'s stream with the callback's stream rather than the default stream.

ghstack-source-id: 110208431

Test Plan: Run performance benchmark tests to validate performance issue is resolved. Also, `python test/distributed/test_c10d.py` to avoid any odd issues.

Reviewed By: pritamdamania87

Differential Revision: D23055807

fbshipit-source-id: 60e50993f1ed97497514eac5cb1018579ed2a4c5
2020-08-19 19:42:22 -07:00
97d62bcd19 Modify Circle CI script to upload test report for analysis. (#43180)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43180

Reviewed By: VitalyFedyunin

Differential Revision: D23195934

Pulled By: walterddr

fbshipit-source-id: 5b9b411c3ea769951b5b1a456b5f7696b8ba0a92
2020-08-19 19:38:25 -07:00
0617156f0e [vulkan] fix invalid memory op and tests (#43312)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43312

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D23232809

Pulled By: IvanKobzarev

fbshipit-source-id: 11b070b6e082bac72e21dd4c25c9c675bbc8c4a3
2020-08-19 19:34:08 -07:00
aad1ff9f18 [quant][cleanup]test_qlinear_legacy should be under TestDynamicQuantizedLinear. (#40084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40084

This is just a nit diff (I hit a merge conflict while writing some unit tests).
This move was a nit left over from D21628596 (655f1ea176).

Test Plan: buck test test:quantization -- test_qlinear_legacy

Reviewed By: supriyar

Differential Revision: D22065463

fbshipit-source-id: 96ceaa53355349af7157f38b3a6366c550eeec6f
2020-08-19 18:50:46 -07:00
410d5b95b2 [jit] fix str -> Device implicit conversions (#43213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43213

A reversed isSubtypeOf caused erroreous conversions to be inserted.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23192787

Pulled By: zdevito

fbshipit-source-id: 4a90b19d99a4fc889e55568ced850f08dadbc3fe
2020-08-19 16:05:11 -07:00
018b4d7abb Automated submodule update: FBGEMM (#43251)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 685149bbc0

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43251

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: YazhiGao

Differential Revision: D23207016

fbshipit-source-id: 54e13b246bb5189260ed11316ddf3d26d52c6b24
2020-08-19 11:42:16 -07:00
eb7fc2e98f .circleci: Simplify binary upload process (#43159)
Summary:
Binary uploads were gated into 3 separate scripts, making it difficult to
actually contribute changes. This simplifies that by consolidating all 3
scripts into a single script, and then further consolidates things by
putting them all into the same job.

This also further simplifies things by separating upload jobs into their
own function under binary_build_definitions.py, since following the
conditional logic tree under the generic function was too difficult.

Testing this change here: https://github.com/pytorch/pytorch/pull/43161

Proof of success:
* [libtorch](https://app.circleci.com/pipelines/github/pytorch/pytorch/201868/workflows/54ce962f-f35b-4d97-93a7-bee186b14ead/jobs/6791347)
* [conda](https://app.circleci.com/pipelines/github/pytorch/pytorch/201868/workflows/54ce962f-f35b-4d97-93a7-bee186b14ead/jobs/6794359)
* [manywheel](https://app.circleci.com/pipelines/github/pytorch/pytorch/201868/workflows/54ce962f-f35b-4d97-93a7-bee186b14ead/jobs/6794253)

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43159

Reviewed By: malfet

Differential Revision: D23175174

Pulled By: seemethere

fbshipit-source-id: a2de64c033df99b03a124d3a0a2c92560af62c37
2020-08-19 11:34:14 -07:00
d467ac8ff0 [GLOO] handle empty split size (#43256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43256

* Handle empty split sizes by moving to computeLengthsAndOffsets()
* Enable GLOO alltoall python tests
ghstack-source-id: 109292763

Test Plan:
buck build mode/dev-nosan caffe2/torch/lib/c10d:ProcessGroupGlooTest

./trainer_cmd.sh -p 16 -n 8 -d gloo (modify ./trainer_cmd.sh a bit)

Reviewed By: mingzhe09088

Differential Revision: D22961600

fbshipit-source-id: b9e90dadf7b45323b8af2e6cab2e156043b7743b
2020-08-19 11:14:06 -07:00
7d10298067 Implement Tensor.to batching rule (#43206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43206

The batching rule is the same as the unary pointwise batching rules:
given a BatchedTensor, we unwrap it, call Tensor.to, and then re-wrap
it.
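
A hedged Python sketch of that unwrap/call/re-wrap pattern (the real rule is implemented in C++; these names are illustrative, not PyTorch internals):

```python
from dataclasses import dataclass
import torch

@dataclass
class Batched:
    value: torch.Tensor  # physical tensor, with the batch dim folded in
    bdim: int            # which physical dim is the batch dim

def to_batching_rule(b: Batched, *args, **kwargs) -> Batched:
    # Unwrap, run Tensor.to on the physical tensor, re-wrap with the same bdim.
    return Batched(b.value.to(*args, **kwargs), b.bdim)

out = to_batching_rule(Batched(torch.ones(2, 3), 0), torch.float64)
print(out.value.dtype, out.bdim)  # torch.float64 0
```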

Test Plan: - `pytest test/test_vmap.py -v -k`

Reviewed By: ezyang

Differential Revision: D23189053

Pulled By: zou3519

fbshipit-source-id: 51b4e41b1cd34bd082082ec4fff3c643002edbaf
2020-08-19 10:54:26 -07:00
1e248caba8 [CircleCI] Use canary images until VC++ 14.27 issue is resolved (#43220)
Summary:
Should fix binary build issue on Windows, and promptly error out if images are updated to a different version of VC++

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43220

Reviewed By: ezyang

Differential Revision: D23198530

Pulled By: malfet

fbshipit-source-id: 0c80361ad7dcfb7aaffccc306b7d741671bedc11
2020-08-19 10:28:19 -07:00
bc0e1e8ed2 Add dataclasses to base Docker images. (#43217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43217

Dataclasses is part of standard library in Python 3.7 and there
is a backport for it in Python 3.6.  Our code generation will
start using it, so add it to the default library set.
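
For reference, a minimal example of the stdlib feature being installed (the class name here is illustrative, not an actual codegen type):

```python
from dataclasses import dataclass

@dataclass
class NativeFunction:
    name: str
    structured: bool = False

print(NativeFunction("add"))  # NativeFunction(name='add', structured=False)
```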

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D23214028

Pulled By: ezyang

fbshipit-source-id: a2ae20b9fa8f0b22966ae48506d4ddea203e7459
2020-08-19 09:56:23 -07:00
06d43dc69a default ice-ref to c-step (#4812)
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/4812

if no compilation options are passed, default to c-step

fixed the FC and batchmatmul implementations to match C-step
fixed the fakelowp map calling to make sure we use the fp32 substitution of operators
updated the accumulator test to make it pass with fp32

Test Plan:
fakelowp tests
glow/test/numerics
net_runner

Reviewed By: jfix71

Differential Revision: D23086534

fbshipit-source-id: 3fbb8c4055bb190becb39ce8cdff6671f8558734
2020-08-19 09:50:34 -07:00
fa6b34b54c 2 Bit Embedding Conversion Operator support. (#43077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43077

The 2-bit embedding weight conversion operation is quite similar to the
4-bit embedding weight conversion.

The diff contains both the
1. 2bit packing op `embedding_bag_2bit_prepack`.
2. 2bit unpacking op `embedding_bag_2bit_unpack`.

Comments about the op are inline with the op definition.

Test Plan: buck test caffe2/test:quantization -- test_embedding_bag_2bit_unpack

Reviewed By: supriyar

Differential Revision: D23143262

fbshipit-source-id: fd8877f049ac1f7eb4bc580e588dc95f8b1edef0
2020-08-18 23:20:30 -07:00
ab366d0f5f Fix some mistakes in native_functions.yaml (#43156)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43156

- supports_named_tensor no longer does anything, so I have removed
  it.  I'm guessing these were cargo culted from some old occurrences
  of it in native_functions.yaml

- comma, not period, in variants

In my upcoming codegen rewrite, there will be strict error checking
for these cases (indeed, that is how I found these problems), so
I do not add error testing here.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23183977

Pulled By: ezyang

fbshipit-source-id: a47d342152badfb8aea248a819ad94fd93dd6ab2
2020-08-18 23:13:20 -07:00
27ec91b0c9 remove thunk fix now that ROCm CI images are >= ROCm 3.5 (#43226)
Summary:
Also, relax BUILD_ENVIRONMENT exact match to rocm when installing pip packages for tests.

CC ezyang xw285cornell sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43226

Reviewed By: colesbury

Differential Revision: D23200460

Pulled By: xw285cornell

fbshipit-source-id: 11cd889cc320d0249d7ebea4da261bfe779e82ac
2020-08-18 23:10:15 -07:00
8094228f26 update path in CI script to access ninja (#43236)
Summary:
This relaxes the assumption that test.sh will be run in the CI environment by the CI user.

CC ezyang xw285cornell sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43236

Reviewed By: colesbury

Differential Revision: D23205981

Pulled By: ezyang

fbshipit-source-id: 302743cb03c9e9c6bfcdd478a6cd920b536dc29b
2020-08-18 21:43:41 -07:00
7c923a1025 Optimize linux CI build/test matrix (#43240)
Summary:
Make CUDA-10.1 configs build-only, as CUDA-10.1 and CUDA-10.2 test matrix is almost identical, and now, since CUDA-11 is out perhaps it's time to stop testing CUDA-10.1.
Make CUDA-9.2+GCC_5.4 an important (i.e. running on PR) build only config, because of the big overlap between  CUDA-9.2-GCC7 and CUDA-9.2-GCC5.4 test coverage.
Make CUDA-11 libtorch tests important rather than CUDA-10.2.

As result of the change, every PR will be built against CUDA-9.2, CUDA-10.2 and CUDA-11 and tested against CUDA-10.2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43240

Reviewed By: ezyang

Differential Revision: D23205129

Pulled By: malfet

fbshipit-source-id: 70932e8b2167cce9fd621115c8bf24b1c81ed621
2020-08-18 20:39:32 -07:00
e41ca2d9fa In copy_weights_to_flat_buf_views() explicitly construct tuple (#43244)
Summary:
In some versions of GCC, the tuple constructor from an initializer list is marked as explicit, which results in the following compilation error:
```
/var/lib/jenkins/workspace/aten/src/ATen/native/cudnn/RNN.cpp: In function 'std::tuple<at::Tensor, std::vector<at::Tensor, std::allocator<at::Tensor> > > at::native::cudnn_rnn::copy_weights_to_flat_buf_views(at::TensorList, int64_t, int64_t, int64_t, int64_t, int64_t, bool, bool, cudnnDataType_t, const c10::TensorOptions&, bool, bool, bool)':
/var/lib/jenkins/workspace/aten/src/ATen/native/cudnn/RNN.cpp:687:35: error: converting to 'std::tuple<at::Tensor, std::vector<at::Tensor, std::allocator<at::Tensor> > >' from initializer list would use explicit constructor 'constexpr std::tuple<_T1, _T2>::tuple(_U1&&, _U2&&) [with _U1 = at::Tensor&; _U2 = std::vector<at::Tensor>&; <template-parameter-2-3> = void; _T1 = at::Tensor; _T2 = std::vector<at::Tensor>]'
     return {weight_buf, params_arr};
```
This regression was introduced by https://github.com/pytorch/pytorch/pull/42385

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43244

Reviewed By: pbelevich

Differential Revision: D23205656

Pulled By: malfet

fbshipit-source-id: 51470386ad95290c7c99d733fc1fe655aa27d009
2020-08-18 19:31:51 -07:00
d06f1818ad Fix codegen/cuda gcc-5.4 compilation issues (#43223)
Summary:
Most of the fixes are for the same old enum-is-not-hashable error.
In manager.cpp, use std::unordered_map::emplace rather than `insert` to avoid an error triggered by missed copy elision.
This regression was introduced by https://github.com/pytorch/pytorch/pull/43129

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43223

Reviewed By: albanD, seemethere

Differential Revision: D23198330

Pulled By: malfet

fbshipit-source-id: 576082f7a4454dd29182892c9c4e0b51a967d456
2020-08-18 17:19:07 -07:00
d5bc2a8058 Remove std::complex from c10::Half (#39833)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39833

Reviewed By: mrshenli

Differential Revision: D22644987

Pulled By: anjali411

fbshipit-source-id: 5ae5db10b12d410560eca43234efa04b711a639c
2020-08-18 15:22:36 -07:00
6c99d5611d [tensorexpr] Fix promotion of booleans (#43097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43097

Boolean arguments weren't promoted, so if you tried to write a comparison with
types such as `Tensor(Bool) == Int` you'd fail typechecking inside the TE
engine.

Test Plan: Imported from OSS

Reviewed By: protonu, zheng-xq

Differential Revision: D23167926

Pulled By: bertmaher

fbshipit-source-id: 47091a815d5ae521637142a5c390e8a51a776906
2020-08-18 15:19:38 -07:00
da5df7e2d2 Remove use of term "blacklist" from tools/autograd/gen_python_functions.py (#42047)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41720

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42047

Reviewed By: colesbury

Differential Revision: D23197785

Pulled By: SplitInfinity

fbshipit-source-id: 8ef38518f479e5e96b6a51bc420b0df5b35b447c
2020-08-18 15:11:22 -07:00
3951457ca5 [FX] Add in resnet + quantization tests (#43157)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43157

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D23173327

Pulled By: jamesr66a

fbshipit-source-id: 724d0f5399d389cdaa53917861b2113c33b9b5f9
2020-08-18 15:00:18 -07:00
dd194c1612 add _save_parameters to serialize map (#43163)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43163

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D23175287

Pulled By: ann-ss

fbshipit-source-id: ddfd734513c07e8bdbec108f26d1ca1770d098a6
2020-08-18 14:58:04 -07:00
2e6e295ecc refactor _save_parameters to _save_data (#43162)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43162

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D23175286

Pulled By: ann-ss

fbshipit-source-id: 6f930b98c367242fd4efbf51cb1d09995f7c4b40
2020-08-18 14:57:03 -07:00
888ae1b3d8 Introducing Matrix exponential (#40161)
Summary:
Implements (batched) matrix exponential. Fixes [https://github.com/pytorch/pytorch/issues/9983](https://github.com/pytorch/pytorch/issues/9983).

The algorithm follows:
```
 Bader, P.; Blanes, S.; Casas, F.
 Computing the Matrix Exponential with an Optimized Taylor Polynomial Approximation.
 Mathematics 2019, 7, 1174.
```
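
A usage sketch, assuming the `torch.matrix_exp` entry point this PR introduces (it also accepts batched matrices, as the message states):

```python
import torch

# expm of the zero matrix is the identity.
print(torch.matrix_exp(torch.zeros(2, 2)))

# Exponentiating a skew-symmetric generator yields a rotation matrix:
# [[cos 1, -sin 1], [sin 1, cos 1]].
G = torch.tensor([[0.0, -1.0], [1.0, 0.0]])
print(torch.matrix_exp(G))
```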

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40161

Reviewed By: zhangguanheng66

Differential Revision: D22951372

Pulled By: ezyang

fbshipit-source-id: aa068cb76d5cf71696b333d3e72cee287b3089e3
2020-08-18 14:15:10 -07:00
dfdd797723 Replace all AT_ASSERTM under ATen CUDA kernels. (#42989)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42989

Test Plan: Imported from OSS

Reviewed By: colesbury

Differential Revision: D23190011

Pulled By: ezyang

fbshipit-source-id: 7489598d7d920f32334943c1bf12bba74208a96c
2020-08-18 13:50:49 -07:00
493b3c2c7c Replace all AT_ASSERTM under ATen CPU kernels. (#41876)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41876

Test Plan: Imported from OSS

Reviewed By: colesbury

Differential Revision: D23190010

Pulled By: ezyang

fbshipit-source-id: 238f1cd8db283805d6e892de7549763d0aa13316
2020-08-18 13:49:15 -07:00
0744dd6166 Fix shapes in the MarginRankingLoss docs (#43131)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42884

I did some additional research and considering the first few lines of the docs (`Creates a criterion that measures the loss given inputs x1, x2, two 1D mini-batch Tensors, and a label 1D mini-batch tensor y (containing 1 or -1`) and the provided tests, this loss should be used primarily with 1-D tensors. More advanced users (that may use this loss in non-standard ways) can easily check the source and see that the definition accepts inputs/targets of arbitrary dimension as long as they match in shape or are broadcastable.
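
The standard 1-D usage described above looks like this:

```python
import torch

loss_fn = torch.nn.MarginRankingLoss(margin=0.5)
x1 = torch.randn(4)
x2 = torch.randn(4)
y = torch.tensor([1.0, -1.0, 1.0, -1.0])  # +1: x1 should rank higher; -1: x2
print(loss_fn(x1, x2, y))
```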

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43131

Reviewed By: colesbury

Differential Revision: D23192011

Pulled By: mrshenli

fbshipit-source-id: c412c28daf9845c0142ea33b35d4287e5b65fbb9
2020-08-18 13:44:16 -07:00
fbf274f5a7 Autocast support for cudnn RNNs (#42385)
Summary:
Should close https://github.com/pytorch/pytorch/issues/36428.

The cudnn RNN API expects weights to occupy a flat buffer in memory with a particular layout.  This PR implements a "speed of light" fix:  [`_cudnn_rnn_cast_reflatten`](https://github.com/pytorch/pytorch/pull/42385/files#diff-9ef93b6a4fb5a06a37c562b83737ac6aR327) (the autocast wrapper assigned to `_cudnn_rnn`) copies weights to the right slices of a flat FP16 buffer with a single read/write per weight (as opposed to casting them to FP16 individually then reflattening the individual FP16 weights, which would require 2 read/writes per weight).

It isn't pretty but IMO it doesn't make rnn bindings much more tortuous than they already are.

The [test](https://github.com/pytorch/pytorch/pull/42385/files#diff-e68a7bc6ba14f212e5e7eb3727394b40R2683) tries a forward under autocast and a backward for the full cross product of RNN options and input/weight/hidden dtypes.  As for all FP16list autocast tests, forward output and backward grads are checked against a control where inputs (including RNN module weights in this case) are precasted to FP16 on the python side.

Not sure who to ask for review, tagging ezyang and ngimel because Ed wrote this file (almost 2 years ago) and Natalia did the most recent major [surgery](https://github.com/pytorch/pytorch/pull/12600).

Side quests discovered:
- Should we update [persistent RNN heuristics](dbdd28207c/aten/src/ATen/native/cudnn/RNN.cpp (L584)) to include compute capability 8.0?  Could be another PR but seems easy enough to include.
- Many (maybe all?!) the raw cudnn API calls in [RNN.cpp](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cudnn/RNN.cpp) are deprecated in cudnn 8.  I don't mind taking the AI to update them since my mental cache is full of rnn stuff, but that would be a substantial separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42385

Reviewed By: zhangguanheng66

Differential Revision: D23077782

Pulled By: ezyang

fbshipit-source-id: a2afb1bdab33ba0442879a703df13dc87f03ec2e
2020-08-18 13:37:42 -07:00
0a9c35aba3 maybe minor fix to dispatch/backend_fallback_test.cpp? (#42990)
Summary:
I think you want to push rewrapped `rets`, not `args`, back to the stack.

Doesn't matter for test purposes because tests only check if/when fallbacks were called, they don't check outputs for correctness.  But it avoids reader confusion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42990

Reviewed By: mrshenli

Differential Revision: D23168277

Pulled By: ezyang

fbshipit-source-id: 2559f0707acdca2e3deac09006bc66ce3c788ea3
2020-08-18 13:01:35 -07:00
e39b43fd76 Issue 43057 (#43063)
Summary:
A small change that adds a docstring that can be found with
`getattr(nn.Module, nn.Module.forward.__name__, None).__doc__`

Fixes https://github.com/pytorch/pytorch/issues/43057

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43063

Reviewed By: mrshenli

Differential Revision: D23161782

Pulled By: ezyang

fbshipit-source-id: 95456f858e2b6a0e41ae551ea4ec2e78dd35ee3f
2020-08-18 12:50:53 -07:00
5d608d45cf Added Encoder Layer constructor with default parameters (#43130)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37756

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43130

Reviewed By: colesbury

Differential Revision: D23189803

Pulled By: mrshenli

fbshipit-source-id: 53f3fca838828ddd728d8b44c36745bab5acee1f
2020-08-18 11:09:49 -07:00
53bbf5a48b Update README.md (#43100)
Summary:
The changes are minor.
1. Add back the external links so that readers can find out more about external tools on how to accelerate PyTorch.
2. Fix typo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43100

Reviewed By: colesbury

Differential Revision: D23192251

Pulled By: mrshenli

fbshipit-source-id: dde54b7942ebff5bbe3d58ad95744c6d95fe60fe
2020-08-18 11:04:36 -07:00
ee74c2e5be Compress fatbin to fit into 32bit indexing (#43074)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39968

tested with `TORCH_CUDA_ARCH_LIST='3.5 5.2 6.0 6.1 7.0 7.5 8.0+PTX'`: before this PR the build was failing, and with this PR it succeeds.

With `TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0+PTX'`, `libtorch_cuda.so` with symbols changes from 2.9GB -> 2.2GB

cc: ptrblck mcarilli jjsjann123

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43074

Reviewed By: mrshenli

Differential Revision: D23176095

Pulled By: malfet

fbshipit-source-id: 7b3e6d049fc080e519f21e80df05ef68e7bea57e
2020-08-18 09:48:54 -07:00
b92b556a12 Add shape inference to SparseLengthsSumSparse ops (#43181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43181

att

Test Plan:
```
buck test caffe2/caffe2/opt:bound_shape_inference_test
```

Reviewed By: ChunliF

Differential Revision: D23097145

fbshipit-source-id: 3e4506308446f28fbeb01dcac97dce70c0443975
2020-08-18 09:36:53 -07:00
b3bda94393 [NVFuser] Enable E2E BCast-PWise-Reduction fusions (#43129)
Summary:
Had a bunch of merged commits that shouldn't have been there; reverted them to prevent conflicts. Lots of new features; highlights are listed below.

**Overall:**

- Enables pointwise fusion, single (but N-D) broadcast -- pointwise fusion, single (but N-D) broadcast -- pointwise -- single (but N-D) reduction fusion.

**Integration:**

- Separate "magic scheduler" logic that takes a fusion and generates code generator schedule
- Reduction fusion scheduling with heuristics closely matching eagermode (unrolling supported, but no vectorize support)
- 2-Stage caching mechanism, one on contiguity, device, type, and operations, the other one is input size->reduction heuristic

**Code Generation:**

- More generic support in code generation for computeAt
- Full rework of loop nest generation and Indexing to more generically handle broadcast operations
- Code generator has automatic kernel launch configuration (including automatic allocation of grid reduction buffers)
- Symbolic (runtime) tilling on grid/block dimensions is supported
- Simplified index generation based on user-defined input contiguity
- Automatic broadcast support (similar to numpy/pytorch semantics)
- Support for compile time constant shared memory buffers
- Parallelized broadcast support (i.e. block reduction -> block broadcast support)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43129

Reviewed By: mrshenli

Differential Revision: D23162207

Pulled By: soumith

fbshipit-source-id: 16deee4074c64de877eed7c271d6a359927111b2
2020-08-18 09:10:08 -07:00
c44b1de54e Pin VC++ version to 14.26 (#43184)
Summary:
VC++14.27 fails to compile mkl-dnn, see oneapi-src/oneDNN#812

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43184

Reviewed By: glaringlee

Differential Revision: D23181803

Pulled By: malfet

fbshipit-source-id: 9861c6243673c775374d77d2f51b45a42791b475
2020-08-17 22:17:06 -07:00
e8db0425b5 remove dot from TH (#43148)
Summary:
small cleanup of dead code

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43148

Reviewed By: mruberry

Differential Revision: D23175571

Pulled By: ngimel

fbshipit-source-id: b1b0ae9864d373c75666b95c589d090a9ca791b2
2020-08-17 21:40:44 -07:00
aef2890a75 Improve zero sized input for addmv (#41824)
Summary:
fixes https://github.com/pytorch/pytorch/issues/41340

Unfortunately, I still cannot get a K80 to verify the fix, but it should be working.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41824

Reviewed By: mruberry

Differential Revision: D23172775

Pulled By: ngimel

fbshipit-source-id: aa6af96fe74e3bb07982c006cb35ecc7f18181bc
2020-08-17 20:05:31 -07:00
3c5e3966f4 [ONNX] Squeeze operator should give an error when trying to apply to a dimension with shape > 1 (#38476)
Summary:
The ONNX spec for the Squeeze operator:

> Remove single-dimensional entries from the shape of a tensor. Takes a parameter axes with a list of axes to squeeze. If axes is not provided, all the single dimensions will be removed from the shape. If an axis is selected with shape entry not equal to one, an error is raised.

Currently, as explained in issue https://github.com/pytorch/pytorch/issues/36796, it is possible to export such a model to ONNX, and this results in an exception from ONNX runtime.

Fixes https://github.com/pytorch/pytorch/issues/36796.
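
A minimal repro sketch of the behavior this guards against (module, shapes, and the exact error type are illustrative, not taken from the PR's test suite):

```python
import io
import torch

class BadSqueeze(torch.nn.Module):
    def forward(self, x):
        # dim 1 has size 2 below, so per the ONNX spec this is an error
        return torch.squeeze(x, dim=1)

x = torch.randn(3, 2, 5)
try:
    torch.onnx.export(BadSqueeze(), x, io.BytesIO())
except RuntimeError as e:  # error type illustrative
    print("export rejected:", e)
```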

Pull Request resolved: https://github.com/pytorch/pytorch/pull/38476

Reviewed By: hl475

Differential Revision: D22158024

Pulled By: houseroad

fbshipit-source-id: bed625f3c626eabcbfb2ea83ec2f992963defa19
2020-08-17 17:41:46 -07:00
cd96dfd44b Delete accidentally committed file errors.txt. (#43164)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43164

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23175392

Pulled By: gchanan

fbshipit-source-id: 0d2d918fdf4a94361cdc3344bf1bc89dd0286ace
2020-08-17 17:37:48 -07:00
57af1ec145 observers: use torch.all to check for valid min and max values (#43151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43151

Using `torch.all` instead of `torch.sum` and a length check.
It's unclear whether the increase in perf (~5% for small inputs) is
real, but it should be a net benefit, especially for larger channel inputs.
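
A sketch of the check in question, with illustrative names (not the exact observer internals):

```python
import torch

min_val = torch.randn(8)
max_val = min_val + torch.rand(8)

# Single reduction over the elementwise comparison, instead of a
# torch.sum plus a length check:
is_valid = torch.all(min_val <= max_val)
```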

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23170426

fbshipit-source-id: ee5c25eb93cee1430661128ac9458a9c525df8e5
2020-08-17 17:08:57 -07:00
3264ba065c observers: use clamp instead of min/max in calculate_qparams (#43150)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43150

The current logic was expensive because it created tensors on CUDA.
Switching to clamp since it can work without needing to create tensors.
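
A sketch of the difference, assuming a `scale` tensor that must stay above `eps` (names illustrative):

```python
import torch

scale = torch.tensor([1e-9, 0.5, 2.0])
eps = torch.finfo(torch.float32).eps

# Elementwise max needs an eps *tensor* on scale's device
# (an extra allocation, costly on CUDA):
old = torch.max(scale, torch.tensor([eps], device=scale.device))

# clamp accepts a plain Python scalar, so no tensor is created:
new = scale.clamp(min=eps)
assert torch.equal(old, new)
```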

Test Plan:
benchmarks

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23170427

fbshipit-source-id: 6fe3a728e737aca9f6c2c4d518c6376738577e21
2020-08-17 17:08:54 -07:00
a5dfba0a6e observers: make eps a buffer (#43149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43149

This value doesn't change, so make it a buffer in order to pay the
cost of creating a tensor only once.
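
A minimal sketch of the pattern (the module name is illustrative):

```python
import torch

class MyObserver(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Create the eps tensor once, at construction time, rather
        # than on every calculate_qparams call:
        self.register_buffer(
            'eps', torch.tensor([torch.finfo(torch.float32).eps]))
```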

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23170428

fbshipit-source-id: 6b963951a573efcc5b5a57649c814590b448dd72
2020-08-17 17:08:51 -07:00
5aa61afbfb quant bench: update observer configs (#42956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42956

In preparation for observer perf improvement, cleans up the
micro benchmarks:
* disable CUDA for histogram observers (it's too slow)
* add larger shapes for better representation of real workloads

Test Plan:
```
cd benchmarks/operator_benchmark
python -m pt.qobserver_test
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D23093996

fbshipit-source-id: 5dc477c9bd5490d79d85ff8537270cd25aca221a
2020-08-17 17:07:56 -07:00
1f6e6a1166 Remove unused variable vecVecStartIdx (#42257)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42257

Reviewed By: gchanan

Differential Revision: D23109328

Pulled By: ezyang

fbshipit-source-id: dacd438395fedd1050ad3ffb81327bbb746c776c
2020-08-17 15:41:07 -07:00
133e9f96e1 Use c10 threadpool for GPU to CPU distributed autograd continuations. (#42511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42511

DistEngine currently only has a single thread to execute GPU to CPU
continuations as part of the backward pass. This would be a significant
performance bottleneck in cases where we have such continuations and would like
to execute these using all CPU cores.

To alleviate this in this PR, we have the single thread in DistEngine only
dequeue work from the global queue, but then hand off execution of that work to
the c10 threadpool where we call "execute_graph_task_until_ready_queue_empty".

For more context please see:
https://github.com/pytorch/pytorch/issues/40255#issuecomment-663298062.
ghstack-source-id: 109997718

Test Plan: waitforbuildbot

Reviewed By: albanD

Differential Revision: D22917579

fbshipit-source-id: c634b6c97f3051f071fd7b994333e6ecb8c54155
2020-08-17 15:04:19 -07:00
825ec18eed [jit] better error message (#43093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43093

without this it's hard to tell which module is going wrong

Test Plan:
```
> TypeError:
> 'numpy.int64' object in attribute 'Linear.in_features' is not a valid constant.
> Valid constants are:
> 1. a nn.ModuleList
> 2. a value of type {bool, float, int, str, NoneType, torch.device, torch.layout, torch.dtype}
> 3. a list or tuple of (2)
```

Reviewed By: eellison

Differential Revision: D23148516

fbshipit-source-id: b86296cdeb7b47c9fd69b5cfa479914c58ef02e6
2020-08-17 14:57:56 -07:00
864f0cfb2d Fix type annotations for torch.sparse, enable in CI (#43108)
Summary:
Closes gh-42982

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43108

Reviewed By: malfet

Differential Revision: D23167560

Pulled By: ezyang

fbshipit-source-id: 0d660ca686ada2347bf440c6349551d1539f99ef
2020-08-17 14:40:11 -07:00
6db0b8785d Adds movedim method, fixes movedim docs, fixes view doc links (#43122)
Summary:
This PR:

- Adds a method variant to movedim
- Fixes the movedim docs so it will actually appear in the documentation
- Fixes three view doc links which were broken
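
A quick sketch of the function form and the new method form:

```python
import torch

x = torch.randn(2, 3, 4)
assert torch.movedim(x, 0, -1).shape == (3, 4, 2)  # function form
assert x.movedim(0, -1).shape == (3, 4, 2)         # new method form
```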

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43122

Reviewed By: ngimel

Differential Revision: D23166222

Pulled By: mruberry

fbshipit-source-id: 14971585072bbc04b5366d4cc146574839e79cdb
2020-08-17 14:24:52 -07:00
37252e8f00 Implement batching rules for some unary ops (#43059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43059

This PR implements batching rules for some unary ops. In particular, it
implements the batching rules for the unary ops that take a single
tensor as input (and nothing else).

The batching rule for a unary op is (sketched in Python after this list):
(1) grab the physical tensor straight out of the BatchedTensor
(2) call the unary op
(3) rewrap the physical tensor in a BatchedTensor
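
A Python sketch of those three steps; the real rules live in C++, and `value`/`bdims` here are stand-in names, not actual PyTorch internals:

```python
def unary_batching_rule(op, batched):
    physical = batched.value                     # (1) grab the physical tensor
    result = op(physical)                        # (2) call the unary op
    return type(batched)(result, batched.bdims)  # (3) rewrap the result
```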

Test Plan: - new tests `pytest test/test_vmap.py -v -k "Operators"`

Reviewed By: ezyang

Differential Revision: D23132277

Pulled By: zou3519

fbshipit-source-id: 24b9d7535338207531d767155cdefd2c373ada77
2020-08-17 13:38:10 -07:00
768c2a8c25 vmap: fixed to work with functools.partial (#43028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43028

There was a bug where we always tried to grab the `__name__` attribute of
the function passed in by the user. Not all Callables have the
`__name__` attribute, an example being a Callable produced by
functools.partial.

This PR modifies the error-checking code to use `repr` if `__name__` is
not available. Furthermore, it moves the "get the name of this function"
functionality to the actual error sites as an optimization so we don't
spend time trying to compute `__repr__` for the Callable if there is no
error.
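
A sketch of the fallback (the helper name is illustrative):

```python
import functools

def _callable_name(fn):
    # functools.partial objects, among others, have no __name__:
    return fn.__name__ if hasattr(fn, '__name__') else repr(fn)

# Prints something like: functools.partial(<built-in function max>, 0)
print(_callable_name(functools.partial(max, 0)))
```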

Test Plan: - `pytest test/test_vmap.py -v`, added new tests.

Reviewed By: yf225

Differential Revision: D23130235

Pulled By: zou3519

fbshipit-source-id: 937f3640cc4d759bf6fa38b600161f5387a54dcf
2020-08-17 13:36:49 -07:00
9c3f579528 .circleci: Copy LLVM from pre-built image (#43038)
Summary:
LLVM builds took a large amount of time and bogged down docker builds in
general. Since we build it the same for everything let's just copy it
from a pre-built image instead of building it from source every time.

Builds are defined in https://github.com/pytorch/builder/pull/491

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43038

Reviewed By: malfet

Differential Revision: D23119513

Pulled By: seemethere

fbshipit-source-id: f44324439d45d97065246caad07c848e261a1ab6
2020-08-17 11:04:35 -07:00
7cb8d68ae1 Rename XLAPreAutograd to AutogradXLA. (#43047)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43047

Reviewed By: ezyang

Differential Revision: D23134326

Pulled By: ailzhang

fbshipit-source-id: 5fcbc23755daa8a28f9b03af6aeb3ea0603b5c9a
2020-08-17 10:47:43 -07:00
034e6727e7 Set default ATen threading backend to native if USE_OPENMP is false (#43067)
Summary:
Since OpenMP is not available on some platforms, or might be disabled by the user, set the default `ATEN_THREADING` based on the USE_OPENMP and USE_TBB options

Fixes https://github.com/pytorch/pytorch/issues/43036

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43067

Reviewed By: houseroad

Differential Revision: D23138856

Pulled By: malfet

fbshipit-source-id: cc8f9ee59a5559baeb3f19bf461abbc08043b71c
2020-08-17 10:33:31 -07:00
aab66602c4 Add torch.dot for complex tensors (#42745)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42745

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D23056382

Pulled By: anjali411

fbshipit-source-id: c97f15e057095f78069844dbe0299c14104d2fce
2020-08-17 09:05:41 -07:00
472f291375 Fix freeze_module pass for sharedtype (#42457)
Summary:
During the cleanup phase, calling recordReferencedAttrs records
the attributes which are referenced and hence kept.
However, if you have two instances of the same type which are preserved
through the freezing process, as the added test case shows, then while
recording the referenced attributes we iterate through the type
INSTANCES seen so far and record those.
Thus, if we have another instance of the same type, we will just look at
the first instance in the list and record that instance.
This PR fixes that by traversing the getattr chains and getting the
actual instance of the getattr output.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42457

Test Plan:
python test/test_jit.py TestFreezing
Fixes #{issue number}

Reviewed By: gchanan

Differential Revision: D23106921

Pulled By: kimishpatel

fbshipit-source-id: ffff52876938f8a1fedc69b8b24a3872ea66103b
2020-08-17 08:27:31 -07:00
269fdb5bb2 prepare to split transformer header file (#43069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43069

The transformer C++ impl needs to put TransformerEncoderLayer/DecoderLayer and TransformerEncoder/TransformerDecoder in different headers, since TransformerEncoder/Decoder's options class needs TransformerEncoderLayer/DecoderLayer as an input parameter. Split the header files to avoid cyclic inclusion.

Test Plan: Imported from OSS

Reviewed By: yf225

Differential Revision: D23139437

Pulled By: glaringlee

fbshipit-source-id: 3c752ed7702ba18a9742e4d47d049e62d2813de0
2020-08-17 07:54:05 -07:00
248b6a30f4 add training mode to mobile::Module (#42880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42880

Enable switching between and checking for training and eval mode for torch::jit::mobile::Module using train(), eval(), and is_training(), like exists for torch::jit::Module.

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D23063006

Pulled By: ann-ss

fbshipit-source-id: b79002148c46146b6e961cbef8aaf738bbd53cb2
2020-08-17 00:20:03 -07:00
e2eb0cb1a9 Adds arccosh alias for acosh and adds an alias consistency test (#43107)
Summary:
This adds the torch.arccosh alias and updates alias testing to validate the consistency of the aliased and original operations. The alias testing is also updated to run on CPU and CUDA, which revealed a memory leak when tracing (see https://github.com/pytorch/pytorch/issues/43119).
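
A quick usage sketch of the alias:

```python
import torch

x = torch.tensor([1.5, 2.0, 10.0])
assert torch.equal(torch.arccosh(x), torch.acosh(x))  # alias of acosh
```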

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43107

Reviewed By: ngimel

Differential Revision: D23156472

Pulled By: mruberry

fbshipit-source-id: 6155fac7954fcc49b95e7c72ed917c85e0eabfcd
2020-08-16 22:12:25 -07:00
4ae832e106 Optimize SiLU (Swish) op in PyTorch (#42976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42976

Optimize SiLU (Swish) op in PyTorch.

Some benchmark result

input = torch.rand(1024, 32768, dtype=torch.float, device="cpu")
forward: 221ms -> 133ms
backward: 600ms -> 170ms

input = torch.rand(1024, 32768, dtype=torch.double, device="cpu")
forward: 479ms -> 297ms
backward: 1438ms -> 387ms

input = torch.rand(8192, 32768, dtype=torch.float, device="cuda")
forward: 24.34ms -> 9.83ms
backward: 97.05ms -> 29.03ms

input = torch.rand(4096, 32768, dtype=torch.double, device="cuda")
forward: 44.24ms -> 30.15ms
backward: 126.21ms -> 49.68ms

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "SiLU"

Reviewed By: houseroad

Differential Revision: D23093593

fbshipit-source-id: 1ba7b95d5926c4527216ed211a5ff1cefa3d3bfd
2020-08-16 13:21:57 -07:00
d4c5f561ec Updates torch.clone documentation to be consistent with other functions (#43098)
Summary:
`torch.clone` exists but was undocumented, and the method incorrectly listed `memory_format` as a positional argument. This:

- documents `torch.clone`
- lists `memory_format` as a keyword-only argument
- wordsmiths the documentation
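
A usage sketch of the keyword-only argument:

```python
import torch

x = torch.randn(1, 3, 8, 8).contiguous(memory_format=torch.channels_last)
y = x.clone(memory_format=torch.preserve_format)  # keyword-only
assert y.is_contiguous(memory_format=torch.channels_last)
```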

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43098

Reviewed By: ngimel

Differential Revision: D23153397

Pulled By: mruberry

fbshipit-source-id: c2ea781cdcb8b5ad3f04987c2b3a2f1fe0eaf18b
2020-08-16 04:18:49 -07:00
5bcf9b017a Implement hstack, vstack, dstack (#42799)
Summary:
Related to https://github.com/pytorch/pytorch/issues/38349
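
A quick sketch of the new functions:

```python
import torch

a, b = torch.tensor([1, 2, 3]), torch.tensor([4, 5, 6])
torch.hstack((a, b))        # tensor([1, 2, 3, 4, 5, 6])
torch.vstack((a, b)).shape  # torch.Size([2, 3])
torch.dstack((a, b)).shape  # torch.Size([1, 3, 2])
```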

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42799

Reviewed By: izdeby

Differential Revision: D23140704

Pulled By: mruberry

fbshipit-source-id: 6a36363562c50d0abce87021b84b194bb32825fb
2020-08-15 20:39:14 -07:00
8864148823 [jit] DeepAndWide benchmark (#43096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43096

Add benchmark script for deep and wide model.

Reviewed By: bwasti, yinghai

Differential Revision: D23099925

fbshipit-source-id: aef09d8606eba1eccc0ed674dfea59b890d3648b
2020-08-15 01:27:12 -07:00
91f3114fc1 [JIT] Represent profiled types as a node attribute (#43035)
Summary:
This changes profiled types from being represented as:
`%23 : Float(4:256, 256:1, requires_grad=0, device=cpu) = prim::profile(%0)`
->
`%23 : Tensor = prim::profile[profiled_type=Float(4:256, 256:1, requires_grad=0, device=cpu)](%0)`

Previously, by representing the profiled type in the IR directly it was very easy for optimizations to accidentally use profiled types without inserting the proper guards that would ensure that the specialized type would be seen.

It would be a nice follow up to extend this to prim::Guard as well, however we have short term plans to get rid of prim::Guard.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43035

Reviewed By: ZolotukhinM

Differential Revision: D23120226

Pulled By: eellison

fbshipit-source-id: c78d7904edf314dd65d1a343f2c3a947cb721b32
2020-08-14 20:17:46 -07:00
19902f6c0e Document unavailable reduction ops with NCCL backend (#42822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42822

These ops arent supported with NCCL backend and used to silently error.
We disabled them as part of addressing https://github.com/pytorch/pytorch/issues/41362, so
document that here.
ghstack-source-id: 109957761

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D23023046

fbshipit-source-id: 45d69028012e0b6590c827d54b35c66cd17e7270
2020-08-14 19:08:28 -07:00
06aaf8c20d Add set_device_map to TensorPipeOptions to support GPU args (#42637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42637

This commit enables sending non-CPU tensors through RPC using
TensorPipe backend. Users can configure device mappings by calling
set_map_location on `TensorPipeRpcBackendOptions`. Internally,
the `init_rpc` API verifies the correctness of device mappings. It
will shutdown RPC if the check failed, or proceed and pass global
mappings to `TensorPipeAgent` if the check was successful. For serde,
we added a device indices field to TensorPipe read and write buffers,
which should be either empty (all tensors must be on CPU) or match
the tensors in order and number in the RPC message. This commit
does not yet avoid zero-copy, the tensor is always moved to CPU
on the sender and then moved to the specified device on the receiver.
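
A configuration sketch based on the `set_device_map` name from the commit title (the exact signature at this snapshot may differ):

```python
from torch.distributed.rpc import TensorPipeRpcBackendOptions

opts = TensorPipeRpcBackendOptions()
# Tensors sent from this worker's cuda:0 should land on worker1's cuda:1:
opts.set_device_map("worker1", {0: 1})
```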

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D23011572

Pulled By: mrshenli

fbshipit-source-id: 62b617eed91237d4e9926bc8551db78b822a1187
2020-08-14 18:46:55 -07:00
c84f78470b Fix type annotations for a number of torch.utils submodules (#42711)
Summary:
Related issue on `torch.utils` type annotation hiccups: gh-41794

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42711

Reviewed By: mrshenli

Differential Revision: D23005434

Pulled By: malfet

fbshipit-source-id: 151554b1e7582743f032476aeccdfdad7a252095
2020-08-14 18:12:48 -07:00
bcf54f9438 Stop treating ASAN as special case (#43048)
Summary:
Add "asan" node to a `CONFIG_TREE_DATA` rather than hardcoded that non-xla clang-5 is ASAN

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43048

Reviewed By: houseroad

Differential Revision: D23126296

Pulled By: malfet

fbshipit-source-id: 22f02067bb2f5435a0e963a6c722b9c115ccfea4
2020-08-14 17:24:05 -07:00
0cf4a5bccb Add GCC codecoverage flags (#43066)
Summary:
Rename `CLANG_CODE_COVERAGE` option to `CODE_COVERAGE` and add compiler specific flags for GCC and Clang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43066

Reviewed By: scintiller

Differential Revision: D23137488

Pulled By: malfet

fbshipit-source-id: a89570469692f878d84f7da6f9d5dc01df423e80
2020-08-14 17:16:18 -07:00
91b090ceaf Add polygamma where n >= 2 (#42499)
Summary:
https://github.com/pytorch/pytorch/issues/40980

I had a few questions while implementing the polygamma function, so I made this PR before completing it.

1. Some code blocks were brought in from the cephes library (and I did the same):
```
/*
 * The following function comes with the following copyright notice.
 * It has been released under the BSD license.
 *
 * Cephes Math Library Release 2.8:  June, 2000
 * Copyright 1984, 1987, 1992, 2000 by Stephen L. Moshier
 */
```
Is it okay for me to use cephes code with this same copyright notice (already present in the PyTorch codebase)?

2. There is no linting for the internal ATen library (as far as I know; I read https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md).
How do I make sure my code follows the appropriate guidelines of this library?

3. Actually, there are already digamma and trigamma functions.
digamma is needed; however, the trigamma function becomes redundant if a polygamma function is added.
Is it okay for trigamma to stay, or should it be removed?

By the way, the CPU version now works fine with 3rd-order polygamma (which is what we need to play with variational inference on beta/gamma distributions), and I'm going to finish the GPU version soon.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42499

Reviewed By: gchanan

Differential Revision: D23110016

Pulled By: albanD

fbshipit-source-id: 246f4c2b755a99d9e18a15fcd1a24e3df5e0b53e
2020-08-14 17:00:24 -07:00
4011685a8b [fx] split Node into Node/Proxy (#42991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42991

Having Node be both a record of the operator in the graph and the
way we _build_ the graph made it difficult to keep the IR data structure
separate from the proxying logic in the builder.

Among other issues this means that typos when using nodes would add
things to the graph:
```
    for node in graph.nodes:
        node.grph # does not error, returns a node.Attribute object!
```

This separates the builder into a Proxy object. Graph/Node no longer
need to understand `delegate` objects since they are now just pure IR.
This separates the `symbolic_trace` (proxy.py/symbolic_trace.py) from
the IR (node.py, graph.py).

This also allows us to add `create_arg` to the delegate object,
allowing the customization of how aggregate arguments are handled
when converting to a graph.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23099786

Pulled By: zdevito

fbshipit-source-id: 6f207a8c237e5eb2f326b63b0d702c3ebcb254e4
2020-08-14 16:45:21 -07:00
a1a6e1bc91 Fix warning: dynamic initialization in unreachable code. (#43065)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43065

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D23136883

Pulled By: ZolotukhinM

fbshipit-source-id: 878f6af13ff8df63fef5f34228f7667ee452dd95
2020-08-14 16:08:32 -07:00
66b3382c5b [quant] Add torchbind support for embedding_bag packed weights (#42881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42881

This enables serialization/de-serialization of embedding packed params using getstate/setstate calls.
Added version number to deal with changes to serialization formats in future.

This can be extended in the future to support 4-bit/2-bit once we add support for that.

Test Plan:
python test/test_quantization.py TestQuantizedEmbeddingBag

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23070634

fbshipit-source-id: 2ca322ab998184c728be6836f9fd12cec98b2660
2020-08-14 16:05:27 -07:00
7632a9b090 [quant] Add embeddingbag_prepack function that works on quantized tensor. (#42762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42762

Use a prepack function that accepts qtensor as an input. The output is a byte tensor with packed data.
This is currently implemented only for 8-bit. In the future once we add 4-bit support this function will be extended to support that too.

Note -In the following change I will add TorchBind support for this to support serialization of packed weights.

Test Plan:
python test/test_quantization.py TestQuantizedEmbeddingBag

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23070632

fbshipit-source-id: 502aa1302dffec1298cdf52832c9e2e5b69e44a8
2020-08-14 16:02:57 -07:00
450315198a Fix a casting warning (#42451)
Summary:
Fix an annoying casting warning

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42451

Reviewed By: yf225

Differential Revision: D22993194

Pulled By: ailzhang

fbshipit-source-id: f317a212d4e768d49d24f50aeff9c003be2fd30a
2020-08-14 15:47:02 -07:00
3d8c144400 Implemented torch::nn::Unflatten in libtorch (#42613)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42613

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D23030302

Pulled By: heitorschueroff

fbshipit-source-id: 954f1cdfcbd3a62a7f0e887fcf5995ef27222a87
2020-08-14 15:32:13 -07:00
33c5fe3c1d Enable test_logit FakeLowP test. (#43073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43073

Enable test_logit FakeLowP test.

Test Plan: test_op_nnpi_fp16.py

Reviewed By: hyuen

Differential Revision: D23141375

fbshipit-source-id: cb7e7879487e33908b14ef401e1ab05fda193d28
2020-08-14 14:49:29 -07:00
5014cf4a4d Export MergeIdLists Caffe2 Operator to PyTorch
Summary: As titled.

Test Plan: buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_merge_id_lists

Reviewed By: yf225

Differential Revision: D23076951

fbshipit-source-id: c37dfd93003590eed70b0d46e0151397a402dde6
2020-08-14 14:46:17 -07:00
c8e789e06e add fake fp16 fusions to net transforms (#42927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42927

added fp16 fusion to net transforms
refactored the transforms as well as glow_transform to get out of opt/custom so that the OSS builds passed

Test Plan: added net runner tests for this

Reviewed By: yinghai

Differential Revision: D23080881

fbshipit-source-id: ee6451811fedfd07c6560c178229854bca29301f
2020-08-14 13:30:27 -07:00
1c6ace87d1 Embed torch.nn typing annotations (#43044)
Summary:
Delete several .pyi files and embed annotations from those files in respective .py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43044

Reviewed By: ezyang

Differential Revision: D23123234

Pulled By: malfet

fbshipit-source-id: 4ba361cc84402352090523924b0035e100ba48b1
2020-08-14 13:24:58 -07:00
fcc10d75e1 [JIT] Add property support to TorchScript classes (#42389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42389

**Summary**
This commit adds support for properties to TorchScript classes,
specifically for getters and setters. They are implemented essentially
as pointers to the methods that the corresponding decorators decorate,
which are treated like regular class methods. Deleters for properties
are considered to be out of scope (and probably useless for TorchScript
anyway).
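
A minimal sketch of a TorchScript class using a property, per the commit description (class and field names are illustrative):

```python
import torch

@torch.jit.script
class Counter(object):
    def __init__(self):
        self._count = 0

    @property
    def count(self) -> int:       # getter
        return self._count

    @count.setter
    def count(self, value: int):  # setter
        self._count = value
```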

**Test Plan**
This commit adds a unit test for a class with a property that has both
getter and setter and one that has only a getter.

`python test/test_jit.py TestClassType.test_properties`

Test Plan: Imported from OSS

Reviewed By: eellison, ppwwyyxx

Differential Revision: D22880232

Pulled By: SplitInfinity

fbshipit-source-id: 4828640f4234cb3b0d4f3da4872a75fbf519e5b0
2020-08-14 12:56:57 -07:00
64a7684219 Enable typechecking of collect_env.py during CI (#43062)
Summary:
No type annotations can be added to the script, as it still has to be Python-2 compliant.
Make changes to avoid variable type redefinition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43062

Reviewed By: zou3519

Differential Revision: D23132991

Pulled By: malfet

fbshipit-source-id: 360c02e564398f555273e5889a99f834a5467059
2020-08-14 12:46:42 -07:00
1f6d0985d7 fix searchsorted output type (#42933)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41389
Make sure that when searchsorted returns an integer type, the result does not require gradients.
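
A sketch of the fixed behavior:

```python
import torch

boundaries = torch.tensor([1.0, 3.0, 5.0], requires_grad=True)
idx = torch.searchsorted(boundaries, torch.tensor([2.0, 4.0]))
# Integer outputs must not claim to require gradients:
assert idx.dtype == torch.int64 and not idx.requires_grad
```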

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42933

Reviewed By: gchanan

Differential Revision: D23109583

Pulled By: albanD

fbshipit-source-id: 5af300b2f7f3c140d39fd7f7d87799f7b93a79c1
2020-08-14 12:34:51 -07:00
059aa34b12 Clip Binomial results for different endpoints in curand_uniform (#42702)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42153

As [documented](https://docs.nvidia.com/cuda/curand/device-api-overview.html) (search for `curand_uniform` on the page), `curand_uniform` returns "from 0.0 to 1.0, where 1.0 is included and 0.0 is excluded." These endpoints are different than the CPU equivalent, and makes the calculation in the PR fail when the value is 1.0.

The test from the issue is added, it failed for me consistently before the PR even though I cut the number of samples by 10.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42702

Reviewed By: gchanan

Differential Revision: D23107451

Pulled By: ngimel

fbshipit-source-id: 3575d5b8cd5668e74b5edbecd95154b51aa485a1
2020-08-14 12:01:17 -07:00
71bbd5f1d4 Add back Tensor.nonzero type annotation (#43053)
Summary:
Closes gh-42998

The issue is marked for 1.6.1; if there's anything I need to do for a backport, please tell me what that is.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43053

Reviewed By: izdeby

Differential Revision: D23131708

Pulled By: malfet

fbshipit-source-id: 2744bacce6bdf6ae463c17411b672f09707e0887
2020-08-14 11:41:19 -07:00
75dfa5a459 Remove itruediv because it's already defined in torch/tensor.py (#42962)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42955

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42962

Reviewed By: mruberry

Differential Revision: D23111523

Pulled By: malfet

fbshipit-source-id: ecab7a4aae1fe556753b8d6528cae1ae201beff3
2020-08-14 11:36:23 -07:00
1c616c5ab7 Add complex tensor dtypes for the __cuda_array_interface__ spec (#42918)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42860

The `__cuda_array_interface__` tensor specification is missing the appropriate datatypes for the newly merged complex64 and complex128 tensors. This PR addresses this issue by casting:

* `torch.complex64` to 'c8'
* `torch.complex128` to 'c16'
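
A quick check of the exported dtype string; note that the typestr also carries a byte-order prefix (e.g. '<' on little-endian):

```python
import torch

t = torch.zeros(4, dtype=torch.complex64, device='cuda')
print(t.__cuda_array_interface__['typestr'])  # e.g. '<c8'
```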

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42918

Reviewed By: izdeby

Differential Revision: D23130219

Pulled By: anjali411

fbshipit-source-id: 5f8ee8446a71cad2f28811afdeae3a263a31ad11
2020-08-14 10:26:23 -07:00
c3fb152274 Test the type promotion between every two dtypes thoroughly (#42585)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41842

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42585

Reviewed By: izdeby

Differential Revision: D23126759

Pulled By: mruberry

fbshipit-source-id: 8337e02f23a4136c2ba28c368f8bdbd28400de44
2020-08-14 10:05:10 -07:00
ff6a2b0b7a Add inplace option for torch.nn.Hardsigmoid and torch.nn.Hardswish layers (#42346)
Summary:
The **`torch.nn.Hardsigmoid`** and **`torch.nn.Hardswish`** classes currently do not support `inplace` operations, because they use the `torch.nn.functional.hardsigmoid` and `torch.nn.functional.hardswish` functions with their default inplace argument, which is `False`.

So, I added an `inplace` argument to the `torch.nn.Hardsigmoid` and `torch.nn.Hardswish` classes so that the forward operation can also be done in place when using these layers.
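
A usage sketch of the new argument:

```python
import torch

m = torch.nn.Hardswish(inplace=True)
x = torch.randn(4)
y = m(x)
assert y.data_ptr() == x.data_ptr()  # forward ran in place
```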

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42346

Reviewed By: izdeby

Differential Revision: D23108487

Pulled By: albanD

fbshipit-source-id: 0767334fa10e5ecc06fada2d6469f3ee1cacd957
2020-08-14 10:01:31 -07:00
2f9fd8ad29 Build test_e2e_tensorpipe only if Gloo is enabled (#43041)
Summary:
test_e2e_tensorpipe depends on ProcessGroupGloo, and therefore cannot be tested with Gloo disabled.
Otherwise, it re-introduces https://github.com/pytorch/pytorch/issues/42776

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43041

Reviewed By: lw

Differential Revision: D23122101

Pulled By: malfet

fbshipit-source-id: a8a088b6522a3bc888238ede5c2d589b83c6ea94
2020-08-14 09:24:47 -07:00
31788ae151 Trim trailing whitespace
Test Plan: CI

Reviewed By: linbinyu

Differential Revision: D23108919

fbshipit-source-id: 913c982351a94080944f350641d7966c6c2cc508
2020-08-14 09:18:40 -07:00
a2b86d95d1 Make Mish support large inputs. (#43037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43037

In the previous version of mish_op.cc, the output would be 'nan' for large inputs. We rewrote mish_op.cc to solve this problem.

Test Plan:
Unit test
buck test //dper3/dper3/modules/tests:core_modules_test -- test_linear_compress_embedding_with_attention_with_activation_mish
{F284052906}

buck test mode/opt //dper3/dper3_models/ads_ranking/tests:model_paradigm_e2e_tests -- test_sparse_nn_with_mish
{F284224158}

## Workflow
f212113434

{F285281318}

Differential Revision: D23102644

fbshipit-source-id: 98f1ea82f8c8e05b655047b4520c600fc1a826f4
2020-08-14 08:53:16 -07:00
c7d2774d20 Fix typo in collect_env.py (#43050)
Summary:
Minor fix for a typo introduced in yesterday's PR: https://github.com/pytorch/pytorch/pull/42961

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43050

Reviewed By: ezyang, malfet

Differential Revision: D23130936

Pulled By: zou3519

fbshipit-source-id: e8fa2bf155ab6a5988c74e8345278d8d70855894
2020-08-14 08:33:35 -07:00
d60d6d0d7b Automated submodule update: FBGEMM (#42834)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 29d5eb9f3c

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42834

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D23040145

fbshipit-source-id: 1d7209ea1910419b7837703122b8a4c76380ca4a
2020-08-14 05:43:20 -07:00
ed242cbec5 Guard TensorPipe agent by USE_TENSORPIPE (#42682)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42682

ghstack-source-id: 109834351

Test Plan: CI

Reviewed By: malfet

Differential Revision: D22978717

fbshipit-source-id: 18b7cbdb532e78ff9259e82f0f92ad279124419d
2020-08-14 02:57:36 -07:00
ccd9f3244b Get, save, and load module information for each operator (#42133)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42133

Test Plan:
We save a module with module debugging information as follows.
```
import torch
m = torch.jit.load('./detect.pt')
# Save module without debug info
m._save_for_lite_interpreter('./detect.bc')
# Save module with debug info
m._save_for_lite_interpreter('./detect.bc', _save_debug_info_in_bytecode=True)
```
Size of the file without module debugging information: 4.508 MB
Size of the file with module debugging information: 4.512 MB

Reviewed By: kimishpatel

Differential Revision: D22803740

Pulled By: taivu1998

fbshipit-source-id: c82ea62498fde36a1cfc5b073e2cea510d3b7edb
2020-08-14 01:25:27 -07:00
e182ec97b3 Fix illegal memory access issue for CUDA version of SplitByLengths operator.
Summary:
1. Fix illegal memory access issue for the SplitByLengths operator in the CUDA context.
2. Add support for scaling lengths vectors in the SplitByLengths operator.
3. Add support for testing the SplitByLengths operator in the CUDA context.

Example for SplitByLengths operator processing scaling lengths vector:
value vector A = [1, 2, 3, 4, 5, 6]
length vector B = [1, 2]
after execution of SplitByLengths operator,
the output should be [1,2] and [3,4,5,6]

Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:concat_split_op_test

Reviewed By: kennyhorror

Differential Revision: D23079841

fbshipit-source-id: 3700e7f2ee0a5a2791850071fdc16e5b054f8400
2020-08-14 01:04:08 -07:00
b8102b1550 Implement torch.nextafter (#42580)
Summary:
Related to https://github.com/pytorch/pytorch/issues/38349.
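
A quick usage sketch:

```python
import torch

one, two = torch.tensor([1.0]), torch.tensor([2.0])
# The smallest representable float strictly greater than 1.0:
print(torch.nextafter(one, two))
```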

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42580

Reviewed By: smessmer

Differential Revision: D23012260

Pulled By: mruberry

fbshipit-source-id: ce82a63c4ad407ec6ffea795f575ca7c58cd6137
2020-08-14 00:35:30 -07:00
e4373083a2 torch.complex and torch.polar (#39617)
Summary:
For https://github.com/pytorch/pytorch/issues/35312 and https://github.com/pytorch/pytorch/issues/38458#issuecomment-636066256.
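
A quick usage sketch of the two constructors:

```python
import math
import torch

z = torch.complex(torch.tensor([1.0]), torch.tensor([3.0]))  # 1+3j
w = torch.polar(torch.tensor([2.0]), torch.tensor([math.pi]))
# w == abs * (cos(angle) + i*sin(angle)), i.e. roughly -2+0j
```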

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39617

Reviewed By: zhangguanheng66

Differential Revision: D23083926

Pulled By: anjali411

fbshipit-source-id: 1874378001efe2ff286096eaf1e92afe91c55b29
2020-08-14 00:30:11 -07:00
b9a105bcc0 [TensorExpr] Cleanup logic in the TensorExpr fuser pass. (#42938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42938

1. Structure the logic in a more straightforward way: instead of magic
   tricks with node iterators in a block, we now have a function that
   tries to create a fusion group starting from a given node (and pulls
   everything it can into it).
2. The order in which we're pulling nodes into a fusion group is now
   more apparent.
3. The new pass structure automatically allows us to support fusion
   groups of size=1.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23084409

Pulled By: ZolotukhinM

fbshipit-source-id: d59fc00c06af39a8e1345a4aed8d829494db084c
2020-08-13 23:49:42 -07:00
fc304bec9f [TensorExpr] Remove redundant checks from canHandle in TE fuser. (#42937)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42937

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23084408

Pulled By: ZolotukhinM

fbshipit-source-id: 8e562e25ecc73b4e7b01e30f8b282945b96b4871
2020-08-13 23:49:40 -07:00
48c183af3d [TensorExpr] Wrap fuser in a class. (#42936)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42936

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23084407

Pulled By: ZolotukhinM

fbshipit-source-id: f622874efbcbf8d4e49c8fa519a066161ebe4877
2020-08-13 23:48:16 -07:00
02c8ad70f2 Reconstruct scopes (#41615)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41615

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D22611331

Pulled By: taivu1998

fbshipit-source-id: d4ed4cf6360bc1f72ac9fa24bb4fcf6b7d9e7576
2020-08-13 22:38:16 -07:00
3dc845319f Add more verbose error message about PackedSequence lengths argument (#42891)
Summary:
Add the given tensor's dimensionality, device, and dtype to the error message

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42891

Reviewed By: ezyang

Differential Revision: D23068769

Pulled By: malfet

fbshipit-source-id: e49d0a5d0c10918795c1770b4f4e02494d799c51
2020-08-13 22:33:34 -07:00
b992a927a9 Clearer Semantics and Naming for Customized Quantization Range Initialization in Observer (#42602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42602

In this diff, clearer semantics and naming are introduced by splitting the original `init_dynamic_qrange` into 2 separate `Optional[int]` types, `qmin` and `qmax`, to avoid confusing these parameters with dynamic quantization.

The `qmin` and `qmax` parameters allow customers to specify their own custom quantization range and enable specific use cases for lower-bit quantization.

Test Plan:
To assert the correctness and compatibility of the changes with existing observers, on a devvm, execute the following command to run the unit tests:

`buck test //caffe2/test:quantization -- observer`

Reviewed By: vkuzo, raghuramank100

Differential Revision: D22948334

fbshipit-source-id: 275bc8c9b5db4ba76fc2e79ed938376ea4f5a37c
2020-08-13 21:15:23 -07:00
a55b7e2a6d [reland][quant][fix] Remove activation_post_process in qat modules (#42343) (#43015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43015

Currently, activation_post_process modules are inserted by default in qat modules, which is not
friendly to automatic quantization tools; this PR removes them.

Test Plan:
Imported from OSS

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23105059

fbshipit-source-id: 3439ac39e718ffb0390468163bcbffd384802b57
2020-08-13 20:44:14 -07:00
8cf01c5c35 Back out "change pt_defs.bzl to python file"
Summary: Original commit changeset: d720fe2e684d

Test Plan: CIs

Reviewed By: linbinyu

Differential Revision: D23114839

fbshipit-source-id: fda570b5e989a51936a6c5bc68f0e60c6f6b4b82
2020-08-13 20:33:12 -07:00
830423b80b Python/C++ API Parity: TransformerDecoderLayer (#42717)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37756

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42717

Reviewed By: zhangguanheng66

Differential Revision: D23095841

Pulled By: glaringlee

fbshipit-source-id: 327a5a23c9a3cca05e422666a6d7d802a7e8c468
2020-08-13 20:31:13 -07:00
85752b989d [quant][doc] Print more info for fake quantize module (#43031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43031

fixes: https://github.com/pytorch/pytorch/issues/43023

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23116200

fbshipit-source-id: faa90ce8711da0785d635aacd0362c45717cfacc
2020-08-13 20:27:36 -07:00
523b2ce9c6 [jit][static runtime] Simplify the graph and add operator whitelist (#43024)
Summary:
This PR whitelists and simplifies graphs to help with development later on.  Key to note in this PR is the use of both a pattern substitution and the registration of custom operators.  This will likely be one of the main optimization types done in this folder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43024

Reviewed By: hlu1

Differential Revision: D23114262

Pulled By: bwasti

fbshipit-source-id: e25aa3564dcc8a2b48cfd1561b3ee2a4780ae462
2020-08-13 20:19:55 -07:00
89b0b3bc8c Allow RPC to be initialized again after shutdown. (#42723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42723

This PR is addressing https://github.com/pytorch/pytorch/issues/39340
and allows users to initialize RPC again after shutdown. Major changes in the
PR include:

1. Change to DistAutogradContainer to support this.
2. Ensure PythonRpcHandler is reinitialized appropriately.
3. Use PrefixStore in RPC initialization to ensure each new `init_rpc` uses a
different prefix.
ghstack-source-id: 109805368

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D22993909

fbshipit-source-id: 9f1c1e0a58b58b97125f41090601e967f96f70c6
2020-08-13 20:18:34 -07:00
21823aa680 Nightly checkout tool (#42635)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40829

This is cross-platform but I have only tried it on linux, personally. Also, I am not fully certain of the usage pattern, so if there are any additional features / adjustments / tests that you want me to add, please just let me know!

CC ezyang rgommers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42635

Reviewed By: zhangguanheng66

Differential Revision: D23078663

Pulled By: ezyang

fbshipit-source-id: 5c8c8abebd1d462409c22dc4301afcd8080922bb
2020-08-13 20:07:18 -07:00
a6b69fdd33 Add DDP+RPC tutorial to RPC docs page. (#42828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42828

ghstack-source-id: 109855425

Test Plan: waitforbuildbot

Reviewed By: jlin27

Differential Revision: D23037016

fbshipit-source-id: 250f322b652b86257839943309b8f0b8ce1bb25b
2020-08-13 19:41:06 -07:00
3544f60f76 make deadline=None for all numerics tests (#43014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43014

changing this behavior mimics the behavior of the old hypothesis
testing library

Test Plan: ran all tests on devserver

Reviewed By: hl475

Differential Revision: D23085949

fbshipit-source-id: 433fdfbb04b6a609b738eb7c319365049a49579b
2020-08-13 16:48:31 -07:00
8b5642a786 Fix to Learnable Fake Quantization Op Benchmarking (#43018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43018

In this diff, a fix is added: the original non-learnable fake quantize was provided with trainable scale and zero point, whereas requires_grad for both parameters should be completely disabled.

Test Plan:
Use the following command to execute the benchmark test:

`buck test mode/dev-nosan pt:quantization_test`

Reviewed By: vkuzo

Differential Revision: D23107846

fbshipit-source-id: d2213983295f69121e9e6ae37c84d1f37d78ef39
2020-08-13 16:32:13 -07:00
6753157c5a Enable torch.utils typechecks (#42960)
Summary:
Fix typos in torch.utils/_benchmark/README.md.
Add an empty __init__.py to the examples folder to make the example invocations from README.md correct.
Fixed the uniform distribution generation logic when minval and maxval are None.

Fixes https://github.com/pytorch/pytorch/issues/42984

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42960

Reviewed By: seemethere

Differential Revision: D23095399

Pulled By: malfet

fbshipit-source-id: 0546ce7299b157d9a1f8634340024b10c4b7e7de
2020-08-13 15:24:56 -07:00
eb47940c0a Add executor and fuser options to the fastrnn test fixture (#42946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42946

There are 3 options for the executor and fuser and some of them aren't
super interesting so I've combined the options into a single parameter, but
made it fairly easy to expand the set if there are other configs we might care
about.

Test Plan:
Benchmark it

Imported from OSS

Reviewed By: zheng-xq

Differential Revision: D23090177

fbshipit-source-id: bd93a93c3fc64e5a4a847d1ce7f42ce0600a586e
2020-08-13 12:45:37 -07:00
fd5ed4b6d6 Update ort-nightly version to dev202008122 (#43019)
Summary:
Fixes caffe2_onnx_ort1_py3_6_clang7_ubuntu16_04 test failures

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43019

Reviewed By: gchanan

Differential Revision: D23108767

Pulled By: malfet

fbshipit-source-id: 0131cf4ac0bf93d3d93cb0c97a888f1524e87472
2020-08-13 11:40:16 -07:00
816d37b1d8 [quant] Make PerChannel Observer work with float qparams (#42690)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42690

Add implementation for new qscheme per_channel_affine_float_qparams in observer

Test Plan:
python test/test_quantization.py TestObserver.test_per_channel_observers

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23070633

fbshipit-source-id: 84d348b0ad91e9214770131a72f7adfd3970349c
2020-08-13 11:22:19 -07:00
6f8446840e [quant] Create PerRowQuantizer for floating point scale and zero_point (#42612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42612

Add a new Quantizer that supports an input zero point (bias) that can be float.
The quantization equation in this case is

Xq = (Xf - bias) * inv_scale, where bias is float zero_point value
We start with a per-row implementation and can extend to per-tensor in the future, if necessary.
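
A numeric sketch of the equation above (values illustrative):

```python
import torch

xf = torch.tensor([[0.5, 1.5, 2.5]])
bias = torch.tensor([[0.5]])       # float zero point, per row
inv_scale = torch.tensor([[2.0]])
xq = (xf - bias) * inv_scale       # tensor([[0., 2., 4.]])
```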

Test Plan:
python test/test_quantization.py TestQuantizedTensor

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D22960142

fbshipit-source-id: ca9ab6c5b45115d3dcb1c4358897093594313706
2020-08-13 11:20:53 -07:00
0ff51accd8 collect_env.py: Print CPU architecture after Linux OS name (#42961)
Summary:
Missed this case in https://github.com/pytorch/pytorch/pull/42887

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42961

Reviewed By: zou3519

Differential Revision: D23095264

Pulled By: malfet

fbshipit-source-id: ff1fb0eba9ecd29bfa3d8f5e4c3dcbcb11deefcb
2020-08-13 10:49:15 -07:00
ebc7ebc74e Do not ignore torch/__init__.pyi (#42958)
Summary:
Delete the abovementioned entry from .gitignore, as the file is gone since https://github.com/pytorch/pytorch/issues/42908 and should no longer be autogenerated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42958

Reviewed By: seemethere

Differential Revision: D23094391

Pulled By: malfet

fbshipit-source-id: af303477301ae89d6f283e34d7aeddeda7a9260f
2020-08-13 10:29:58 -07:00
6fb5ce5569 [NNC] Fix some bugs in Round+Mod simplification (#42934)
Summary:
When working on the Cuda Codegen, I found that running the IRSimplifier before generating code led to test failures. This was due to a bug in Round+Mod simplification (e.g. (x / y * y) + (x % y) => x) related to the order in which the terms appeared. After fixing it and writing a few tests around those cases, I found another bug in simplification of the same pattern and fixed that too (with some more test coverage).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42934

Reviewed By: zhangguanheng66

Differential Revision: D23085548

Pulled By: nickgg

fbshipit-source-id: e780967dcaa7a5fda9f6d7d19a6b7e7b4e94374b
2020-08-13 09:47:21 -07:00
f03f9ad621 update clone doc (#42931)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42931

Reviewed By: zhangguanheng66

Differential Revision: D23083000

Pulled By: albanD

fbshipit-source-id: d76d90476ca294763f204c185a62ff6484381c67
2020-08-13 08:45:46 -07:00
ba9025bc1a [tensorexpr] Autograd for testing (#42548)
Summary:
A simple differentiable abstraction to allow testing of full training graphs.

Included in this 1st PR is an example of trivial differentiation.

If approved, I can add a full MLP and demonstrate convergence using purely NNC (for performance testing) in the next PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42548

Reviewed By: ZolotukhinM

Differential Revision: D23057920

Pulled By: bwasti

fbshipit-source-id: 4a239852c5479bf6bd20094c6c35f066a81a832e
2020-08-13 07:58:06 -07:00
607e49cc83 Revert D22856816: [quant][fix] Remove activation_post_process in qat modules
Test Plan: revert-hammer

Differential Revision:
D22856816 (8cb42fce17)

Original commit changeset: 988a43bce46a

fbshipit-source-id: eff5b9abdfc15b21c02c61eefbda38d349173436
2020-08-13 07:22:20 -07:00
8493b0d5d6 Enroll TensorPipe agent in C++-only E2E test (#42680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42680

ghstack-source-id: 109544678

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D22978714

fbshipit-source-id: 04d6d190c240c6ead9bd9f3b7f3a5f964d7451e8
2020-08-13 07:07:30 -07:00
c88d3a5e76 Remove Python dependency from TensorPipe RPC agent (#42678)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42678

ghstack-source-id: 109544679

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D22978716

fbshipit-source-id: 31f91d35e9538375b047184cf4a735e4b8809a15
2020-08-13 07:06:10 -07:00
d39cb84f1f [Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D23102075

fbshipit-source-id: afb89e061bb9c290df7cf4c58157fc8d67fe78ad
2020-08-13 05:14:21 -07:00
c9dcc833bc [quant][pyper] Make offsets an optional parameter in the qembedding_bag op (#42924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42924

offsets is currently an optional parameter in the python module, so we update the operator to follow suit
in order to avoid bad optional access.

Test Plan:
python test/test_quantization.py TestQuantizeDynamicJitOps.test_embedding_bag

Imported from OSS

Reviewed By: radkris-git

Differential Revision: D23081152

fbshipit-source-id: 847b58f826f5a18e8d4978fc4afc6f3a96dc4230
2020-08-12 20:25:44 -07:00
8cb42fce17 [quant][fix] Remove activation_post_process in qat modules (#42343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42343

Currently, activation_post_process modules are inserted by default in qat modules, which is not
friendly to automatic quantization tools; this PR removes them.

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D22856816

fbshipit-source-id: 988a43bce46a992b38fd0d469929f89e5b046131
2020-08-12 20:14:23 -07:00
7a7424bf91 Remove impl_unboxedOnlyKernel (#42841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42841

There is nothing using those APIs anymore. While we still have ops that require an unboxedOnly implementation (i.e. that aren't c10-full yet), those are all already migrated to the new op registration API and use `.impl_UNBOXED()`.
ghstack-source-id: 109693705

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D23045335

fbshipit-source-id: d8e15cea1888262135e0d1d94c515d8a01bddc45
2020-08-12 17:35:09 -07:00
20e0e54dbe Allow Tensor& in the unboxing logic (#42712)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42712

Previously, operators taking Tensor& as arguments or returning it couldn't be c10-full because the unboxing logic didn't support it.
This adds temporary support for that. We're planning to remove this again later, but for now we need it to make those ops c10-full.
See https://docs.google.com/document/d/19thMVO10yMZA_dQRoB7H9nTPw_ldLjUADGjpvDmH0TQ for the full plan.

This PR also makes some ops c10-full that now can be.
ghstack-source-id: 109693706

Test Plan: unit tests

Reviewed By: bhosmer

Differential Revision: D22989242

fbshipit-source-id: 1bd97e5fa2b90b0860784da4eb772660ca2db5a3
2020-08-12 17:33:23 -07:00
5d2e9b6ed9 Add missing type annotation for Tensor.ndim (#42909)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42908

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42909

Reviewed By: zhangguanheng66

Differential Revision: D23090364

Pulled By: malfet

fbshipit-source-id: 44457fddc86f6abde635aa671e7611b405780ab9
2020-08-12 17:14:20 -07:00
b8ae563ce6 Add a microbenchmark for LSTM elementwise portion (#42901)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42901

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23079714

Pulled By: bertmaher

fbshipit-source-id: 28f8c3b5019ee898e82e64a0a674da1b4736d252
2020-08-12 17:11:47 -07:00
33d209b5f4 Fix TE microbenchmark harness to use appropriate fuser/executor (#42900)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42900

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23079715

Pulled By: bertmaher

fbshipit-source-id: 6aa2b08a550835b7737e355960a16a7ca83878ea
2020-08-12 17:11:44 -07:00
1adeed2720 Speed up CUDA kernel launch when block/thread extents are statically known (#42899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42899

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23078708

Pulled By: bertmaher

fbshipit-source-id: 237404b47a31672d7145d70996868a3b9b97924e
2020-08-12 17:10:30 -07:00
f373cda021 Revert D22994446: [pytorch][PR] CUDA reduction: allow outputs to have different strides
Test Plan: revert-hammer

Differential Revision:
D22994446 (7f3f5020e6)

Original commit changeset: cc60beebad2e

fbshipit-source-id: f4635deac386db0c161f910760cace09f15a1ff9
2020-08-12 17:05:04 -07:00
86841f5f61 Update cuda init docstring to improve clarity (#42923)
Summary:
A small clarity improvement to the cuda init docstring

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42923

Reviewed By: zhangguanheng66

Differential Revision: D23080693

Pulled By: mrshenli

fbshipit-source-id: aad5ed9276af3b872c1def76c6175ee30104ccb2
2020-08-12 15:41:28 -07:00
0134deda0f [FX] Add interface to reject nodes (#42865)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42865

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D23056584

Pulled By: jamesr66a

fbshipit-source-id: 02db08165ab41be5f3c4b5ff253cbb444eb9a7b8
2020-08-12 14:30:06 -07:00
92885ebe16 Implement hypot (#42291)
Summary:
Related to https://github.com/pytorch/pytorch/issues/38349
Closes https://github.com/pytorch/pytorch/issues/22764
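
A quick usage sketch:

```python
import torch

torch.hypot(torch.tensor([3.0]), torch.tensor([4.0]))  # tensor([5.])
```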

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42291

Reviewed By: malfet

Differential Revision: D22951859

Pulled By: mruberry

fbshipit-source-id: d0118f2b6437e5c3f775f699ec46e946a8da50f0
2020-08-12 13:18:26 -07:00
62bd2ddec7 Implemented non-named version of unflatten (#42563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42563

Moved the logic for the non-named unflatten from the python nn module to aten/native, to be reused by the nn module later. Fixed some inconsistencies between the doc and the code logic.
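
A sketch of the non-named call (shapes illustrative):

```python
import torch

x = torch.randn(2, 12)
assert x.unflatten(1, (3, 4)).shape == (2, 3, 4)
```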

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23030301

Pulled By: heitorschueroff

fbshipit-source-id: 7c804ed0baa5fca960a990211b8994b3efa7c415
2020-08-12 13:14:28 -07:00
7f3f5020e6 CUDA reduction: allow outputs to have different strides (#42649)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42364

Benchmark:
https://github.com/zasdfgbnm/things/blob/master/2020Q3/min-benchmark.ipynb
```python
import torch

print(torch.__version__)
print()

for i in range(100):
    torch.randn(1000, device='cuda')

for e in range(7, 15):
    N = 2 ** e
    input_ = torch.randn(N, N, device='cuda')
    torch.cuda.synchronize()
    %timeit input_.min(dim=0); torch.cuda.synchronize()
    input_ = torch.randn(N, N, device='cuda').t()
    torch.cuda.synchronize()
    %timeit input_.min(dim=0); torch.cuda.synchronize()
    print()
```
Before
```
1.7.0a0+5d7c3f9

21.7 µs ± 1.67 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
20.6 µs ± 773 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

22.5 µs ± 294 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
20.2 µs ± 250 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

26.4 µs ± 67 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
20.9 µs ± 316 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

33 µs ± 474 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
21.1 µs ± 218 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

84.2 µs ± 691 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
50.3 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

181 µs ± 2.36 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
145 µs ± 149 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

542 µs ± 753 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
528 µs ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

2.04 ms ± 9.74 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.01 ms ± 22.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
After
```
1.7.0a0+9911817

21.4 µs ± 695 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
20.6 µs ± 989 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

22.4 µs ± 153 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
20.5 µs ± 58.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

26.6 µs ± 147 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
20.9 µs ± 675 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

35.4 µs ± 560 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
21.7 µs ± 1.17 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

86.5 µs ± 1.99 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
52.2 µs ± 1.57 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

195 µs ± 2.97 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
153 µs ± 4.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

550 µs ± 7.72 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
527 µs ± 3.04 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

2.05 ms ± 7.87 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2 ms ± 4.93 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42649

Reviewed By: ezyang

Differential Revision: D22994446

Pulled By: ngimel

fbshipit-source-id: cc60beebad2e04c26ebf3ca702a6cb05846522c9
2020-08-12 13:09:36 -07:00
ada8404f2d [jit] Scaffold a static runtime (#42753)
Summary:
The premise of this approach is that a small subset of neural networks is well represented by a data flow graph.  The README contains more information.

The name is subject to change, but I thought it was a cute reference to fire.

suo let me know if you'd prefer this in a different spot.  Since it lowers a JIT'd module directly I assumed the JIT folder would be appropriate.  There is no exposed Python interface yet (but is mocked up in `test_accelerant.py`)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42753

Reviewed By: zou3519

Differential Revision: D23043771

Pulled By: bwasti

fbshipit-source-id: 5353731e3aae31c08b5b49820815da98113eb551
2020-08-12 13:05:27 -07:00
59f8692350 [pytorch] BUCK build for Vulkan backend
Summary:
Introducing `//xplat/caffe2:aten_vulkan` target which contains pytorch Vulkan backend and its ops.

 `//xplat/caffe2:aten_vulkan` depends on ` //xplat/caffe2:aten_cpu`

Merely including it in the link step registers the Vulkan backend and its ops.

**Code generation:**
1. `VulkanType.h`, `VulkanType.cpp`
Tensor Types for Vulkan backend are generated by `//xplat/caffe2:gen_aten_vulkan` which runs aten code generation (`aten/src/ATen/gen.py`) with `--vulkan` argument.

2. Shaders compilation
`//xplat/caffe2:gen_aten_vulkan_spv`  genrule runs `//xplat/caffe2:gen_aten_vulkan_spv_bin` which is a wrapper on `aten/src/ATen/native/vulkan/gen_spv.py`

GLSL files are listed in `aten/src/ATen/native/vulkan/glsl/*`, and compiling them requires `glslc` (the GLSL compiler).

`glslc` is open source (https://github.com/google/shaderc), but it has a few dependencies on other libraries, so porting its build to BUCK would take a significant amount of time.

To use `glslc` in BUCK, this introduces the dotslash `xplat/caffe2/fb/vulkan/dotslash/glslc`, which points to the latest prebuilt `glslc` binaries from the ANDROID_NDK for Linux, macOS, and Windows, stored on Manifold.

Not using it from the ANDROID_NDK directly allows updating it without depending on the NDK.

Test Plan:
Building aten_vulkan target:
```
buck build //xplat/caffe2:aten_vulkan
```

Building vulkan_test that contains vulkan unittests for android:
```
buck build //xplat/caffe2:pt_vulkan_test_binAndroid#android-armv7
```
And running it on the device with vulkan support.

Reviewed By: iseeyuan

Differential Revision: D22770299

fbshipit-source-id: 843af8df226d4b5395b8e480eb47b233d57201df
2020-08-12 10:34:41 -07:00
ea65a56854 Use `string(APPEND FOO " bar")` instead of `set(FOO "${FOO} bar")` (#42844)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42844

Reviewed By: scintiller

Differential Revision: D23067577

Pulled By: malfet

fbshipit-source-id: e4380ce02fd6aca37c955a7bc24435222c5d8b19
2020-08-12 10:33:11 -07:00
3d3752d716 Revert D22898051: [pytorch][PR] Fix freeze_module pass for sharedtype
Test Plan: revert-hammer

Differential Revision:
D22898051 (4665f3fc8d)

Original commit changeset: 8b1d80f0eb40

fbshipit-source-id: 4dc0ba274282a157509db16df13269eed6cd5be9
2020-08-12 10:28:03 -07:00
bda0007620 Improve calling backward() and grad() inside vmap error messages (#42876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42876

Previously, the error messages were pretty bad. This PR adds nice
error messages for the following cases:
- user attempts to call .backward() inside vmap for any reason
whatsoever
- user attempts to call autograd.grad(outputs, inputs, grad_outputs),
where outputs or inputs is being vmapped over (so they are
BatchedTensors).

The case we do support is calling autograd.grad(outputs, inputs,
grad_outputs) where `grad_outputs` is being vmapped over. This is the
case for batched gradient support (e.g., user passes in a batched
grad_output).
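
To make the supported case concrete, here is a minimal sketch that computes a Jacobian by looping over grad_outputs rows; the vmap form shown in the comment uses the era's prototype API and is an assumption:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x.exp()

def vjp(v):
    # autograd.grad with a batched/vmapped grad_outputs `v` is the supported case
    return torch.autograd.grad(y, x, v, retain_graph=True)[0]

basis = torch.eye(3)
jacobian = torch.stack([vjp(v) for v in basis])  # explicit loop version
# Prototype vmap form of the same computation (assumed API):
# jacobian = torch.vmap(vjp)(basis)
print(jacobian)
```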

Test Plan: - new tests: `pytest test/test_vmap.py -v`

Reviewed By: ezyang

Differential Revision: D23059836

Pulled By: zou3519

fbshipit-source-id: 2fd4e3fd93f558e67e2f0941b18f0d00d8ab439f
2020-08-12 10:05:31 -07:00
5c39146c34 Fix get_writable_path (#42895)
Summary:
As the name suggests, this function should always return a writable path.
Calls `mkdtemp` to create a temp folder if the path is not writable.

This fixes `TestNN.test_conv_backcompat` if PyTorch is installed in non-writable location

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42895

Reviewed By: dzhulgakov

Differential Revision: D23070320

Pulled By: malfet

fbshipit-source-id: ed6a681d46346696a0de7e71f0b21cba852a964e
2020-08-12 09:38:24 -07:00
5157afcf59 fix int8 FC (#42691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42691

Fix quantization of FC bias to match NNPI:
quantize biases to fp16.

Test Plan: improved the unit test to have input tensors in fp32

Reviewed By: tracelogfb

Differential Revision: D22941521

fbshipit-source-id: 00afb70610f8a149110344d52595c39e3fc988ab
2020-08-12 09:30:34 -07:00
686705c98b Optimize LayerNorm performance on CPU both forward and backward (#35750)
Summary:
This PR aims at improving `LayerNorm` performance on CPU for both forward and backward.

Results on Xeon 6248:
1. single socket inference **1.14x** improvement
2. single core inference **1.77x** improvement
3. single socket training **6.27x** improvement

The fine tuning of GPT2 on WikiTest2 dataset time per iteration on dual socket reduced from **4.69s/it** to **3.16s/it**, **1.48x** improvement.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/35750

Reviewed By: zhangguanheng66

Differential Revision: D20810026

Pulled By: glaringlee

fbshipit-source-id: c5801bd76eb944f2e46c2fe4991d9ad4f40495c3
2020-08-12 09:17:20 -07:00
75a15d3d01 Follow-up for pytorch/pytorch#37091. (#42806)
Summary:
This is a follow-up PR for https://github.com/pytorch/pytorch/issues/37091, fixing some of the quirks of that PR as that one was landed early to avoid merge conflicts.

This PR addresses the following action items:

- [x] Use error-handling macros instead of a `try`-`catch`.
- [x] Renamed and added comments to clarify the use of `HANDLED_FUNCTIONS_WRAPPERS` in tests. `HANDLED_FUNCTIONS_NAMESPACES` was already removed in the last PR as we had a way to test for methods.

This PR does NOT address the following action item, as it proved to be difficult:

- [ ] Define `__module__`  for whole API.

Single-line repro-er for why this is hard:

```python
>>> torch.Tensor.grad.__get__.__module__ = "torch.Tensor.grad"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'method-wrapper' object has no attribute '__module__'
```

Explanation: Methods  defined in C/properties don't always have a `__dict__` attribute or a mutable `__module__` slot for us to modify.

The documentation action items were addressed in the following commit, with the additional future task of adding the rendered RFCs to the documentation: 552ba37c05

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42806

Reviewed By: smessmer

Differential Revision: D23031501

Pulled By: ezyang

fbshipit-source-id: b781c97f7840b8838ede50a0017b4327f96bc98a
2020-08-12 09:11:33 -07:00
2878efb35d Use C10_API_ENUM to fix invalid attribute warnings (#42464)
Summary:
Using the macro added in https://github.com/pytorch/pytorch/issues/38988 to fix more attribute warnings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42464

Reviewed By: malfet

Differential Revision: D22916943

Pulled By: ezyang

fbshipit-source-id: ab9ca8755cd8b89aaf7f8718b4107b4b94d95005
2020-08-12 09:02:49 -07:00
2f1baf6c25 Fix coding style and safety issues in CuBLAS nondeterministic unit test (#42627)
Summary:
Addresses some comments that were left unaddressed after PR https://github.com/pytorch/pytorch/issues/41377 was merged:

* Use `check_output` instead of `Popen` to run each subprocess sequentially
* Use f-strings rather than old python format string style
* Provide environment variables to subprocess through the `env` kwarg
* Check for correct error behavior inside the subprocess, and raise another error if incorrect. Then the main process fails the test if any error is raised

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42627

Reviewed By: malfet

Differential Revision: D22969231

Pulled By: ezyang

fbshipit-source-id: 38d5f3f0d641c1590a93541a5e14d90c2e20acec
2020-08-12 08:54:28 -07:00
77bd4d3426 MAINT: speed up istft by using col2im (the original python code used … (#42826)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42213

The [original python code](https://github.com/pytorch/audio/blob/v0.5.0/torchaudio/functional.py#L178) from `torchaudio` was converted to a native function, but used `eye` to  allocate a Tensor and was much slower.
Using `at::col2im` (which is the equivalent of `torch.nn.functional.fold`) solved the slowdown.
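
To illustrate the idea, a minimal overlap-add sketch using `torch.nn.functional.fold` (the Python-side equivalent of `at::col2im`); names and sizes are illustrative:

```python
import torch
import torch.nn.functional as F

win_length, hop_length, n_frames = 8, 4, 5
frames = torch.randn(n_frames, win_length)          # windowed frames
out_len = (n_frames - 1) * hop_length + win_length  # reconstructed length

# fold expects (N, C * prod(kernel_size), L); treat each frame as one column
cols = frames.t().unsqueeze(0)                      # (1, win_length, n_frames)
signal = F.fold(cols, output_size=(1, out_len),
                kernel_size=(1, win_length), stride=(1, hop_length))
print(signal.reshape(-1).shape)                     # torch.Size([24])
```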

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42826

Reviewed By: smessmer

Differential Revision: D23043673

Pulled By: mthrok

fbshipit-source-id: 3f5d0779a87379b002340ea19c9ae5042a43e94e
2020-08-12 08:39:12 -07:00
4665f3fc8d Fix freeze_module pass for sharedtype (#42457)
Summary:
During the cleanup phase, calling recordReferencedAttrs records
the attributes which are referenced and hence kept.
However, if two instances of the same type are preserved
through the freezing process, as the added test case shows, then while
recording the referenced attributes we iterate through the
type INSTANCES that we have seen so far and record those.
Thus, if we have another instance of the same type, we would just look at
the first instance in the list and record that instance's attributes.
This PR fixes that by traversing the getattr chains and getting the
actual instance of the getattr output.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42457

Test Plan:
python test/test_jit.py TestFreezing
Fixes #{issue number}

Reviewed By: zou3519

Differential Revision: D22898051

Pulled By: kimishpatel

fbshipit-source-id: 8b1d80f0eb40ab99244f931d4a1fdb28290a4683
2020-08-12 08:35:05 -07:00
ecb9e790ed Remove excessive logging in plan_executor (#42888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42888

as title

Test Plan: flow-cli test-locally dper.workflows.evaluation.eval_workflow --parameters-file /mnt/public/ehsanardestani/temp/quant_eval_inputs_all.json

Reviewed By: amylittleyang

Differential Revision: D23066529

fbshipit-source-id: f925afd1734e617e412b0f171e16c781d13272d9
2020-08-11 23:57:17 -07:00
a346e90c49 Update to NNP-I v1.0.0.5 (#4770)
Summary:
Align code to NNP-I v1.0.0.5 (glow tracing changes).

Pull Request resolved: https://github.com/pytorch/glow/pull/4770

Reviewed By: arunm-git

Differential Revision: D22927904

Pulled By: hl475

fbshipit-source-id: 3746a6b07f3fcffc662d80a95513427cfccac7a5
2020-08-11 23:53:23 -07:00
ab0a04dc9c Add torch.nansum (#38628)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/38349
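
A quick usage sketch (NaN entries are treated as zero):

```python
import torch

x = torch.tensor([1.0, float('nan'), 2.0])
print(torch.sum(x))     # tensor(nan)
print(torch.nansum(x))  # tensor(3.)
```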

Pull Request resolved: https://github.com/pytorch/pytorch/pull/38628

Reviewed By: VitalyFedyunin

Differential Revision: D22860549

Pulled By: mruberry

fbshipit-source-id: 87fcbfd096d83fc14b3b5622f2301073729ce710
2020-08-11 22:26:04 -07:00
38c7b9a168 avoid redundant isCustomClassRegistered() checks (#42852)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42852

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D23048381

Pulled By: bhosmer

fbshipit-source-id: 40b71670a84cb6f7e5a03279f58ce227d676aa03
2020-08-11 21:53:19 -07:00
bee174dc3f Adds linalg.det alias, fixes outer alias, updates alias testing (#42802)
Summary:
This PR:

- updates test_op_normalization.py, which verifies that aliases are correctly translated in the JIT
- adds torch.linalg.det as an alias for torch.det
- moves the torch.linalg.outer alias to torch.outer (to be consistent with NumPy)

The torch.linalg.outer alias was erroneously put in the linalg namespace as a placeholder, since it's a "linear algebra op" according to NumPy but actually still lives in the main NumPy namespace.

The updates to test_op_normalization are necessary. Previously it was using method_tests to generate tests, and method_tests assumes test suites using it also use the device generic framework, which test_op_normalization did not. For example, some ops require decorators like `skipCPUIfNoLapack`, which only works in device generic test classes. Moving test_op_normalization to the device generic framework also lets these tests run on CPU and CUDA.

Continued reliance on method_tests() is excessive since the test suite is only interested in testing aliasing, and a simpler and more readable `AliasInfo` class is used for the required information. An example impedance mismatch between method_tests and the new tests, for example, was how to handle ops in namespaces like torch.linalg.det. In the future this information will likely be folded into a common 'OpInfo' registry in the test suite.

The actual tests performed are similar to what they were previously: a scripted and traced version of the op is run and the test verifies that both graphs do not contain the alias name and do contain the aliased name.

The guidance for adding an alias has been updated accordingly.

cc mattip

Note:

ngimel suggests:
- deprecating and then removing the `torch.ger` name
- reviewing the implementation of `torch.outer`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42802

Reviewed By: zou3519

Differential Revision: D23059883

Pulled By: mruberry

fbshipit-source-id: 11321c2a7fb283a6e7c0d8899849ad7476be42d1
2020-08-11 21:48:31 -07:00
cd756ee3d4 Support boolean key in dictionary (#42833)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41449 .
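
A minimal TorchScript sketch of what this enables (illustrative):

```python
import torch

@torch.jit.script
def parity_name(n: int) -> str:
    names = {True: "even", False: "odd"}  # bool keys now supported
    return names[n % 2 == 0]

print(parity_name(4))  # even
```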

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42833

Test Plan: `python test/test_jit.py TestDict`

Reviewed By: zou3519

Differential Revision: D23056250

Pulled By: asuhan

fbshipit-source-id: 90dabe1490c99d3e57a742140a4a2b805f325c12
2020-08-11 21:37:37 -07:00
ac93d45906 [quant] Attach qconfig to all modules (#42576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42576

Previously we had a qconfig propagation list and only attached qconfig to modules
in that list. This works when everything is quantized in the form of a module,
but now that we are expanding quantization to functional/torch ops, we'll need to attach qconfig
to all modules.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22939453

fbshipit-source-id: 7d6a1f73ff9bfe461b3afc75aa266fcc8f7db517
2020-08-11 20:34:34 -07:00
e845b0ab51 [Resending] [ONNX] Add eliminate_unused_items pass (#42743)
Summary:
This PR:

- Adds eliminate_unused_items pass that removes unused inputs and initializers.
- Fixes run_embed_params function so it doesn't export unnecessary parameters.
- Removes test_modifying_params in test_verify since it's no longer needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42743

Reviewed By: hl475

Differential Revision: D23058954

Pulled By: houseroad

fbshipit-source-id: cd1e81463285a0bf4e60766c8c87fc9a350d9c7e
2020-08-11 20:30:50 -07:00
a846ed5ce7 [quant] Reduce number of variants of add/mul (#42769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42769

Some of the quantized add and mul variants can share the same name.

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D23054822

fbshipit-source-id: c1300f3f0f046eaf0cf767d03b957835e22cfb4b
2020-08-11 20:01:06 -07:00
5edd9aa95a Fix manual seed to unpack unsigned long (#42206)
Summary:
`torch.manual_seed` was unpacking its argument as an `int64_t`. This fix changes it to a `uint64_t`.
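
A quick check sketch (illustrative): a seed that fits in a uint64_t but not an int64_t should now be accepted.

```python
import torch

torch.manual_seed(2**64 - 1)  # max uint64 value; previously overflowed the int64_t unpack
```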

Fixes https://github.com/pytorch/pytorch/issues/33546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42206

Reviewed By: ezyang

Differential Revision: D22822098

Pulled By: albanD

fbshipit-source-id: 97c978139c5cb2d5b62cc2c963550c758ee994f7
2020-08-11 18:05:34 -07:00
b0b8340065 Collect more data in collect_env (#42887)
Summary:
Collect Python runtime bitness (32 vs 64 bit)
Collect Mac/Linux OS machine type (x86_64, arm, Power, etc.)
Collect Clang version

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42887

Reviewed By: seemethere

Differential Revision: D23064788

Pulled By: malfet

fbshipit-source-id: df361bdbb79364dc521b8e1ecbed1b4bd08f9742
2020-08-11 18:01:14 -07:00
7a9ae52550 [hypothesis] Deadline followup (#42842)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42842

Test Plan: `buck test`

Reviewed By: thatch

Differential Revision: D23045269

fbshipit-source-id: 8a3f4981869287a0f5fb3f0009e13548b7478086
2020-08-11 15:33:23 -07:00
eeb43ffab9 format for readability (#42851)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42851

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D23048382

Pulled By: bhosmer

fbshipit-source-id: 55d84d5f9c69be089056bf3e3734c1b1581dc127
2020-08-11 14:46:42 -07:00
3bf2978497 remove deadline enforcement for hypothesis (#42871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42871

The old version of hypothesis.testing was not enforcing deadlines. After the
library got updated, the default deadline is 200ms, but even with 1s or
more, tests are flaky. Changing the deadline to non-enforced, which is the same
behavior as the old version.
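
For reference, a minimal sketch of the equivalent per-test setting in hypothesis (`deadline=None` disables enforcement):

```python
from hypothesis import given, settings, strategies as st

@settings(deadline=None)  # do not enforce a per-example deadline
@given(st.integers())
def test_no_deadline(x):
    assert x == x
```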

Test Plan: tested fakelowp/tests

Reviewed By: hl475

Differential Revision: D23059033

fbshipit-source-id: 79b6aec39a2714ca5d62420c15ca9c2c1e7a8883
2020-08-11 14:28:53 -07:00
0ff0fea42b [FX] fix lint (#42866)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42866

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D23056813

Pulled By: jamesr66a

fbshipit-source-id: d30cdffe6f0465223354dec00f15658eb0b08363
2020-08-11 14:01:26 -07:00
43613b4236 Fix incorrect aten::sorted.str return type (#42853)
Summary:
aten::sorted.str output type was incorrectly set to bool[] due to a copy-paste error. This PR fixes it.

Fixes https://fburl.com/0rv8amz7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42853

Reviewed By: yf225

Differential Revision: D23054907

Pulled By: gmagogsfm

fbshipit-source-id: a62968c90f0301d4a5546e6262cb9315401a9729
2020-08-11 14:01:23 -07:00
71dbfc79b3 Export BatchBucketOneHot Caffe2 Operator to PyTorch
Summary: As titled.

Test Plan:
```
buck test caffe2/caffe2/python/operator_test:torch_integration_test -- test_batch_bucket_one_hot_op
```

Reviewed By: yf225

Differential Revision: D23005981

fbshipit-source-id: 1daa8d3e7d6ad75e97e94964db95ccfb58541672
2020-08-11 14:00:19 -07:00
4afbf39737 Add nn.functional.adaptive_avg_pool size empty tests (#42857)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42857

Reviewed By: seemethere

Differential Revision: D23053677

Pulled By: malfet

fbshipit-source-id: b3d0d517cddc96796461332150e74ae94aac8090
2020-08-11 12:59:58 -07:00
9c8f5cb61d Ensure IDEEP transpose operator works correctly
Summary: I found out that, without exporting to public format, the IDEEP transpose operator in the middle of a convolution net produces incorrect results (probably reading some out-of-bounds memory). Exporting to public format might not be the most efficient solution, but at least it ensures correct behavior.

Test Plan: Running ConvFusion followed by transpose should give identical results on CPU and IDEEP

Reviewed By: bwasti

Differential Revision: D22970872

fbshipit-source-id: 1ddca16233e3d7d35a367c93e72d70632d28e1ef
2020-08-11 12:58:31 -07:00
c660d2a9ae Initial quantile operator implementation (#42755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42755

Attempting to land quantile again after being landed here https://github.com/pytorch/pytorch/pull/39417 and reverted here https://github.com/pytorch/pytorch/pull/41616.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23030338

Pulled By: heitorschueroff

fbshipit-source-id: 124a86eea3aee1fdaa0aad718b04863935be26c7
2020-08-11 12:08:17 -07:00
6471b5dc66 Correct the type of some floating point literals in calc_digamma (#42846)
Summary:
They are double, but they are supposed to be accscalar_t or a faster type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42846

Reviewed By: zou3519

Differential Revision: D23049405

Pulled By: mruberry

fbshipit-source-id: 29bb5d5419dc7556b02768f0ff96dfc28676f257
2020-08-11 11:39:06 -07:00
4bafca1a69 Adds list of operator-related information for testing (#41662)
Summary:
This PR adds:

- an "OpInfo" class in common_method_invocations that can contain useful information about an operator, like what dtypes it supports
- a more specialized "UnaryUfuncInfo" class designed to help test the unary ufuncs
- the `ops` decorator, which can generate test variants from lists of OpInfos
- test_unary_ufuncs.py, a new test suite stub that shows how the `ops` decorator and operator information can be used to improve the thoroughness of our testing

The single test in test_unary_ufuncs.py simply ensures that the dtypes associated with a unary ufunc operator in its OpInfo entry are correct. Writing a test like this previously, however, would have required manually constructing test-specific operator information and writing a custom test generator. The `ops` decorator and a common place to put operator information make writing tests like this easier and allows what would have been test-specific information to be reused.

The `ops` decorator extends and composes with the existing device generic test framework, allowing its decorators to be reused. For example, the `onlyOnCPUAndCUDA` decorator works with the new `ops` decorator. This should keep the tests readable and consistent.

Future PRs will likely:

- continue refactoring the too large test_torch.py into more verticals (unary ufuncs, binary ufuncs, reductions...)
- add more operator information to common_method_invocations.py
- refactor tests for unary ufuncs into test_unary_ufunc

Examples of possible future extensions are [here](616747e50d), where an example unary ufunc test is added, and [here](d0b624f110), where example autograd tests are added. Both tests leverage the operator info in common_method_invocations to simplify testing.
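
To make the pattern concrete, here is a simplified, self-contained sketch of the idea (class and helper names here are hypothetical, not the actual test-suite code):

```python
import torch

class OpInfo:
    """Minimal stand-in: records an operator and the dtypes it claims to support."""
    def __init__(self, name, op, dtypes):
        self.name = name
        self.op = op
        self.dtypes = dtypes

unary_ufuncs = [
    OpInfo('neg', torch.neg, {torch.float32, torch.float64, torch.int64}),
]

def check_claimed_dtypes(info, device='cpu'):
    # The kind of test the `ops` decorator would generate per (op, device, dtype)
    for dtype in info.dtypes:
        x = torch.ones(4, device=device, dtype=dtype)
        info.op(x)  # should not raise for a dtype the OpInfo claims to support

for info in unary_ufuncs:
    check_claimed_dtypes(info)
```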

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41662

Reviewed By: ngimel

Differential Revision: D23048416

Pulled By: mruberry

fbshipit-source-id: ecce279ac8767f742150d45854404921a6855f2c
2020-08-11 11:34:53 -07:00
aabdef51f9 [NNC] Registerizer for GPU [1/x] (#42606)
Summary:
Adds a new optimization pass, the Registerizer, which looks for common Stores and Loads to a single item in a buffer and replaces them with a local temporary scalar which is cheaper to write.

For example it can replace:
```
A[0] = 0;
for (int x = 0; x < 10; x++) {
  A[0] = (A[0]) + x;
}
```

with:
```
int A_ = 0;
for (int x = 0; x < 10; x++) {
  A_ = x + A_;
}
A[0] = A_;
```

This is particularly useful on GPUs when parallelizing, since after replacing loops with metavars we have a lot of accesses like this. Early tests of simple reductions on a V100 indicate this can speed them up by ~5x.

This diff got a bit unwieldy with the integration code so that will come in a follow up.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42606

Reviewed By: bertmaher

Differential Revision: D22970969

Pulled By: nickgg

fbshipit-source-id: 831fd213f486968624b9a4899a331ea9aeb40180
2020-08-11 11:17:50 -07:00
57b056b5f2 align qlinear benchmark to linear benchmark (#42767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42767

Same as previous PR, forcing the qlinear benchmark to follow the fp one

Test Plan:
```
cd benchmarks/operator_benchmark
python -m pt.linear_test
python -m pt.qlinear_test
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23013937

fbshipit-source-id: fffaa7cfbfb63cea41883fd4d70cd3f08120aaf8
2020-08-11 10:35:16 -07:00
a7bdf575cb align qconv benchmark to conv benchmark (#42761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42761

Makes the qconv benchmark follow the conv benchmark exactly. This way
it will be easy to compare q vs fp with the same settings.

Test Plan:
```
cd benchmarks/operator_benchmark
python -m pt.qconv_test
python -m pt.conv_test
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23012533

fbshipit-source-id: af30ee585389395569a6322f5210828432963077
2020-08-11 10:33:19 -07:00
2c8cbd78bd Fix orgqr input size conditions (#42825)
Summary:
* Adds support for `n > k`
* Throw error if `m >= n >= k` is not true
* Updates existing error messages to match argument names shown in public docs
* Adds error tests

Fixes https://github.com/pytorch/pytorch/issues/41776

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42825

Reviewed By: smessmer

Differential Revision: D23038916

Pulled By: albanD

fbshipit-source-id: e9bec7b11557505e10e0568599d0a6cb7e12ab46
2020-08-11 10:17:39 -07:00
575e7497f6 Introduce experimental FX library (#42741)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42741

Test Plan: Imported from OSS

Reviewed By: dzhulgakov

Differential Revision: D23006383

Pulled By: jamesr66a

fbshipit-source-id: 6cb6d921981fcae47a07df581ffcf900fb8a7fe8
2020-08-11 10:01:47 -07:00
7524699d58 Modify clang code coverage to CMakeList.txt (for MacOS) (#42837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42837

Originally we use
```
list(APPEND CMAKE_C_FLAGS  -fprofile-instr-generate -fcoverage-mapping)
list(APPEND CMAKE_CXX_FLAGS  -fprofile-instr-generate -fcoverage-mapping)
```
But when compiling the project on Mac with coverage on, it fails with the error:
`clang: error: no input files
/bin/sh: -fprofile-instr-generate: command not found
/bin/sh: -fcoverage-mapping: command not found`

The reason is that `list(APPEND CMAKE_CXX_FLAGS ...)` adds an additional `;` to the variable: if we do `list(APPEND foo a)` and then `list(APPEND foo b)`, then `foo` becomes `a;b` -- with the additional `;`. Since `CMAKE_CXX_FLAGS` is already defined earlier in the `CMakeLists.txt`, we can only use `set(...)` here.
After changing it to
```
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fprofile-instr-generate -fcoverage-mapping")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fprofile-instr-generate -fcoverage-mapping")
```
Tested successfully on a local Mac machine.

Test Plan: Test locally on mac machine

Reviewed By: malfet

Differential Revision: D23043057

fbshipit-source-id: ff6f4891b35b7f005861ee2f8e4c550c997fe961
2020-08-11 09:57:55 -07:00
42114a0154 Update the documentation for scatter to include streams parameter. (#42814)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41827

![Screenshot from 2020-08-10 13-41-20](https://user-images.githubusercontent.com/46765601/89813181-41041380-db0f-11ea-88c2-a97d7b994ac5.png)

Current:
https://pytorch.org/docs/stable/cuda.html#communication-collectives

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42814

Reviewed By: smessmer

Differential Revision: D23033544

Pulled By: mrshenli

fbshipit-source-id: 88747fbb06e88ef9630c042ea9af07dafd422296
2020-08-11 09:28:14 -07:00
1041bdebb0 Fix a typo in EmbeddingBag.cu (#42742)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42742

Reviewed By: smessmer

Differential Revision: D23011029

Pulled By: mrshenli

fbshipit-source-id: 615f8b876ef1881660af71b6e145fb4ca97d2ebb
2020-08-11 09:24:38 -07:00
916235284c [JIT] Fix typing.Final for python 3.8 (#39568)
Summary:
fixes https://github.com/pytorch/pytorch/issues/39566

`typing.Final` exists since python 3.8, and on python 3.8 `typing_extensions.Final` is an alias of `typing.Final`; therefore `ann.__module__ == 'typing_extensions'` becomes False when using 3.8 with `typing_extensions` installed.

~~I don't know why the test is skipped, seems like due to historical reason when python 2.7 was still a thing?~~ Edit: I know now, the `Final` for `<3.7` doesn't have `__origin__`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39568

Reviewed By: smessmer

Differential Revision: D23043388

Pulled By: malfet

fbshipit-source-id: cc87a9e4e38090d784e9cea630e1c543897a1697
2020-08-11 08:51:46 -07:00
d28639a080 Optimization with Backward Implementation of Learnable Fake Quantize Per Channel Kernel (CPU and GPU) (#42810)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42810

In this diff, the original backward pass implementation is sped up by merging the 3 iterations computing dX, dScale, and dZeroPoint separately. In this case, a native loop is directly used on a byte-wise level (referenced by `strides`). In addition, vectorization is used such that scale and zero point are expanded to share the same shape and the element-wise corresponding values to X along the channel axis.

In the benchmark test on the operators, for an input of shape `3x3x256x256`, we have observed the following improvement in performance:
**Speedup from python operator**: ~10x
**Speedup from original learnable kernel**: ~5.4x
**Speedup from non-backprop kernel**: ~1.8x

Test Plan:
To assert correctness of the new kernel, on a devvm, enter the command

`buck test //caffe2/test:quantization -- learnable_backward_per_channel`

To benchmark the operators, on a devvm, enter the command
1. Set the kernel size to 3x3x256x256 or a reasonable input size.
2. Run `buck test //caffe2/benchmarks/operator_benchmark/pt:quantization_test`
3. The relevant outputs for CPU are as follows:

```
# Benchmarking PyTorch: FakeQuantizePerChannelOpBenchmark
# Mode: Eager
# Name: FakeQuantizePerChannelOpBenchmark_N3_C3_H256_W256_cpu_op_typepy_module
# Input: N: 3, C: 3, H: 256, W: 256, device: cpu, op_type: py_module
Backward Execution Time (us) : 989024.686

# Benchmarking PyTorch: FakeQuantizePerChannelOpBenchmark
# Mode: Eager
# Name: FakeQuantizePerChannelOpBenchmark_N3_C3_H256_W256_cpu_op_typelearnable_kernel
# Input: N: 3, C: 3, H: 256, W: 256, device: cpu, op_type: learnable_kernel
Backward Execution Time (us) : 95654.079

# Benchmarking PyTorch: FakeQuantizePerChannelOpBenchmark
# Mode: Eager
# Name: FakeQuantizePerChannelOpBenchmark_N3_C3_H256_W256_cpu_op_typeoriginal_kernel
# Input: N: 3, C: 3, H: 256, W: 256, device: cpu, op_type: original_kernel
Backward Execution Time (us) : 176948.970
```
4. The relevant outputs for GPU are as follows:

**Pre-optimization**:

```
# Benchmarking PyTorch: FakeQuantizePerChannelOpBenchmark
# Mode: Eager
# Name: FakeQuantizePerChannelOpBenchmark_N3_C3_H256_W256_cuda_op_typepy_module
# Input: N: 3, C: 3, H: 256, W: 256, device: cpu, op_type: py_module
Backward Execution Time (us) : 6795.173

# Benchmarking PyTorch: FakeQuantizePerChannelOpBenchmark
# Mode: Eager
# Name: FakeQuantizePerChannelOpBenchmark_N3_C3_H256_W256_cuda_op_typelearnable_kernel
# Input: N: 3, C: 3, H: 256, W: 256, device: cpu, op_type: learnable_kernel
Backward Execution Time (us) : 4321.351

# Benchmarking PyTorch: FakeQuantizePerChannelOpBenchmark
# Mode: Eager
# Name: FakeQuantizePerChannelOpBenchmark_N3_C3_H256_W256_cuda_op_typeoriginal_kernel
# Input: N: 3, C: 3, H: 256, W: 256, device: cpu, op_type: original_kernel
Backward Execution Time (us) : 1052.066
```

**Post-optimization**:
```
# Benchmarking PyTorch: FakeQuantizePerChannelOpBenchmark
# Mode: Eager
# Name: FakeQuantizePerChannelOpBenchmark_N3_C3_H256_W256_cuda_op_typepy_module
# Input: N: 3, C: 3, H: 256, W: 256, device: cpu, op_type: py_module
Backward Execution Time (us) : 6737.106

# Benchmarking PyTorch: FakeQuantizePerChannelOpBenchmark
# Mode: Eager
# Name: FakeQuantizePerChannelOpBenchmark_N3_C3_H256_W256_cuda_op_typelearnable_kernel
# Input: N: 3, C: 3, H: 256, W: 256, device: cpu, op_type: learnable_kernel
Backward Execution Time (us) : 2112.484

# Benchmarking PyTorch: FakeQuantizePerChannelOpBenchmark
# Mode: Eager
# Name: FakeQuantizePerChannelOpBenchmark_N3_C3_H256_W256_cuda_op_typeoriginal_kernel
# Input: N: 3, C: 3, H: 256, W: 256, device: cpu, op_type: original_kernel
Backward Execution Time (us) : 1078.79

Reviewed By: vkuzo

Differential Revision: D22946853

fbshipit-source-id: 1a01284641480282b3f57907cc7908d68c68decd
2020-08-11 08:41:53 -07:00
42b4a7132e Raise error if at::native::embedding is given 0-D weight (#42550)
Summary:
Previously, `at::native::embedding` implicitly assumed that the `weight` argument would be 1-D or greater. Given a 0-D tensor, it would segfault. This change makes it throw a RuntimeError instead.
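
A repro sketch of the fixed behavior (illustrative):

```python
import torch
import torch.nn.functional as F

weight = torch.tensor(1.0)  # 0-D weight: used to segfault
idx = torch.tensor([0])
try:
    F.embedding(idx, weight)
except RuntimeError as e:
    print("raised as expected:", e)
```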

Fixes https://github.com/pytorch/pytorch/issues/41780

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42550

Reviewed By: smessmer

Differential Revision: D23040744

Pulled By: albanD

fbshipit-source-id: d3d315850a5ee2d2b6fcc0bdb30db2b76ffffb01
2020-08-11 08:26:45 -07:00
d396d135db Added torch::cuda::manual_seed(_all) to mirror torch.cuda.manual_seed(_all) (#42638)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42638

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D23030317

Pulled By: heitorschueroff

fbshipit-source-id: b0d7bdf0bc592a913ae5b1ffc14c3a5067478ce3
2020-08-11 08:22:20 -07:00
e8f4b04d9a vmap: temporarily disable support for random functions (#42617)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42617

While we figure out the random plan, I want to initially disable
support for random operations. This is because there is an ambiguity in
what randomness means. For example,

```
tensor = torch.zeros(B0, 1)
vmap(lambda t: t.normal_())(tensor)
```

in the above example, should tensor[0] and tensor[1] be equal (i.e.,
use the same random seed), or should they be different?

The mechanism for disabling random support is as follows:
- We add a new dispatch key called VmapMode
- Whenever we're inside vmap, we enable VmapMode for all tensors.
This is done via at::VmapMode::increment_nesting and
at::VmapMode::decrement_nesting.
- DispatchKey::VmapMode's fallback kernel is the fallthrough kernel.
- We register kernels that raise errors for all random functions on
DispatchKey::VmapMode. This way, whenever someone calls a random
function on any tensor (not just BatchedTensors) inside of a vmap block,
an error gets thrown.

Test Plan: - pytest test/test_vmap.py -v -k "Operators"

Reviewed By: ezyang

Differential Revision: D22954840

Pulled By: zou3519

fbshipit-source-id: cb8d71062d4087e10cbf408f74b1a9dff81a226d
2020-08-11 07:19:51 -07:00
ffc3da35f4 Don't materialize output grads (#41821)
Summary:
Added a new option in AutogradContext to tell autograd to not materialize output grad tensors, that is, don't expand undefined/None tensors into tensors full of zeros before passing them as input to the backward function.
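
A Python-side sketch of the behavior; the `set_materialize_grads` name matches the eventual public API and should be treated as an assumption for this era:

```python
import torch

class TwoOutputs(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.set_materialize_grads(False)
        return x * 2, x * 3

    @staticmethod
    def backward(ctx, g1, g2):
        # With materialization off, grads of unused outputs arrive as None
        # rather than zero-filled tensors.
        grad = None
        if g1 is not None:
            grad = 2 * g1
        if g2 is not None:
            grad = 3 * g2 if grad is None else grad + 3 * g2
        return grad

x = torch.ones(3, requires_grad=True)
a, b = TwoOutputs.apply(x)
a.sum().backward()  # only `a` is used, so g2 is None in backward
print(x.grad)       # tensor([2., 2., 2.])
```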

This PR is the second part that closes https://github.com/pytorch/pytorch/issues/41359. The first PR is https://github.com/pytorch/pytorch/pull/41490.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41821

Reviewed By: albanD

Differential Revision: D22693163

Pulled By: heitorschueroff

fbshipit-source-id: a8d060405a17ab1280a8506a06a2bbd85cb86461
2020-08-11 04:27:07 -07:00
ddcf3ded3e Revert D23002043: add net transforms for fusion
Test Plan: revert-hammer

Differential Revision:
D23002043 (a4b763bc2c)

Original commit changeset: f0b13d51d68c

fbshipit-source-id: d43602743af35db825e951358992e979283a26f6
2020-08-10 21:22:57 -07:00
59b10f7929 [quant] Sorting the list of dispatches (#42758)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42758

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23011764

Pulled By: z-a-f

fbshipit-source-id: df87acdcf77ae8961a109eaba20521bc4f27ad0e
2020-08-10 21:05:30 -07:00
dedcc30c84 Fix ROCm CI by increasing test timeout (#42827)
Summary:
ROCm is failing to run this test in the allotted time. See, for example, https://app.circleci.com/pipelines/github/pytorch/pytorch/198759/workflows/f6066acf-b289-46c5-aad0-6f4f663ce820/jobs/6618625.

cc jeffdaily

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42827

Reviewed By: pbelevich

Differential Revision: D23042220

Pulled By: mruberry

fbshipit-source-id: 52b426b0733b7b52ac3b311466d5000334864a82
2020-08-10 20:26:20 -07:00
a4b763bc2c add net transforms for fusion (#42763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42763

Add the fp16 fusions as net transforms:
- layernorm fused with mul+add
- swish int8

Test Plan: added unit test, ran flows

Reviewed By: yinghai

Differential Revision: D23002043

fbshipit-source-id: f0b13d51d68c240b05d2a237a7fb8273e996328b
2020-08-10 20:16:14 -07:00
103887892c Fix "non-negative integer" error messages (#42734)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42662

Use "positive integer" error message for consistency with: 17f76f9a78/torch/optim/lr_scheduler.py (L958-L959)
ad7133d3c1/torch/utils/data/sampler.py (L102-L104)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42734

Reviewed By: zdevito

Differential Revision: D23039575

Pulled By: smessmer

fbshipit-source-id: 1be1e0caa868891540ecdbe6f471a6cd51c40ede
2020-08-10 19:39:37 -07:00
c14a7f6808 adaptive_avg_pool[23]d: check output_size.size() (#42831)
Summary:
Return an error if output_size is unexpected

Fixes https://github.com/pytorch/pytorch/issues/42578

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42831

Reviewed By: ezyang

Differential Revision: D23039295

Pulled By: malfet

fbshipit-source-id: d14a5e6dccdf785756635caee2c87151c9634872
2020-08-10 19:27:18 -07:00
c9e825640a [c10d] Template computeLengthsAndOffsets() (#42706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42706

Different backends accept different length types, e.g. MPI_Alltoallv, ncclSend/Recv(), gloo::alltoallv(), so make computeLengthsAndOffsets() a template.

Test Plan:
Sandcastle
CI
HPC: ./trainer_cmd.sh -p 16 -n 8 -d nccl

Reviewed By: osalpekar

Differential Revision: D22961459

fbshipit-source-id: 45ec271f8271b96f2dba76cd9dce3e678bcfb625
2020-08-10 19:21:46 -07:00
a414bd69de Skip test_c10d.ProcessGroupNCCLTest under TSAN (#42750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42750

All of these tests fail under TSAN since we fork in a multithreaded
environment.
ghstack-source-id: 109566396

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D23007746

fbshipit-source-id: 65571607522b790280363882d61bfac8a52007a1
2020-08-10 19:13:52 -07:00
a2559652ab Rename some BatchedTensorImpl APIs (#42700)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42700

I was about to use `isBatched` somewhere not in the files used to
implement vmap but then realized how silly that sounds due to
ambiguity. This PR renames some of the BatchedTensor APIs to make a bit
more sense to onlookers.

- isBatched(Tensor) -> isBatchedTensor(Tensor)
- unsafeGetBatched(Tensor) -> unsafeGetBatchedImpl(Tensor)
- maybeGetBatched(Tensor) -> maybeGetBatchedImpl(Tensor)

Test Plan: - build Pytorch, run tests.

Reviewed By: ezyang

Differential Revision: D22985868

Pulled By: zou3519

fbshipit-source-id: b8ed9925aabffe98085bcf5c81d22cd1da026f46
2020-08-10 17:43:20 -07:00
8f67c7a624 BatchedTensor fallback: extended to support ops with multiple Tensor returns (#42628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42628

This PR extends the BatchedTensor fallback to support operators with
multiple Tensor returns. If an operator has multiple returns, we stack
shards of each return to create the full outputs.

Test Plan:
- `pytest test/test_vmap.py -v`. Added a new test for an operator with
multiple returns (torch.var_mean).

Reviewed By: izdeby

Differential Revision: D22957095

Pulled By: zou3519

fbshipit-source-id: 5c0ec3bf51283cc4493b432bcfed1acf5509e662
2020-08-10 17:42:03 -07:00
64a7939ee5 test_cpp_rpc: Build test_e2e_process_group.cpp only if USE_GLOO is true (#42836)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42776

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42836

Reviewed By: seemethere

Differential Revision: D23041274

Pulled By: malfet

fbshipit-source-id: 8605332701271bea6d9b3a52023f548c11d8916f
2020-08-10 16:54:26 -07:00
8718524571 [vulkan] cat op (concatenate) (#41434)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41434

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D22754941

Pulled By: IvanKobzarev

fbshipit-source-id: cd03577e1c2f639b2592d4b7393da4657422e23c
2020-08-10 16:24:20 -07:00
3cf2551f2f Fix torch.nn.functional.grid_sample crashes if grid has NaNs (#42703)
Summary:
In `clip_coordinates`, replace the `minimum(maximum(in))` composition with `clamp_max(clamp_min(in))`, and swap the order of the `clamp_min` operands so that NaNs in the grid are clamped to 0.
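
The operand-order subtlety in plain Python (the built-in `max`, like many SIMD max implementations, keeps the current operand when a comparison with NaN is false):

```python
nan = float('nan')
print(max(nan, 0.0))  # nan -- NaN survives
print(max(0.0, nan))  # 0.0 -- NaN is clamped away
```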

Fixes https://github.com/pytorch/pytorch/issues/42616

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42703

Reviewed By: ezyang

Differential Revision: D22987447

Pulled By: malfet

fbshipit-source-id: a8a2d6de8043d6b77c8707326c5412d0250efae6
2020-08-10 16:20:09 -07:00
e06b4be5ae change pt_defs.bzl to python file (#42725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42725

This diff changes pt_defs.bzl to pt_defs.py so that it can be included as a python source file.

The reason is that if we remove base ops, pt_defs.bzl becomes too big (8k lines) and we cannot pass its content to gen_oplist (a python library). The easy solution is to change it to a python source file so that it can be used in gen_oplist.

Test Plan: sandcastle

Reviewed By: ljk53, iseeyuan

Differential Revision: D22968258

fbshipit-source-id: d720fe2e684d9a2bf5bd6115b6e6f9b812473f12
2020-08-10 16:12:43 -07:00
752f433a24 DDP communication hook: skip dividing grads by world_size if hook registered. (#42400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42400

mcarilli spotted that in the original DDP communication hook design described in [39272](https://github.com/pytorch/pytorch/issues/39272), the hooks receive grads that are already predivided by world size.

It makes sense to skip the divide completely if hook registered. The hook is meant for the user to completely override DDP communication. For example, if the user would like to implement something like GossipGrad, always dividing by the world_size would not be a good idea.

We also included a warning in the register_comm_hook API as:
> GradBucket bucket's tensors will not be predivided by world_size. User is responsible to divide by the world_size in case of operations like allreduce.
ghstack-source-id: 109548696

**Update:** We discovered and fixed a bug with the sparse tensors case. See new unit test called `test_ddp_comm_hook_sparse_gradients` and changes in `reducer.cpp`.

Test Plan: python test/distributed/test_c10d.py and perf benchmark tests.

Reviewed By: ezyang

Differential Revision: D22883905

fbshipit-source-id: 3277323fe9bd7eb6e638b7ef0535cab1fc72f89e
2020-08-10 13:55:42 -07:00
d7aaa3327b .circleci: Only do comparisons when available (#42816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42816

Comparisons were being done on branches where the '<<
pipeline.git.base_revision >>' didn't exist before so let's just move it
so that comparison / code branch is only run when that variable is
available

Example: https://app.circleci.com/pipelines/github/pytorch/pytorch/198611/workflows/8a316eef-d864-4bb0-863f-1454696b1e8a/jobs/6610393

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23032900

Pulled By: seemethere

fbshipit-source-id: 98a49c78b174d6fde9c6b5bd3d86a6058d0658bd
2020-08-10 12:33:37 -07:00
d83cc92948 [ONNX] Add support for scalar src in torch.scatter ONNX export. (#42765)
Summary:
`torch.scatter` supports two overloads – one where the `src` input tensor is the same size as the `index` input tensor, and a second where `src` is a scalar. Currently, the ONNX exporter only supports the first overload. This PR adds export support for the second overload of `torch.scatter`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42765

Reviewed By: hl475

Differential Revision: D23025189

Pulled By: houseroad

fbshipit-source-id: 5c2a3f3ce3b2d69661a227df8a8e0ed7c1858dbf
2020-08-10 11:45:42 -07:00
e7b5a23607 include missing settings import
Summary: from hypothesis import given, settings

Test Plan: test_op_nnpi_fp16.py

Differential Revision: D23031038

fbshipit-source-id: 751547e6a6e992d8816d4cc2c5a699ba19a97796
2020-08-10 10:45:34 -07:00
77305c1e44 Automated submodule update: FBGEMM (#42781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42781

This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: fbd813e29f

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42771

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D23015890

Pulled By: jspark1105

fbshipit-source-id: f0f62969f8744df96a4e7f5aff2ce95baabb2f76
2020-08-10 10:14:56 -07:00
e5adf45dde Add python unittest target to caffe2/test/TARGETS (#42766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42766

**Summary**
Some python tests are missing in `caffe2/test/TARGETS`; add them to make the test targets more comprehensive.

According to [run_test.py](https://github.com/pytorch/pytorch/blob/master/test/run_test.py#L125), some tests are slower. Slow tests are added as independent targets and the others are put together into one `others` target. The reason is that we want to reduce overhead, especially for code coverage collection. Tests in one target can be run as a bundle, and coverage can then be collected together. The coverage collection procedure is typically time-expensive, so this helps us save time.

Test Plan:
Run all the new test targets locally in dev server and record the time they cost.
**Statistics**

```
# jit target
real    33m7.694s
user    653m1.181s
sys     58m14.160s

--------- Compare to Initial Jit Target runtime: ----------------

real    32m13.057s
user    613m52.843s
sys     54m58.678s

```

```
# others target
real    9m2.920s
user    164m21.927s
sys     12m54.840s
```

```
# serialization target
real    4m21.090s
user    23m33.501s
sys     1m53.308s

```

```
# tensorexpr
real    11m28.187s
user    33m36.420s
sys     1m15.925s
```

```
# type target
real    3m36.197s
user    51m47.912s
sys     4m14.149s
```

Reviewed By: malfet

Differential Revision: D22979219

fbshipit-source-id: 12a30839bb76a64871359bc024e4bff670c5ca8b
2020-08-10 09:48:59 -07:00
bc779667d6 generalize circleci docker build.sh and add centos support (#41255)
Summary:
Add centos Dockerfile and support to circleci docker builds, and allow generic image names to be parsed by build.sh, so both hardcoded images and custom images can be built.

Currently only adds a ROCm centos Dockerfile.

CC ezyang xw285cornell sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41255

Reviewed By: mrshenli

Differential Revision: D23003218

Pulled By: malfet

fbshipit-source-id: 562c53533e7fb9637dc2e81edb06b2242afff477
2020-08-10 09:42:05 -07:00
05f00532f5 Fix TensorPipe submodule (#42789)
Summary:
Not sure what happened, but possibly I landed a PR on PyTorch which updated the TensorPipe submodule to a commit hash of a *PR* of TensorPipe. Now that the latter PR has been merged, though, that same commit has a different hash. The commit referenced by PyTorch has therefore become orphaned. This is causing some issues.

Hence here I am updating the commit, which however does not change a single line of code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42789

Reviewed By: houseroad

Differential Revision: D23023238

Pulled By: lw

fbshipit-source-id: ca2dcf6b7e07ab64fb37e280a3dd7478479f87fd
2020-08-10 02:15:44 -07:00
55ac240589 [ONNX] Fix scalar type cast for comparison ops (#37787)
Summary:
Always promote type casts for comparison operators, regardless if the input is tensor or scalar. Unlike arithmetic operators, where scalars are implicitly cast to the same type as tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37787

Reviewed By: hl475

Differential Revision: D21440585

Pulled By: houseroad

fbshipit-source-id: fb5c78933760f1d1388b921e14d73a2cb982b92f
2020-08-09 23:00:57 -07:00
162972e980 Fix op benchmark (#42757)
Summary:
A benchmark relies on abs_ having a functional variant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42757

Reviewed By: ngimel

Differential Revision: D23011037

Pulled By: mruberry

fbshipit-source-id: c04866015fa259e4c544e5cf0c33ca1e11091d92
2020-08-09 17:31:51 -07:00
87970b70a7 Adds 'clip' alias for clamp (#42770)
Summary:
Per title. Also updates our guidance for adding aliases to clarify interned_string and method_test requirements. The alias is tested by extending test_clamp to also test clip.
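
A one-line usage check of the new alias (illustrative):

```python
import torch

x = torch.tensor([-2.0, 0.5, 3.0])
print(torch.clip(x, -1, 1))   # tensor([-1.0000,  0.5000,  1.0000])
print(torch.clamp(x, -1, 1))  # identical result
```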

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42770

Reviewed By: ngimel

Differential Revision: D23020655

Pulled By: mruberry

fbshipit-source-id: f1d8e751de9ac5f21a4f95d241b193730f07b5dc
2020-08-09 02:46:02 -07:00
b6810c1064 Include/ExcludeDispatchKeySetGuard API (#42658)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42658

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D22971426

Pulled By: bhosmer

fbshipit-source-id: 4d63e0cb31745e7b662685176ae0126ff04cdece
2020-08-08 16:27:05 -07:00
79b8328aaf optimize_for_mobile: bring packed params to root module (#42740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42740

Adds a pass to hoist conv packed params to root module.
The benefit is that if there is nothing else in the conv module,
subsequent passes will delete it, which will reduce module size.

For context, freezing does not handle this because conv packed
params is a custom object.

Test Plan:
```
PYTORCH_JIT_LOG_LEVEL=">hoist_conv_packed_params.cpp" python test/test_mobile_optimizer.py TestOptimizer.test_hoist_conv_packed_params
```

Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D23005961

fbshipit-source-id: 31ab1f5c42a627cb74629566483cdc91f3770a94
2020-08-08 15:53:20 -07:00
d8801f590c fix asan failure for module freezing in conv bn folding (#42739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42739

This is a test case which fails with ASAN on at the module freezing
step.

Test Plan:
```
USE_ASAN=1 USE_CUDA=0 python setup.py develop
LD_PRELOAD=/usr/lib64/libasan.so.4 python test/test_mobile_optimizer.py TestOptimizer.test_optimize_for_mobile_asan

// output tail: https://gist.github.com/vkuzo/7a0018b9e10ffe64dab0ac7381479f23
```

Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D23005962

fbshipit-source-id: b7d4492e989af7c2e22197c16150812bd2dda7cc
2020-08-08 15:51:59 -07:00
5cd0f5e8ec [PyFI] Update hypothesis and switch from tp2 (#41645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41645

Pull Request resolved: https://github.com/facebookresearch/pytext/pull/1405

Test Plan: buck test

Reviewed By: thatch

Differential Revision: D20323893

fbshipit-source-id: 54665d589568c4198e96a27f0ed8e5b41df7b86b
2020-08-08 12:13:04 -07:00
b7a9bc0802 Revert D22217029: Add fake quantize operator that works in backward pass
Test Plan: revert-hammer

Differential Revision:
D22217029 (48e978ba18)

Original commit changeset: 7055a2cdafcf

fbshipit-source-id: f57a27be412c6fbfd5a5b07a26f758ac36be3b67
2020-08-07 23:04:40 -07:00
18ca999e1a integrate int8 swish with net transformer
Summary:
Add a fuse path for deq->swish->quant.
Update the swish fake op interface to take arguments accordingly.

Test Plan:
net_runner passes
unit tests need to be updated

Reviewed By: venkatacrc

Differential Revision: D22962064

fbshipit-source-id: cef79768db3c8af926fca58193d459d671321f80
2020-08-07 23:01:06 -07:00
c889de7e25 update DispatchKey::toString() (#42619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42619

Added missing entries to `DispatchKey::toString()` and reordered to match declaration order in `DispatchKey.h`

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D22963407

Pulled By: bhosmer

fbshipit-source-id: 34a012135599f497c308ba90ea6e8117e85c74ac
2020-08-07 22:39:23 -07:00
5dd230d6a2 [vulkan] inplace add_, relu_ (#41380)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41380

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D22754939

Pulled By: IvanKobzarev

fbshipit-source-id: 19b0bbfc5e1f149f9996b5043b77675421ecb2ed
2020-08-07 21:18:17 -07:00
6755e49cad Set proper return type (#42454)
Summary:
This function was always expecting to return a `size_t` value

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42454

Reviewed By: ezyang

Differential Revision: D22993168

Pulled By: ailzhang

fbshipit-source-id: 044df8ce17983f04681bda8c30cd742920ef7b1e
2020-08-07 19:22:35 -07:00
e95fbaaba3 Adding Peter's Swish Op ULP analysis. (#42573)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42573

* Generate the ULP png files for different ranges.

Test Plan: test_op_ulp_error.py

Reviewed By: hyuen

Differential Revision: D22938572

fbshipit-source-id: 6374bef6d44c38e1141030d44029dee99112cd18
2020-08-07 19:13:01 -07:00
0a804be47d [NCCL] DDP communication hook: getFuture() without cudaStreamAddCallback (#42335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42335

**Main goal:** For DDP communication hook, provide an API called "get_future" to retrieve a future associated with the completion of c10d.ProcessGroupNCCL.work. Enable NCCL support for this API in this diff.

We add an API `c10::intrusive_ptr<c10::ivalue::Future> getFuture()` to `c10d::ProcessGroup::Work`. This API will only be supported by NCCL in the first version; the default implementation will throw UnsupportedOperation.

We no longer consider a design that involves cudaStreamAddCallback which potentially was causing performance regression in [#41596](https://github.com/pytorch/pytorch/pull/41596).

ghstack-source-id: 109461507

Test Plan:
```(pytorch) [sinannasir@devgpu017.ash6 ~/local/pytorch] python test/distributed/test_c10d.py
Couldn't download test skip set, leaving all tests enabled...
..............................s.....................................................s................................
----------------------------------------------------------------------
Ran 117 tests in 298.042s

OK (skipped=2)
```
### Facebook Internal:
2\. HPC PT trainer run to validate no regression. Check the QPS number:
**Master:** QPS after 1000 iters: around ~34100
```
hpc_dist_trainer --fb-data=none --mtml-fusion-level=1 --target-model=ifr_video --max-ind-range=1000000 --embedding-partition=row-wise mast --domain $USER"testvideo_master" --trainers 16 --trainer-version 1c53912
```
```
[0] I0806 142048.682 metrics_publishers.py:50] Finished iter 999, Local  window NE: [0.963963 0.950479 0.953704], lifetime NE: [0.963963 0.950479 0.953704], loss: [0.243456 0.235225 0.248375], QPS: 34199
```
[detailed logs](https://www.internalfb.com/intern/tupperware/details/task/?handle=priv3_global%2Fmast_hpc%2Fhpc.sinannasirtestvideo_mastwarm.trainer.trainer%2F0&ta_tab=logs)

**getFuture/new design:** QPS after 1000 iters: around ~34030
```
hpc_dist_trainer --fb-data=none --mtml-fusion-level=1 --target-model=ifr_video --max-ind-range=1000000 --embedding-partition=row-wise mast --domain $USER"testvideo_getFutureCyclicFix" --trainers 16 --trainer-version 8553aee
```
```
[0] I0806 160149.197 metrics_publishers.py:50] Finished iter 999, Local  window NE: [0.963959 0.950477 0.953704], lifetime NE: [0.963959 0.950477 0.953704], loss: [0.243456 0.235225 0.248375], QPS: 34018
```
[detailed logs](https://www.internalfb.com/intern/tupperware/details/task/?handle=priv3_global%2Fmast_hpc%2Fhpc.sinannasirtestvideo_getFutureCyclicFix.trainer.trainer%2F0&ta_tab=logs)
**getFuture/new design Run 2:** QPS after 1000 iters: around ~34200
```
hpc_dist_trainer --fb-data=none --mtml-fusion-level=1 --target-model=ifr_video --max-ind-range=1000000 --embedding-partition=row-wise mast --domain $USER"test2video_getFutureCyclicFix" --trainers 16 --trainer-version 8553aee
```
```
[0] I0806 160444.650 metrics_publishers.py:50] Finished iter 999, Local  window NE: [0.963963 0.950482 0.953706], lifetime NE: [0.963963 0.950482 0.953706], loss: [0.243456 0.235225 0.248375], QPS: 34201
```
[detailed logs](https://www.internalfb.com/intern/tupperware/details/task/?handle=priv3_global%2Fmast_hpc%2Fhpc.sinannasirtest2video_getFutureCyclicFix.trainer.trainer%2F0&ta_tab=logs)
**getFuture/old design (Regression):** QPS after 1000 iters: around ~31150
```
hpc_dist_trainer --fb-data=none --mtml-fusion-level=1 --target-model=ifr_video --max-ind-range=1000000 --embedding-partition=row-wise mast --domain $USER"testvideo_OLDgetFutureD22583690 (d904ea5972)" --trainers 16 --trainer-version 1cb5cbb
```
```
priv3_global/mast_hpc/hpc.sinannasirtestvideo_OLDgetFutureD22583690 (d904ea5972).trainer.trainer/0 [0] I0805 101320.407 metrics_publishers.py:50] Finished iter 999, Local  window NE: [0.963964 0.950482 0.953703], lifetime NE: [0.963964 0.950482 0.953703], loss: [0.243456 0.235225 0.248375], QPS: 31159
```
3\. `flow-cli` tests; roberta_base; world_size=4:
**Master:** f210039922
```
total:
  32 GPUs -- 32 GPUs: p25:  0.908    35/s  p50:  1.002    31/s  p75:  1.035    30/s  p90:  1.051    30/s  p95:  1.063    30/s
forward:
  32 GPUs -- 32 GPUs: p25:  0.071   452/s  p50:  0.071   449/s  p75:  0.072   446/s  p90:  0.072   445/s  p95:  0.072   444/s
backward:
  32 GPUs -- 32 GPUs: p25:  0.821    38/s  p50:  0.915    34/s  p75:  0.948    33/s  p90:  0.964    33/s  p95:  0.976    32/s
optimizer:
  32 GPUs -- 32 GPUs: p25:  0.016  2037/s  p50:  0.016  2035/s  p75:  0.016  2027/s  p90:  0.016  2019/s  p95:  0.016  2017/s
```
**getFuture new design:** f210285797
```
total:
  32 GPUs -- 32 GPUs: p25:  0.952    33/s  p50:  1.031    31/s  p75:  1.046    30/s  p90:  1.055    30/s  p95:  1.070    29/s
forward:
  32 GPUs -- 32 GPUs: p25:  0.071   449/s  p50:  0.072   446/s  p75:  0.072   445/s  p90:  0.072   444/s  p95:  0.072   443/s
backward:
  32 GPUs -- 32 GPUs: p25:  0.865    37/s  p50:  0.943    33/s  p75:  0.958    33/s  p90:  0.968    33/s  p95:  0.982    32/s
optimizer:
  32 GPUs -- 32 GPUs: p25:  0.016  2037/s  p50:  0.016  2033/s  p75:  0.016  2022/s  p90:  0.016  2018/s  p95:  0.016  2017/s

```

Reviewed By: ezyang

Differential Revision: D22833298

fbshipit-source-id: 1bb268d3b00335b42ee235c112f93ebe2f25b208
2020-08-07 18:48:35 -07:00
d4a4c62df3 [caffe2] Fix the timeout (stuck) issues of dedup SparseAdagrad C2 kernel
Summary:
Back out D22800959 (f30ac66e79). This change was causing the timeout (machine stuck) issues for the dedup kernels; reverting it makes the unit test pass. Still need to investigate why this is the culprit...

Original commit changeset: 641d52a51070

Test Plan:
```
buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_sum_gradient \(caffe2\.caffe2\.fb\.net_transforms\.tests\.fuse_sparse_ops_test\.TestFuseSparseOps\)' --print-passing-details
```

Reviewed By: jspark1105

Differential Revision: D23008389

fbshipit-source-id: 4f1b9a41c78eaa5541d57b9d8aa12401e1d495f2
2020-08-07 18:42:36 -07:00
3fa0581cf2 [fbgemm] use new more general depthwise 3d conv interface (#42697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42697

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/401

As title

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D22972233

fbshipit-source-id: a2c8e989dee84b2c0587faccb4f8e3bcb05c797c
2020-08-07 18:30:56 -07:00
13bc542829 Fix lite trainer unit test submodule registration (#42714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42714

Change two unit tests for the lite trainer to register two instances/objects of the same submodule type instead of the same submodule object twice.

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D22990736

Pulled By: ann-ss

fbshipit-source-id: 2bf56b5cc438b5a5fc3db90d3f30c5c431d3ae77
2020-08-07 18:26:56 -07:00
48e978ba18 Add fake quantize operator that works in backward pass (#40532)
Summary:
This diff adds FakeQuantizeWithBackward. This works the same way as the regular FakeQuantize module, allowing QAT to occur in the forward pass, except it has an additional quantize_backward parameter. When quantize_backward is enabled, the gradients are fake quantized as well (dynamically, using hard-coded values). This allows the user to see whether there would be a significant loss of accuracy if the gradients were quantized in their model.
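A hedged sketch of the idea (names here are illustrative, not the diff's API): gradients can be fake quantized in the backward pass with a custom autograd Function.

```
import torch

class FakeQuantGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Identity in forward; QAT fake quantization of activations/weights
        # happens elsewhere in the model.
        return x

    @staticmethod
    def backward(ctx, grad_out):
        # Dynamically pick a scale from the gradient's range and fake
        # quantize to int8, mirroring the quantize_backward behavior
        # described above.
        scale = grad_out.abs().max().clamp(min=1e-8) / 127.0
        return torch.fake_quantize_per_tensor_affine(
            grad_out, scale.item(), 0, -128, 127)
```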

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40532

Test Plan: The relevant test for this can be run using `python test/test_quantization.py TestQATBackward.test_forward_and_backward`

Reviewed By: supriyar

Differential Revision: D22217029

Pulled By: durumu

fbshipit-source-id: 7055a2cdafcf022f1ea11c3442721ae146d2b3f2
2020-08-07 17:47:01 -07:00
2b04712205 Exposing Percentile Caffe2 Operator in PyTorch
Summary: As titled.

Test Plan:
```
buck test caffe2/caffe2/python/operator_test:torch_integration_test -- test_percentile
```

Reviewed By: yf225

Differential Revision: D22999896

fbshipit-source-id: 2e3686cb893dff1518d533cb3d78c92eb2a6efa5
2020-08-07 16:22:37 -07:00
55b1706775 Skips some complex tests on ROCm (#42759)
Summary:
Fixes ROCm build on OSS master.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42759

Reviewed By: ngimel

Differential Revision: D23011560

Pulled By: mruberry

fbshipit-source-id: 3339ecbd5a0ca47aede6f7c3f84739af1ac820d5
2020-08-07 16:12:32 -07:00
95f4f67552 Restrict conversion to SmallVector (#42694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42694

The old implementation allowed calling the SmallVector constructor and operator= for any type without restriction,
but then failed with a compiler error when the type wasn't a collection.

Instead, we should only enable them when Container satisfies a container concept, so that the constructor simply doesn't match otherwise.

This fixes an issue kimishpatel was running into.
ghstack-source-id: 109370513

Test Plan: unit tests

Reviewed By: kimishpatel, ezyang

Differential Revision: D22983020

fbshipit-source-id: c31264f5c393762d822f3d64dd2a8e3279d8da44
2020-08-07 15:47:29 -07:00
faca3c43e6 fix celu in quantized benchmark (#42756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42756

Similar to ELU, CELU was also broken in the quantized benchmark, fixing.

Test Plan:
```
cd benchmarks/operator_benchmark
python -m pt.qactivation_test
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23010863

fbshipit-source-id: 203e63f9cff760af6809f6f345b0d222dc1e9e1b
2020-08-07 15:23:50 -07:00
4eb66b814e Automated submodule update: FBGEMM (#42713)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: a989b99279

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42713

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: amylittleyang

Differential Revision: D22990108

Pulled By: jspark1105

fbshipit-source-id: 3252a0f5ad9546221ef2fe908ce6b896252e1887
2020-08-07 13:41:54 -07:00
02f58bdbd7 [caffe2] add type annotations for caffe2.distributed.python
Summary: Add Python type annotations for the `caffe2.distributed.python` module.

Test Plan: Will check sandcastle results.

Reviewed By: jeffdunn

Differential Revision: D22994012

fbshipit-source-id: 30565cc41dd05b5fbc639ae994dfe2ddd9e56cb1
2020-08-07 13:12:53 -07:00
6ebc0504ca BAND, BOR and BXOR for NCCL (all_)reduce should throw runtime errors (#42669)
Summary:
cc rohan-varma
Fixes https://github.com/pytorch/pytorch/issues/41362 #39708

# Description
NCCL doesn't support `BAND, BOR, BXOR`. Since the [current mapping](0642d17efc/torch/lib/c10d/ProcessGroupNCCL.cpp (L39)) doesn't contain any of the mentioned bitwise operators, a default value of `ncclSum` is used instead.

This PR should provide the expected behaviour where a runtime exception is thrown.

# Notes
- The way I'm throwing exceptions is derived from [ProcessGroupGloo.cpp](0642d17efc/torch/lib/c10d/ProcessGroupGloo.cpp (L101))
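A hedged illustration of the intended behavior (assumes an initialized NCCL process group and a CUDA tensor): bitwise reductions now raise a RuntimeError instead of silently falling back to `ncclSum`.

```
import torch
import torch.distributed as dist

t = torch.ones(4, dtype=torch.int64, device="cuda")
try:
    dist.all_reduce(t, op=dist.ReduceOp.BAND)
except RuntimeError as err:
    print("NCCL rejects bitwise reductions:", err)
```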

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42669

Reviewed By: ezyang

Differential Revision: D22996295

Pulled By: rohan-varma

fbshipit-source-id: 83a9fedf11050d2890f9f05ebcedf53be0fc3516
2020-08-07 13:09:07 -07:00
7332c21f7a Speed up HistogramObserver by vectorizing critical path (#41041)
Summary:
22x speedup over the code this replaces. Tested on ResNet18 on a devvm using CPU only, using default parameters for HistogramObserver (i.e. 2048 bins).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41041

Test Plan:
To run the test against the reference (old) implementation, you can use `python test/test_quantization.py TestRecordHistogramObserver.test_histogram_observer_against_reference`.

To run the benchmark, while in the folder `benchmarks/operator_benchmark`, you can use `python -m benchmark_all_quantized_test --operators HistogramObserverCalculateQparams`.

Benchmark results before speedup:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_affine
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_affine
Forward Execution Time (us) : 185818.566

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_symmetric
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_symmetric
Forward Execution Time (us) : 165325.916
```

Benchmark results after speedup:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_affine
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_affine
Forward Execution Time (us) : 12242.241

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_symmetric
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_symmetric
Forward Execution Time (us) : 12655.354
```

Reviewed By: raghuramank100

Differential Revision: D22400755

Pulled By: durumu

fbshipit-source-id: 639ac796a554710a33c8a930c1feae95a1148718
2020-08-07 12:29:23 -07:00
98de150381 C++ API TransformerEncoderLayer (#42633)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42633

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D22994332

Pulled By: glaringlee

fbshipit-source-id: 873abdf887d135fb05bde560d695e2e8c992c946
2020-08-07 11:49:42 -07:00
eba35025e0 [JIT] Exclude staticmethods from TS class compilation (#42611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42611

**Summary**
This commit modifies the Python frontend to ignore static functions on
TorchScript classes when compiling them. They are currently included
along with methods, which causes the first argument of the
static function to be unconditionally inferred to be of the type of the
class it belongs to (regardless of how it is annotated or whether it is
annotated at all). This can lead to compilation errors depending on
how that argument is used in the body of the function.

Static functions are instead imported and scripted as if they were
standalone functions.
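A hedged sketch of the pattern this fixes (class and method names are made up): before this change, `x` in `make` below would be inferred as `Pair` rather than keeping its `int` annotation.

```
import torch

@torch.jit.script
class Pair(object):
    def __init__(self, a: int, b: int):
        self.a = a
        self.b = b

    @staticmethod
    def make(x: int) -> 'Pair':
        # After this commit, compiled as a standalone function.
        return Pair(x, x)

@torch.jit.script
def build(n: int) -> int:
    p = Pair.make(n)
    return p.a + p.b

print(build(3))  # 6
```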

**Test Plan**
This commit augments the unit test for static methods in `test_class_types.py`
to test that static functions can call each other and the class
constructor.

**Fixes**
This commit fixes #39308.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D22958163

Pulled By: SplitInfinity

fbshipit-source-id: 45c3c372792299e6e5288e1dbb727291e977a2af
2020-08-07 11:22:04 -07:00
9f88bcb5a2 Minor typo fix (#42731)
Summary:
Just fixed a typo in test/test_sparse.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42731

Reviewed By: ezyang

Differential Revision: D22999930

Pulled By: mrshenli

fbshipit-source-id: 1b5b21d7cb274bd172fb541b2761f727ba06302c
2020-08-07 11:17:51 -07:00
04c62d4a06 [vulkan] Fix warnings: static_cast, remove unused (#42195)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42195

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D22803035

Pulled By: IvanKobzarev

fbshipit-source-id: d7bf256437eccb5c421a7fd0aa8ec23a8fec0470
2020-08-07 11:12:54 -07:00
586399c03f Remove duplicate definitions of CppTypeToScalarType (#42640)
Summary:
I noticed that `TensorIteratorDynamicCasting.h` defines a helper meta-function `CPPTypeToScalarType` which does exactly the same thing as the `c10::CppTypeToScalarType` meta-function I added in gh-40927. No need for two identical definitions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42640

Reviewed By: malfet

Differential Revision: D22969708

Pulled By: ezyang

fbshipit-source-id: 8303c7f4a75ae248f393a4811ae9d2bcacab44ff
2020-08-07 11:02:42 -07:00
944ac133d0 [NNC] Remove VarBinding and go back to Let stmts (#42634)
Summary:
A while back, when commonizing the Let and LetStmt nodes, I ended up removing both and adding a separate VarBinding section to the Block. At the time I couldn't find a counterexample, but I found one today: dependencies between local Vars and Allocations may go in either direction, so we need to support interleaving of those statements.

So, I've removed all the VarBinding logic and reimplemented Let statements. ZolotukhinM I think you get to say "I told you so". No new tests; existing tests should cover this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42634

Reviewed By: mruberry

Differential Revision: D22969771

Pulled By: nickgg

fbshipit-source-id: a46c5193357902d0f59bf30ab103fe123b1503f1
2020-08-07 10:50:38 -07:00
2971bc23a6 Handle fused scale and bias in fake fp16 layernorm
Summary: Allow passing scale and bias to fake fp16 layernorm.

Test Plan: net_runner. Now matches glow's fused layernorm.

Reviewed By: hyuen

Differential Revision: D22952646

fbshipit-source-id: cf9ad055b14f9d0167016a18a6b6e26449cb4de8
2020-08-07 10:48:33 -07:00
dcee8933fb Fix some linking rules to allow path with whitespaces (#42718)
Summary:
Essentially, replace `-Wl,--whole-archive,$<TARGET_FILE:FOO>` with `-Wl,--whole-archive,\"$<TARGET_FILE:FOO>\"`, as TARGET_FILE might return a path containing whitespace

Fixes https://github.com/pytorch/pytorch/issues/42657

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42718

Reviewed By: ezyang

Differential Revision: D22993568

Pulled By: malfet

fbshipit-source-id: de878b17d20e35b51dd350f20d079c8b879f70b5
2020-08-07 10:23:23 -07:00
9c8021c0b1 Adds torch.linalg namespace (#42664)
Summary:
This PR adds the `torch.linalg` namespace as part of our continued effort to be more compatible with NumPy. The namespace is tested by adding a single function, `torch.linalg.outer`, and testing it in a new test suite, test_linalg.py. It follows the same pattern that https://github.com/pytorch/pytorch/pull/41911, which added the `torch.fft` namespace, did.
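A brief usage sketch of the new namespace as described here (`outer` is the single function this PR adds; `torch.ger` is the legacy equivalent):

```
import torch

a = torch.arange(1., 4.)
b = torch.arange(1., 3.)
print(torch.linalg.outer(a, b))  # 3x2 outer product, same as torch.ger(a, b)
```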

Future PRs will likely:

- add more functions to torch.linalg
- expand the testing done in test_linalg.py, including legacy functions, like torch.ger
- deprecate existing linalg functions outside of `torch.linalg` in preference to the new namespace

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42664

Reviewed By: ngimel

Differential Revision: D22991019

Pulled By: mruberry

fbshipit-source-id: 39258d9b116a916817b3588f160b141f956e5d0b
2020-08-07 10:18:30 -07:00
c9346ad3b8 [CPU] Added torch.bmm for complex tensors (#42383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42383

Test Plan - Updated existing tests to run for complex dtypes as well.

Also added tests for `torch.addmm`, `torch.badmm`
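A minimal sketch of the newly supported complex batched matmul on CPU:

```
import torch

a = torch.randn(2, 3, 4, dtype=torch.cfloat)
b = torch.randn(2, 4, 5, dtype=torch.cfloat)
out = torch.bmm(a, b)
print(out.shape, out.dtype)  # torch.Size([2, 3, 5]) torch.cfloat
```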

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D22960339

Pulled By: anjali411

fbshipit-source-id: 0805f21caaa40f6e671cefb65cef83a980328b7d
2020-08-07 10:04:20 -07:00
31ed468905 Fix cmake warning (#42707)
Summary:
If arguments in set_target_properties are not separated by whitespace, cmake raises a warning:
```
CMake Warning (dev) at cmake/public/cuda.cmake:269:
  Syntax Warning in cmake code at column 54

  Argument not separated from preceding token by whitespace.
```

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42707

Reviewed By: ailzhang

Differential Revision: D22988055

Pulled By: malfet

fbshipit-source-id: c3744f23b383d603788cd36f89a8286a46b6c00f
2020-08-07 09:57:21 -07:00
3c66a3795a [vulkan] Ops registration to TORCH_LIBRARY_IMPL (#42194)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42194

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D22803036

Pulled By: IvanKobzarev

fbshipit-source-id: 2f402541aecf887d78f650bf05d758a0e403bc4d
2020-08-07 09:06:22 -07:00
4eb02add51 Blacklist to Blocklist in onnxifi_transformer (#42590)
Summary:
Fixes issues in https://github.com/pytorch/pytorch/issues/41704 and https://github.com/pytorch/pytorch/issues/41705

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42590

Reviewed By: ailzhang

Differential Revision: D22977357

Pulled By: malfet

fbshipit-source-id: ab61b964cfdf8bd2b469f4ff8f6486a76bc697de
2020-08-07 08:05:32 -07:00
fb8aa0046c Add use_glow_aot, and include ONNX again as a backend for onnxifiGlow (#4787)
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/4787

Resurrect ONNX as a backend through onnxifiGlow (was killed as part of D16215878). Then look for the `use_glow_aot` argument in the Onnxifi op. If it's there and true, then we override whatever `backend_id` is set and use the ONNX backend.

Reviewed By: yinghai, rdzhabarov

Differential Revision: D22762123

fbshipit-source-id: abb4c3458261f8b7eeae3016dda5359fa85672f0
2020-08-07 04:31:24 -07:00
73642d9425 Updates alias pattern (and torch.absolute to use it) (#42586)
Summary:
This PR canonicalizes our (current) pattern for adding aliases to PyTorch. That pattern is:

- Copy the original function's native_functions.yaml entry, but replace the original function's name with the alias's.
- Implement the corresponding functions and have them redispatch to the original function.
- Add docstrings to the new functions that reference the original function.
- Update the alias_map in torch/csrc/jit/passes/normalize_ops.cpp.
- Update the op_alias_mappings in torch/testing/_internal/jit_utils.py.
- Add a test validating the alias's behavior is the same as the original function's.

An alternative pattern would be to use Python and C++ language features to alias ops directly. For example in Python:

```
torch.absolute = torch.abs
```

Let the pattern in this PR be the "native function" pattern, and the alternative pattern be the "language pattern." There are pros/cons to both approaches:

**Pros of the "Language Pattern"**
- torch.absolute is torch.abs.
- no (or very little) overhead for calling the alias.
- no native_functions.yaml redundancy or possibility of "drift" between the original function's entries and the alias's.

**Cons of the "Language Pattern"**
- requires manually adding doc entries
- requires updating Python alias and C++ alias lists
- requires hand writing alias methods on Tensor (technically this should require a C++ test to validate)
- no single list of all PyTorch ops -- have to check native_functions.yaml and one of the separate alias lists

**Pros of the "Native Function" pattern**

- alias declarations stay in native_functions.yaml
- doc entries are written as normal

**Cons of the "Native Function" pattern**

- aliases redispatch to the original functions
- torch.absolute is not torch.abs (requires writing test to validate behavior)
- possibility of drift between original's and alias's native_functions.yaml entries

While either approach is reasonable, I suggest the "native function" pattern since it preserves "native_functions.yaml" as a source of truth and minimizes the number of alias lists that need to be maintained. In the future, entries in native_functions.yaml may support an "alias" argument and replace whatever pattern we choose now.
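A hedged check of the trade-off noted above: under the native-function pattern the alias computes the same result but is a distinct callable.

```
import torch

t = torch.tensor([-1.5, 2.0])
assert torch.equal(torch.absolute(t), torch.abs(t))
print(torch.absolute is torch.abs)  # False under the native-function pattern
```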

Ops that are likely to use aliasing are:

- div (divide, true_divide)
- mul (multiply)
- bucketize (digitize)
- cat (concatenate)
- clamp (clip)
- conj (conjugate)
- rad2deg (degrees)
- trunc (fix)
- neg (negative)
- deg2rad (radians)
- round (rint)
- acos (arccos)
- acosh (arcosh)
- asin (arcsin)
- asinh (arcsinh)
- atan (arctan)
- atan2 (arctan2)
- atanh (arctanh)
- bartlett_window (bartlett)
- hamming_window (hamming)
- hann_window (hanning)
- bitwise_not (invert)
- gt (greater)
- ge (greater_equal)
- lt (less)
- le (less_equal)
- ne (not_equal)
- ger (outer)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42586

Reviewed By: ngimel

Differential Revision: D22991086

Pulled By: mruberry

fbshipit-source-id: d6ac96512d095b261ed2f304d7dddd38cf45e7b0
2020-08-07 00:24:06 -07:00
cb1ac94069 [blob reorder] Separate user embeddings and ad embeddings in large model loading script
Summary: Put user embeddings before ads embeddings in blobReorder, for flash verification reasons.

Test Plan:
```
buck run mode/opt-clang -c python.package_style=inplace sigrid/predictor/scripts:enable_large_model_loading -- --model_path_src="/home/$USER/models/" --model_path_dst="/home/$USER/models_modified/" --model_file_name="182560549_0.predictor"
```
https://www.internalfb.com/intern/anp/view/?id=320921 to check blobsOrder

Reviewed By: yinghai

Differential Revision: D22964332

fbshipit-source-id: 78b4861476a3c889a5ff62492939f717c307a8d2
2020-08-06 23:54:03 -07:00
9597af01ca Support iterating through an Enum class (#42661)
Summary:
[5/N] Implement Enum JIT support

Implement Enum class iteration
Add aten.ne for EnumType

Supported:
- Enum-typed function arguments
- Using Enum types and comparing them
- Getting name/value attrs of enums
- Using Enum values as constants
- Enum-typed return values
- Iterating through an Enum class (enum value list; see the sketch below)

TODO:
Support serialization and deserialization
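A hedged sketch of the newly supported iteration (the enum and function names are illustrative):

```
from enum import Enum

import torch

class Color(Enum):
    RED = 1
    GREEN = 2

@torch.jit.script
def sum_values() -> int:
    total = 0
    for c in Color:
        total += c.value
    return total

print(sum_values())  # 3
```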

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42661

Reviewed By: SplitInfinity

Differential Revision: D22977364

Pulled By: gmagogsfm

fbshipit-source-id: 1a0216f91d296119e34cc292791f9aef1095b5a8
2020-08-06 22:56:34 -07:00
952526804c Print TE CUDA kernel (#42692)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42692

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D22986112

Pulled By: bertmaher

fbshipit-source-id: 52ec3389535c8b276858bef8c470a59aeba4946f
2020-08-06 20:42:04 -07:00
a6c8730045 [ONNX] Add preprocess pass for onnx export (#41832)
Summary:
In `_jit_pass_onnx`, symbolic functions are called for each node for conversion. However, there are nodes that cannot be converted without additional context. For example, the number of outputs from split (and whether it is static or dynamic) is unknown until the point where it is unpacked by the listUnpack node. This pass does a preprocess and prepares the nodes such that enough context can be received by the symbolic function.
* After preprocessing, `_jit_pass_onnx` should have enough context to produce valid ONNX nodes, instead of half-baked nodes that rely on fixes from later post-passes.
* `_jit_pass_onnx_peephole` should be a pass that does ONNX-specific optimizations instead of ONNX-specific fixes.
* Producing more valid ONNX nodes in `_jit_pass_onnx` enables better utilization of ONNX shape inference (https://github.com/pytorch/pytorch/issues/40628).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41832

Reviewed By: ZolotukhinM

Differential Revision: D22968334

Pulled By: bzinodev

fbshipit-source-id: 8226f03c5b29968e8197d242ca8e620c6e1d42a5
2020-08-06 20:34:12 -07:00
9152f2f73a Optimization of Backward Implementation for Learnable Fake Quantize Per Tensor Kernels (CPU and GPU) (#42384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42384

In this diff, the original backward pass implementation is sped up by merging the 3 separate iterations that compute dX, dScale, and dZeroPoint into a single one. A native loop is used directly at the byte level (indexed via `strides`).

In the benchmark test on the operators, for an input of shape `3x3x256x256`, we have observed the following improvement in performance:
- original python operator: 1021037 microseconds
- original learnable kernel: 407576 microseconds
- optimized learnable kernel: 102584 microseconds
- original non-backprop kernel: 139806 microseconds

**Speedup from python operator**: ~10x
**Speedup from original learnable kernel**: ~4x
**Speedup from non-backprop kernel**: ~1.2x

Test Plan:
To assert correctness of the new kernel, on a devvm, enter the command

`buck test //caffe2/test:quantization -- learnable_backward_per_tensor`

To benchmark the operators, on a devvm, enter the command
1. Set the kernel size to 3x3x256x256 or a reasonable input size.
2. Run `buck test //caffe2/benchmarks/operator_benchmark/pt:quantization_test`
3. The relevant outputs are as follows:

(CPU)
```
# Benchmarking PyTorch: FakeQuantizePerTensorOpBenchmark
# Mode: Eager
# Name: FakeQuantizePerTensorOpBenchmark_N3_C3_H256_W256_nbits4_cpu_op_typepy_module
# Input: N: 3, C: 3, H: 256, W: 256, device: cpu, op_type: py_module
Backward Execution Time (us) : 1021036.957

# Benchmarking PyTorch: FakeQuantizePerTensorOpBenchmark
# Mode: Eager
# Name: FakeQuantizePerTensorOpBenchmark_N3_C3_H256_W256_nbits4_cpu_op_typelearnable_kernel
# Input: N: 3, C: 3, H: 256, W: 256, device: cpu, op_type: learnable_kernel
Backward Execution Time (us) : 102583.693

# Benchmarking PyTorch: FakeQuantizePerTensorOpBenchmark
# Mode: Eager
# Name: FakeQuantizePerTensorOpBenchmark_N3_C3_H256_W256_nbits4_cpu_op_typeoriginal_kernel
# Input: N: 3, C: 3, H: 256, W: 256, device: cpu, op_type: original_kernel
Backward Execution Time (us) : 139806.086
```

(GPU)
```
# Benchmarking PyTorch: FakeQuantizePerChannelOpBenchmark
# Mode: Eager
# Name: FakeQuantizePerChannelOpBenchmark_N3_C3_H256_W256_cuda_op_typepy_module
# Input: N: 3, C: 3, H: 256, W: 256, device: cuda, op_type: py_module
Backward Execution Time (us) : 6548.350

# Benchmarking PyTorch: FakeQuantizePerChannelOpBenchmark
# Mode: Eager
# Name: FakeQuantizePerChannelOpBenchmark_N3_C3_H256_W256_cuda_op_typelearnable_kernel
# Input: N: 3, C: 3, H: 256, W: 256, device: cuda, op_type: learnable_kernel
Backward Execution Time (us) : 1340.724

# Benchmarking PyTorch: FakeQuantizePerChannelOpBenchmark
# Mode: Eager
# Name: FakeQuantizePerChannelOpBenchmark_N3_C3_H256_W256_cuda_op_typeoriginal_kernel
# Input: N: 3, C: 3, H: 256, W: 256, device: cuda, op_type: original_kernel
Backward Execution Time (us) : 656.863
```

Reviewed By: vkuzo

Differential Revision: D22875998

fbshipit-source-id: cfcd62c327bb622270a783d2cbe97f00508c4a16
2020-08-06 19:54:17 -07:00
4959981cff [ONNX] Export tensor (#41872)
Summary:
Adding tensor symbolic for opset 9

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41872

Reviewed By: houseroad

Differential Revision: D22968426

Pulled By: bzinodev

fbshipit-source-id: 70e1afc7397e38039e2030e550fd72f09bac7c7c
2020-08-06 19:33:11 -07:00
40ac95dd3c [ONNX] Update ONNX export of torch.where to support ByteTensor as input. (#42264)
Summary:
`torch.where` supports `ByteTensor` and `BoolTensor` types for the first input argument (the `condition` predicate). Currently, the ONNX exporter assumes that the first argument is a `BoolTensor`. This PR updates the export of `torch.where` to correctly support export when the first argument is a `ByteTensor`.
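A hedged illustration of the input this PR handles (at the time of this PR, `torch.where` accepted a uint8 condition directly; newer releases may require casting to bool):

```
import torch

cond = torch.tensor([1, 0, 1], dtype=torch.uint8)  # ByteTensor condition
x = torch.tensor([1., 2., 3.])
y = torch.zeros(3)
print(torch.where(cond, x, y))  # tensor([1., 0., 3.])
```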

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42264

Reviewed By: houseroad

Differential Revision: D22968473

Pulled By: bzinodev

fbshipit-source-id: 7306388c8446ef3faeb86dc89d72d1f72c1c2314
2020-08-06 19:16:39 -07:00
f9a6c14364 Fix sequence numbers in profiler output (#42565)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42565

After recent changes to the record function, we record more
ranges in the profiler output and also keep emitting sequence numbers for
all ranges.

Sequence numbers are used by external tools to correlate forward
and autograd ranges and with many ranges having the same sequence number
it becomes impossible to do this.

This PR ensures that we set sequence numbers only for the top-level
ranges and only in case when autograd is enabled.

Test Plan:
nvprof -fo trace.nvvp --profile-from-start off python test_script.py
test_script
https://gist.github.com/ilia-cher/2baffdd98951ee2a5f2da56a04fe15d0
then examining ranges in nvvp

Reviewed By: ngimel

Differential Revision: D22938828

Pulled By: ilia-cher

fbshipit-source-id: 9a5a076706a6043dfa669375da916a1708d12c19
2020-08-06 19:12:05 -07:00
dab9bbfce7 Move jit_profiling tests into test1 on Windows (#42650)
Summary:
The test takes 5 min to finish and 5 min to spin up the environment, so it doesn't make much sense to keep it as a separate config.
Limit those tests to run only when the `USE_CUDA` environment variable is set to true.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42650

Reviewed By: ailzhang

Differential Revision: D22967817

Pulled By: malfet

fbshipit-source-id: c6c26df140059491e7ff53ee9cbbc93433d2f36f
2020-08-06 16:16:40 -07:00
33519e19ab Fix 64-bit indexing in GridSampler (#41923)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41656

For the CPU version, this is a regression introduced in https://github.com/pytorch/pytorch/issues/10980 which vectorized the `grid_sampler_2d` implementation. It uses the AVX2 gather intrinsic which for `float` requires 32-bit indexing to match the number of floats in the AVX register. There is also an `i64gather_ps` variant but this only utilizes half of the vector width so would be expected to give worse performance in the more likely case where 32-bit indexing is acceptable. So, I've left the optimised AVX version as-is and reinstated the old non-vectorized version as a fallback.

For the CUDA version, this operation has never supported 64-bit indexing, so this isn't a regression. I've templated the kernel on index type and added 64-bit variants, although I gather in some places a simple `TORCH_CHECK(canUse32BitIndexMath(...))` is used instead. So, there is a decision to be made here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41923

Reviewed By: glaringlee

Differential Revision: D22925931

Pulled By: zou3519

fbshipit-source-id: 920816107aae26360c5e7f4e9c729fa9057268bb
2020-08-06 16:08:09 -07:00
eaace3e10e Skip CUDA benchmarks on nogpu configs (#42704)
Summary:
Avoids timeouts when the benchmark is launched on nogpu configs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42704

Reviewed By: mruberry

Differential Revision: D22987725

Pulled By: malfet

fbshipit-source-id: aa9aece16557c0af8e05e612277ae1d9e0173a51
2020-08-06 15:47:48 -07:00
6cb0807f88 Fixes ROCm CI (#42701)
Summary:
Per title. ROCm CI doesn't have MKL so this adds a couple missing test annotations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42701

Reviewed By: ngimel

Differential Revision: D22986273

Pulled By: mruberry

fbshipit-source-id: efa717e2e3771562e9e82d1f914e251918e96f64
2020-08-06 15:24:50 -07:00
cc596ac3a8 [JIT] Add debug dumps in between passes in graph executor. (#42688)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42688

Both the profiling executor and the legacy executor have the debug
logging now.

Ideally, if we had a pass manager, this could be done as a part of it,
but since we have none, I had to insert the debug statements manually.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D22981675

Pulled By: ZolotukhinM

fbshipit-source-id: 22b8789e860aa90d5802fc72a4113b22c6fc4da5
2020-08-06 15:16:35 -07:00
cdd7db1ffc Bound shape inferencer: fix int8fc scale and bias
Summary:
Previously, when inferring Int8FC, we failed to carry over the scale and zero point properly.

Also fixed int8 FC weight data type to be int8 instead of uint8 as that's what C2 actually uses.

Test Plan: Use net_runner to lower a single Int8Dequantize op. Previously, scale and bias would always be 1 and 0. Now the proper values are set.

Reviewed By: yinghai

Differential Revision: D22912186

fbshipit-source-id: a6620c3493e492bdda91da73775bfc9117db12d1
2020-08-06 14:40:25 -07:00
b44a10c179 List[index]::toOptionalStringRef (#42263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42263

Allow a way to get a reference to the stored string in a `List<optional<string>>` without having to copy the string.
This for example improves perf of the map_lookup op by 3x.
ghstack-source-id: 109162026

Test Plan: unit tests

Reviewed By: ezyang

Differential Revision: D22830381

fbshipit-source-id: e6af2bc8cebd6e68794eb18daf183979bc6297ae
2020-08-06 13:44:33 -07:00
f22aa601ce All Gather and gather APIs for Python Objects (#42189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42189

Rehash of https://github.com/pytorch/pytorch/pull/28811, which was several months old.

As part of addressing https://github.com/pytorch/pytorch/issues/23232, this PR adds support for the following APIs:

`allgather_object` and `gather_object` to support gather/allgather of generic, pickable Python objects. This has been a long-requested feature so PyTorch should provide these helpers built-in.

The methodology is what is proposed in the original issue:
1) Pickle object to ByteTensor using torch.save
2) Comm. tensor sizes
3) Copy local ByteTensor into a tensor of maximal size
4) Call tensor-based collectives on the result of (3)
5) Unpickle back into object using torch.load

Note that the API is designed to match the tensor-based collectives, except that it does not support `async_op`. For now, it is a blocking call. If we see demand to support `async_op`, we will have to make more progress on merging work/future to support this.

If this is a suitable approach, we can support `scatter`, `broadcast` in follow up PRs.
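A minimal sketch of steps (1) and (5) of the methodology above, with illustrative helper names (the diff's actual helpers may differ):

```
import io

import torch

def object_to_tensor(obj):
    # Step 1: pickle the object into a ByteTensor via torch.save.
    buf = io.BytesIO()
    torch.save(obj, buf)
    data = buf.getvalue()
    return torch.tensor(list(data), dtype=torch.uint8), len(data)

def tensor_to_object(tensor, size):
    # Step 5: unpickle back into an object via torch.load.
    raw = bytes(tensor[:size].tolist())
    return torch.load(io.BytesIO(raw))

t, n = object_to_tensor({"rank": 0, "msg": "hello"})
print(tensor_to_object(t, n))  # {'rank': 0, 'msg': 'hello'}
```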
ghstack-source-id: 109322433

Reviewed By: mrshenli

Differential Revision: D22785387

fbshipit-source-id: a265a44ec0aa3aaffc3c6966023400495904c7d8
2020-08-06 13:30:25 -07:00
1f689b6ef9 suppress all Autograd keys in AutoNonVariableTypeMode (#42610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42610

Fix for https://github.com/pytorch/pytorch/issues/42609: `AutoNonVariableTypeMode` should suppress all autograd dispatch keys, not just `Autograd` (e.g. `XLAPreAutograd`, `PrivateUse<N>_PreAutograd`)

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D22963408

Pulled By: bhosmer

fbshipit-source-id: 2f3516580ce0c9136aff5e025285d679394f2f18
2020-08-06 13:15:42 -07:00
85a00c4c92 Skips spectral tests to prevent ROCm build from timing out (#42667)
Summary:
Per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42667

Reviewed By: ailzhang

Differential Revision: D22978531

Pulled By: mruberry

fbshipit-source-id: 0c3ba116836ed6c433e2c6a0e1a0f2e3c94c7803
2020-08-06 12:41:32 -07:00
40b6dacb50 Delete dead is_named_tensor_only (#42672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42672

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D22978389

Pulled By: ezyang

fbshipit-source-id: ef1302c57fe26a58a46ca1f4a4a7c3e2cdbfdc5d
2020-08-06 12:19:44 -07:00
5ca08b8891 Add benchmark for calculate_qparams (#42138)
Summary:
Adds a benchmark for `HistogramObserver.calculate_qparams` to the quantized op benchmarks. The next diff in this stack adds a ~15x speedup for this benchmark.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42138

Test Plan:
While in the folder `benchmarks/operator_benchmark`, the benchmark can be run using `python -m benchmark_all_quantized_test --operators HistogramObserverCalculateQparams`.

Benchmark results before speedup:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_affine
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_affine
Forward Execution Time (us) : 185818.566

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_symmetric
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_symmetric
Forward Execution Time (us) : 165325.916
```

Benchmark results after speedup:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_affine
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_affine
Forward Execution Time (us) : 12242.241

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_symmetric
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_symmetric
Forward Execution Time (us) : 12655.354
```

Reviewed By: supriyar

Differential Revision: D22779291

Pulled By: durumu

fbshipit-source-id: 1fe17d20eda5dd99e0e2590480142034c3574d4e
2020-08-06 11:10:12 -07:00
79de9c028a Remove VS2017 workaround for autocasting (#42352)
Summary:
Because VS2017 is no longer supported after https://github.com/pytorch/pytorch/pull/42144
cc: mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42352

Reviewed By: malfet

Differential Revision: D22962809

Pulled By: ngimel

fbshipit-source-id: 0346cde87bf5d617dfc0d7b34c92ac6ec5bbf568
2020-08-06 11:03:34 -07:00
e28a98a904 Turn on non ASCII string literals serialization (#40719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40719

This is a follow-up patch to turn on this feature in order to handle breaking
forward compatibility.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D22457952

Pulled By: bzinodev

fbshipit-source-id: fac0dfed8b8b5fa2d52d342ee8cf06742959b3c5
2020-08-06 10:47:09 -07:00
57854e7f08 [JIT] Clone runOptimizations and similar functions for profiling executor. (#42656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42656

This change will allow us to more freely experiment with pass pipelines
in the profiling executor without affecting passes in the legacy
executor. Also, it somewhat helps to keep all passes in one place to be
able to tell what's going on.

Currently this change should not affect any behavior as I copied the
passes exactly as they've been invoked before, but we will probably want
to change these pipelines in the near future.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D22971050

Pulled By: ZolotukhinM

fbshipit-source-id: f5bb60783a553c7b51c5343eec7f8fe40037ff99
2020-08-06 10:43:28 -07:00
a4dbc64800 Add documentation for PYTORCH_JIT_TYPE_VERBOSITY (#42241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42241

that's it

Test Plan: docs only

Reviewed By: SplitInfinity

Differential Revision: D22818705

fbshipit-source-id: 22cdf4f23c3ed0a15c23f116457fc842d7f7b520
2020-08-06 10:39:39 -07:00
65066d779b Add fastrnns benchmark to CI and upload data to scribe (#42030)
Summary:
Run fastrnns benchmark using pytest-benchmark infra, then parse its json format and upload to scribe.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42030

Reviewed By: malfet

Differential Revision: D22970270

Pulled By: wconstab

fbshipit-source-id: 87da9b7ddf741da14b80d20779771d19123be3c5
2020-08-06 10:30:27 -07:00
a5af2434fe NVMified NE Eval
Summary:
This diff NVMifies the NE Eval Flow.
- It defines a `LoadNVM` operator which either
  - receives a list of nvm blobs, or
  - extracts the blobs that could be NVMified from the model.
- dumps NVMified blobs into NVM
- and deallocates them from DRAM
- NVMifies the Eval net on the dper and C2 backends

Specific NVMOp for SLS is pushed through different diffs.

Test Plan: flow-cli test-locally dper.workflows.evaluation.eval_workflow --parameters-file=/mnt/public/ehsaardestani/temp/small_model.json 2>&1 | tee log

Reviewed By: yinghai, amylittleyang

Differential Revision: D22469973

fbshipit-source-id: ed8379ad404e96d04ac05e580176d3aca984575b
2020-08-06 10:25:31 -07:00
049c1b97be pin numpy version to 1.18.5 (#42670)
Summary:
Using numpy 1.19.x instead of 1.18.x breaks certain unit tests.
Fixes https://github.com/pytorch/pytorch/issues/42561.  Likely also fixes https://github.com/pytorch/pytorch/issues/42583.

CC ezyang xw285cornell sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42670

Reviewed By: ezyang

Differential Revision: D22978369

Pulled By: malfet

fbshipit-source-id: ce1f35c7ba620c2b9dd10613f39354cebee8b87d
2020-08-06 10:01:56 -07:00
bcab2d6848 And type annotations for cpp_extension, utils.data, signal_handling (#42647)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42647

Reviewed By: ezyang

Differential Revision: D22967041

Pulled By: malfet

fbshipit-source-id: 35e124da0be56934faef56834a93b2b400decf66
2020-08-06 09:42:07 -07:00
608f99e4ea Fix cudnn version on build_environment of Windows CI (#42615)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42615

Reviewed By: mrshenli

Differential Revision: D22958660

Pulled By: malfet

fbshipit-source-id: 97a6a0e769143bd161667d0ee081ea0751995775
2020-08-06 09:36:24 -07:00
576aab5084 Bump up NCCL to 2.7.6 (#42645)
Summary:
Because 2.7.3 has a bug on GA100 which is fixed in 2.7.6

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42645

Reviewed By: malfet

Differential Revision: D22977280

Pulled By: mrshenli

fbshipit-source-id: 74779eff90d7d660a988ff33659f3a2237ca7e29
2020-08-06 08:45:59 -07:00
0642d17efc Enable C++ RPC tests (#42636)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42636

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D22967777

Pulled By: mrshenli

fbshipit-source-id: 8816c190a4ead7d7f906c140c8a4e76b992f5502
2020-08-06 07:15:02 -07:00
c30bc6d4d7 Update TensorPipe submodule (#42522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42522

Main changes:
- Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch.
- Changed the way the preprocessor flags are provided, and changed their name.

There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which however doesn't contain the auto-generated header we now added. We fix that by linking those targets against the `tensorpipe` CMake target instead, so that the include paths defined by TensorPipe, which contain that auto-generated header, are picked up.

I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one.

Test Plan: CI

Reviewed By: malfet

Differential Revision: D22959472

fbshipit-source-id: 1959a41c4a66ef78bf0f3bd5e3964969a2a1bf67
2020-08-06 02:14:58 -07:00
bd458b7d02 Don't reference TensorPipe headers in our headers (#42521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42521

PyTorch's usage of TensorPipe is entirely wrapped within the RPC agent, which means we only need access to TensorPipe within the implementation (the .cpp file) and not in the interface (the .h file). We were however including the TensorPipe headers from the public PyTorch headers, which meant that PyTorch's downstream users had to have the TensorPipe include directories for that to work. By forward-declaring the symbols we need in the PyTorch header, and then including the TensorPipe header in the PyTorch implementation, we avoid "leaking" the dependency on TensorPipe, thus effectively keeping it private.

Test Plan: Imported from OSS

Reviewed By: beauby

Differential Revision: D22944238

Pulled By: lw

fbshipit-source-id: 2b12d59bd5beeaa439e50f9088a792c9d9bae9e8
2020-08-06 02:14:00 -07:00
a53fdaa23f Remove ProfiledType (#42570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42570

ProfiledType doesn't do anything and is not used at the moment; removing it.

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D22938664

Pulled By: ilia-cher

fbshipit-source-id: 037c512938028f44258b702bbcde3f8c144f4aa0
2020-08-06 01:52:08 -07:00
ccfce9d4a9 Adds fft namespace (#41911)
Summary:
This PR creates a new namespace, torch.fft (torch::fft) and puts a single function, fft, in it. This function is a simplified version of NumPy's [numpy.fft.fft](https://numpy.org/doc/1.18/reference/generated/numpy.fft.fft.html?highlight=fft#numpy.fft.fft) that accepts no optional arguments. It is intended to demonstrate how to add and document functions in the namespace, and is not intended to deprecate the existing torch.fft function.

Adding this namespace was complicated by the existence of the torch.fft function in Python. Creating a torch.fft Python module makes this name ambiguous: does it refer to a function or module? If the JIT didn't exist, a solution to this problem would have been to make torch.fft refer to a callable class that mimicked both the function and module. The JIT, however, cannot understand this pattern. As a workaround it's required to explicitly `import torch.fft` to access the torch.fft.fft function in Python:

```
import torch.fft

t = torch.randn(128, dtype=torch.cdouble)
torch.fft.fft(t)
```

See https://github.com/pytorch/pytorch/issues/42175 for future work. Another possible future PR is to get the JIT to understand torch.fft as a callable class so it need not be imported explicitly to be used.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41911

Reviewed By: glaringlee

Differential Revision: D22941894

Pulled By: mruberry

fbshipit-source-id: c8e0b44cbe90d21e998ca3832cf3a533f28dbe8d
2020-08-06 00:20:50 -07:00
644d787cd8 find rccl properly (#42072)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42072

Reviewed By: malfet

Differential Revision: D22969778

Pulled By: ezyang

fbshipit-source-id: 509178775d4d99460bcb147bcfced29f04cabdc4
2020-08-05 21:46:38 -07:00
23607441c2 Create CuBLAS PointerModeGuard (#42639)
Summary:
Adds an RAII guard for `cublasSetPointerMode()`.
Updates `dot_cuda` to use the guard, rather than exception catching.

Addresses this comment: https://github.com/pytorch/pytorch/pull/41377#discussion_r465754082

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42639

Reviewed By: malfet

Differential Revision: D22969985

Pulled By: ezyang

fbshipit-source-id: b05c35d1884bb890f8767d6a4ef8b4724a329471
2020-08-05 21:40:42 -07:00
eb9ae7c038 Implement gpu_kernel_multiple_outputs (#37969)
Summary:
This PR introduces a variant of `gpu_kernel` for functions that return multiple values with `thrust::tuple`.
With this I simplified `prelu_cuda_backward_share_weights_kernel`.

### Why use `thrust::tuple`?
Because `std::tuple` does not support `operator=` in device code, which makes the implementation complicated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37969

Reviewed By: paulshaoyuqiao

Differential Revision: D22868670

Pulled By: ngimel

fbshipit-source-id: eda0a29ac0347ad544b24bf60e3d809a7db1a929
2020-08-05 21:17:08 -07:00
1848b43c4d [NNC] Add loop unroll transformation (#42465)
Summary:
Unroll a loop with constant boundaries, replacing it with multiple
instances of the loop body. For example:

```
for x in 0..3:
  A[x] = x*2
```

becomes:

```
A[0] = 0
A[1] = 2
A[2] = 4
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42465

Test Plan: `test_tensorexpr` unit tests.

Reviewed By: agolynski

Differential Revision: D22914418

Pulled By: asuhan

fbshipit-source-id: 72ca10d7c0b1ac7f9a3688ac872bd94a1c53dc51
2020-08-05 20:46:32 -07:00
3d46e02ea1 Add __torch_function__ for methods (#37091)
Summary:
According to pytorch/rfcs#3

From the goals in the RFC:

1. Support subclassing `torch.Tensor` in Python (done here)
2. Preserve `torch.Tensor` subclasses when calling `torch` functions on them (done here)
3. Use the PyTorch API with `torch.Tensor`-like objects that are _not_ `torch.Tensor`
   subclasses (done in https://github.com/pytorch/pytorch/issues/30730)
4. Preserve `torch.Tensor` subclasses when calling `torch.Tensor` methods. (done here)
5. Propagating subclass instances correctly also with operators, using
   views/slices/indexing/etc. (done here)
6. Preserve subclass attributes when using methods or views/slices/indexing. (done here)
7. A way to insert code that operates on both functions and methods uniformly
   (so we can write a single function that overrides all operators). (done here)
8. The ability to give external libraries a way to also define
   functions/methods that follow the `__torch_function__` protocol. (will be addressed in a separate PR)

This PR makes the following changes:

1. Adds the `self` argument to the arg parser.
2. Dispatches on `self` as well if `self` is not `nullptr`.
3. Adds a `torch._C.DisableTorchFunction` context manager to disable `__torch_function__`.
4. Adds a `torch::torch_function_enabled()` and `torch._C._torch_function_enabled()` to check the state of `__torch_function__`.
5. Dispatches all `torch._C.TensorBase` and `torch.Tensor` methods via `__torch_function__`.

TODO:

- [x] Sequence Methods
- [x] Docs
- [x] Tests

Closes https://github.com/pytorch/pytorch/issues/28361

Benchmarks in https://github.com/pytorch/pytorch/pull/37091#issuecomment-633657778
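A hedged sketch of what change (5) enables, using the documented override pattern (the subclass here is made up):

```
import torch

class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        print(f"calling {func.__name__}")
        return super().__torch_function__(func, types, args, kwargs)

t = torch.randn(3).as_subclass(LoggingTensor)
torch.sin(t)   # prints "calling sin"
t.sum()        # methods dispatch through __torch_function__ too
```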

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37091

Reviewed By: ngimel

Differential Revision: D22765678

Pulled By: ezyang

fbshipit-source-id: 53f8aa17ddb8b1108c0997f6a7aa13cb5be73de0
2020-08-05 20:44:13 -07:00
92b7347fd7 Enforce counter value to double type in rowwise_counter
Summary:
Enforce counter value to double type in rowwise_counter.

**Context:**
The existing implementation uses float for the counter value, but due to the precision limit of single-precision floating point [1], we observed in earlier experiments that the counter can't increment beyond 16777216.0 (i.e., the max value is 16777216.0). We decided to enforce double type to avoid this issue.

[1] https://stackoverflow.com/questions/12596695/why-does-a-float-variable-stop-incrementing-at-16777216-in-c
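A quick demonstration of the float32 limit cited above:

```
import torch

c = torch.tensor(16777216.0, dtype=torch.float32)
print(c + 1 == c)  # tensor(True): the increment is lost at 2**24

c64 = torch.tensor(16777216.0, dtype=torch.float64)
print(c64 + 1 == c64)  # tensor(False): double keeps counting
```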

Test Plan:
op test
```
ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/python/operator_test(f0b0b48c)$ buck test :rowwise_counter_test
Trace available for this run at /tmp/testpilot.20200728-083200.729292.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision cd2638f1f47250eac058b8c36561760027d16add fbpkg f88726c8ebde4ba288e1172a348c7f46 at Mon Jul 27 18:11:43 2020 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/887/t.par
Discovering tests
Running 1 test
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/7881299364977047
      ✓ caffe2/caffe2/python/operator_test:rowwise_counter_test - test_rowwise_counter (caffe2.caffe2.python.operator_test.rowwise_counter_test.TestRowWiseCounter) 0.265 1/1 (passed)
      ✓ caffe2/caffe2/python/operator_test:rowwise_counter_test - main 14.414 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/7881299364977047
Summary (total time 18.51s):
  PASS: 2
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

optimizer test
```
ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/python(7d66fbb9)$ buck test :optimizer_test
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/7036874434841896
Summary (total time 64.87s):
  PASS: 48
  FAIL: 0
  SKIP: 24
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestMomentumSgd)
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestGFtrl)
    caffe2/caffe2/python:optimizer_test - test_caffe2_cpu_vs_numpy (caffe2.caffe2.python.optimizer_test.TestYellowFin)
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestSparseRAdam)
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestRowWiseAdagradWithCounter)
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestAdagrad)
    caffe2/caffe2/python:optimizer_test - test_caffe2_gpu_vs_numpy (caffe2.caffe2.python.optimizer_test.TestYellowFin)
    caffe2/caffe2/python:optimizer_test - testDense (caffe2.caffe2.python.optimizer_test.TestRowWiseAdagrad)
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestFtrl)
    caffe2/caffe2/python:optimizer_test - testSparse (caffe2.caffe2.python.optimizer_test.TestRmsProp)
    ...and 14 more not shown...
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

param download test
```
ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/fb/net_transforms/tests(7ef20a38)$ sudo buck test :param_download_test
Finished test run: Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/6473924481526935
```

e2e flow:
f208394929
f207991149
f207967273

ANP notebook to check the counter value loaded from the flows
https://fburl.com/anp/5fdcbnoi

screenshot of the loaded counter (note that counter max is larger than 16777216.0)

{F250926501}

Reviewed By: ellie-wen

Differential Revision: D22711514

fbshipit-source-id: 426fed7415270aa3f276dda8141907534734337f
2020-08-05 20:40:51 -07:00
c14fbc36ed Update docs about CUDA stream priority (#41364)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41364

Reviewed By: malfet

Differential Revision: D22962856

Pulled By: ngimel

fbshipit-source-id: 47f65069516cb555579455e8680deb937fc1f544
2020-08-05 20:03:18 -07:00
ddb8849ffc Fix method stub used for fixing mypy issue to work with pylint (#42356)
Summary:
Turn the method stub into a module-level function.

Since _forward_unimplemented is defined within the nn.Module class,
pylint (correctly) complains about not implementing this method in subclasses.

Fixes https://github.com/pytorch/pytorch/issues/42305

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42356

Reviewed By: mruberry

Differential Revision: D22867255

Pulled By: ezyang

fbshipit-source-id: ccf3e45e359d927e010791fadf70b2ef231ddb0b
2020-08-05 19:57:38 -07:00
04d7e1679d [quant] Quantized Average Pool Refactoring (#42009)
Summary:
**cc** z-a-f. Refactor `qavg_pool(2,3)d_nhwc_kernel` as mentioned in https://github.com/pytorch/pytorch/issues/40316.

# Benchmarks
## Python
Before | After
![before_after](https://user-images.githubusercontent.com/37529096/88401550-fea7ba80-ce1d-11ea-81c5-3ae912e81e8f.png)
## C++
![before_after_cpp](https://user-images.githubusercontent.com/37529096/88401845-5ba37080-ce1e-11ea-9bf2-3c95ac2b4b49.png)
## Notes
- For `qint8` and `quint8` the benchmarks show a noticeable ~2x speedup, at least when `channels > 64`.
## Reproduce
### Python
```
import time
import numpy as np
import torch
from termcolor import colored
def time_avg_pool2d(X, kernel, stride, padding, ceil_mode, count_include_pad, divisor_override, iterations):
    X, (scale, zero_point, torch_type) = X
    qX_nchw = torch.quantize_per_tensor(torch.from_numpy(X), scale=scale,
                                    zero_point=zero_point, dtype=torch_type)
    qX_nhwc = qX_nchw.contiguous(memory_format=torch.channels_last)
    assert qX_nhwc.stride() != tuple(sorted(qX_nhwc.stride(), reverse=True))  # not in default (descending) contiguous stride order
    assert(qX_nchw.is_contiguous(memory_format=torch.contiguous_format))
    assert(qX_nhwc.is_contiguous(memory_format=torch.channels_last))
    start = time.time()
    for _ in range(iterations):
        X_hat = torch.nn.quantized.functional.avg_pool2d(qX_nchw, kernel_size=kernel, stride=stride, padding=padding, ceil_mode=ceil_mode,
                count_include_pad=count_include_pad, divisor_override=divisor_override)
    qnchw_end = time.time() - start
    start = time.time()
    for _ in range(iterations):
        X_hat = torch.nn.quantized.functional.avg_pool2d(qX_nhwc, kernel_size=kernel, stride=stride, padding=padding, ceil_mode=ceil_mode,
                count_include_pad=count_include_pad, divisor_override=divisor_override)
    qnhwc_end = time.time() - start
    return qnchw_end*1000/iterations, qnhwc_end*1000/iterations

def time_avg_pool3d(X, kernel, stride, padding, ceil_mode, count_include_pad, divisor_override,  iterations):
    X, (scale, zero_point, torch_type) = X
    qX_ncdhw = torch.quantize_per_tensor(torch.from_numpy(X), scale=scale,
                                    zero_point=zero_point, dtype=torch_type)
    qX_ndhwc = qX_ncdhw.contiguous(memory_format=torch.channels_last_3d)
    assert qX_ndhwc.stride() != tuple(sorted(qX_ndhwc.stride(), reverse=True))  # not in default (descending) contiguous stride order
    assert(qX_ncdhw.is_contiguous(memory_format=torch.contiguous_format))
    assert(qX_ndhwc.is_contiguous(memory_format=torch.channels_last_3d))
    start = time.time()
    for _ in range(iterations):
        X_hat = torch.nn.quantized.functional.avg_pool3d(qX_ncdhw, kernel_size=kernel, stride=stride, padding=padding, ceil_mode=ceil_mode,
                count_include_pad=count_include_pad, divisor_override=divisor_override)
    qncdhw_end = time.time() - start
    start = time.time()
    for _ in range(iterations):
        X_hat = torch.nn.quantized.functional.avg_pool3d(qX_ndhwc, kernel_size=kernel, stride=stride, padding=padding, ceil_mode=ceil_mode,
                count_include_pad=count_include_pad, divisor_override=divisor_override)
    qndhwc_end = time.time() - start
    return qncdhw_end*1000/iterations, qndhwc_end*1000/iterations

iterations = 10000
print("iterations = {}".format(iterations))
print("Benchmark", "Time(ms)", sep="\t\t\t\t\t")
for torch_type in (torch.qint8, torch.quint8, torch.qint32):
    for channel in (4,8,64,256):
        X = np.random.rand(1, channel, 56, 56).astype(np.float32), (0.5, 1, torch_type)
        ts = time_avg_pool2d(X, 4, None, 0, True, True, None, iterations)
        print(colored("avg_pool2d({}, {}, {})".format(str(torch_type), channel, "nchw"), 'green'), colored(ts[0], 'yellow'), sep="\t")
        print(colored("avg_pool2d({}, {}, {})".format(str(torch_type), channel, "nhwc"), 'green'), colored(ts[1], 'yellow'), sep="\t")
for torch_type in (torch.qint8, torch.quint8, torch.qint32):
    for channel in (4,8,64,256):
        X = np.random.rand(1, channel, 56, 56, 4).astype(np.float32), (0.5, 1, torch_type)
        ts = time_avg_pool3d(X, 4, None, 0, True, True, None, iterations)
        print(colored("avg_pool3d({}, {}, {})".format(str(torch_type), channel, "ncdhw"), 'green'), colored(ts[0], 'yellow'), sep="\t")
        print(colored("avg_pool3d({}, {}, {})".format(str(torch_type), channel, "ndhwc"), 'green'), colored(ts[1], 'yellow'), sep="\t")
```
### C++
1. `git clone https://github.com/google/benchmark.git`
2. `git clone https://github.com/google/googletest.git benchmark/googletest`

```
# CMakeLists.txt
cmake_minimum_required(VERSION 3.10 FATAL_ERROR)
project(time_avg_pool VERSION 0.1.0)

find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
add_subdirectory(benchmark)

add_executable(time_average_pool time_average_pool.cpp)
target_link_libraries(time_average_pool ${TORCH_LIBRARIES})
set_property(TARGET time_average_pool PROPERTY CXX_STANDARD 14)
target_link_libraries(time_average_pool benchmark::benchmark)
```

```
// time_average_pool.cpp
#include <benchmark/benchmark.h>
#include <torch/torch.h>

torch::Device device(torch::kCPU);

static void BM_TORCH_QAVG_POOL2D_NCHW_SINGLE_THREADED(benchmark::State& state) {
  torch::init_num_threads();
  torch::set_num_threads(1);
  auto x_nchw = torch::rand({1, state.range(0), 56, 56}, device);
  auto qx_nchw = torch::quantize_per_tensor(x_nchw, 0.5, 1, torch::kQUInt8);
  torch::Tensor X_hat;
  for (auto _ : state)
    X_hat = torch::nn::functional::avg_pool2d(
        qx_nchw,
        torch::nn::AvgPool2dOptions({4, 4}).ceil_mode(true).count_include_pad(
            true));
}

static void BM_TORCH_QAVG_POOL2D_NHWC_SINGLE_THREADED(benchmark::State& state) {
  torch::init_num_threads();
  torch::set_num_threads(1);
  auto x_nchw = torch::rand({1, state.range(0), 56, 56}, device);
  auto qx_nchw = torch::quantize_per_tensor(x_nchw, 0.5, 1, torch::kQUInt8);
  auto qx_nhwc = qx_nchw.contiguous(torch::MemoryFormat::ChannelsLast);
  torch::Tensor X_hat;
  for (auto _ : state)
    X_hat = torch::nn::functional::avg_pool2d(
        qx_nhwc,
        torch::nn::AvgPool2dOptions({4, 4}).ceil_mode(true).count_include_pad(
            true));
}

static void BM_TORCH_QAVG_POOL2D_NCHW(benchmark::State& state) {
  auto x_nchw = torch::rand({1, state.range(0), 56, 56}, device);
  auto qx_nchw = torch::quantize_per_tensor(x_nchw, 0.5, 1, torch::kQUInt8);
  torch::Tensor X_hat;
  for (auto _ : state)
    X_hat = torch::nn::functional::avg_pool2d(
        qx_nchw,
        torch::nn::AvgPool2dOptions({4, 4}).ceil_mode(true).count_include_pad(
            true));
}

static void BM_TORCH_QAVG_POOL2D_NHWC(benchmark::State& state) {
  auto x_nchw = torch::rand({1, state.range(0), 56, 56}, device);
  auto qx_nchw = torch::quantize_per_tensor(x_nchw, 0.5, 1, torch::kQUInt8);
  auto qx_nhwc = qx_nchw.contiguous(torch::MemoryFormat::ChannelsLast);
  torch::Tensor X_hat;
  for (auto _ : state)
    X_hat = torch::nn::functional::avg_pool2d(
        qx_nhwc,
        torch::nn::AvgPool2dOptions({4, 4}).ceil_mode(true).count_include_pad(
            true));
}

static void BM_TORCH_QAVG_POOL3D_NCDHW_SINGLE_THREADED(
    benchmark::State& state) {
  torch::init_num_threads();
  torch::set_num_threads(1);
  auto x_ncdhw = torch::rand({1, state.range(0), 56, 56, 4}, device);
  auto qx_ncdhw = torch::quantize_per_tensor(x_ncdhw, 0.5, 1, torch::kQUInt8);
  torch::Tensor X_hat;
  for (auto _ : state)
    X_hat = torch::nn::functional::avg_pool3d(
        qx_ncdhw,
        torch::nn::AvgPool3dOptions({5, 5, 5})
            .ceil_mode(true)
            .count_include_pad(true));
}

static void BM_TORCH_QAVG_POOL3D_NDHWC_SINGLE_THREADED(
    benchmark::State& state) {
  torch::init_num_threads();
  torch::set_num_threads(1);
  auto x_ncdhw = torch::rand({1, state.range(0), 56, 56, 4}, device);
  auto qx_ncdhw = torch::quantize_per_tensor(x_ncdhw, 0.5, 1, torch::kQUInt8);
  auto qx_ndhwc = qx_ncdhw.contiguous(torch::MemoryFormat::ChannelsLast3d);
  torch::Tensor X_hat;
  for (auto _ : state)
    X_hat = torch::nn::functional::avg_pool3d(
        qx_ndhwc,
        torch::nn::AvgPool3dOptions({5, 5, 5})
            .ceil_mode(true)
            .count_include_pad(true));
}

static void BM_TORCH_QAVG_POOL3D_NCDHW(benchmark::State& state) {
  auto x_ncdhw = torch::rand({1, state.range(0), 56, 56, 4}, device);
  auto qx_ncdhw = torch::quantize_per_tensor(x_ncdhw, 0.5, 1, torch::kQUInt8);
  torch::Tensor X_hat;
  for (auto _ : state)
    X_hat = torch::nn::functional::avg_pool3d(
        qx_ncdhw,
        torch::nn::AvgPool3dOptions({5, 5, 5})
            .ceil_mode(true)
            .count_include_pad(true));
}

static void BM_TORCH_QAVG_POOL3D_NDHWC(benchmark::State& state) {
  auto x_ncdhw = torch::rand({1, state.range(0), 56, 56, 4}, device);
  auto qx_ncdhw = torch::quantize_per_tensor(x_ncdhw, 0.5, 1, torch::kQUInt8);
  auto qx_ndhwc = qx_ncdhw.contiguous(torch::MemoryFormat::ChannelsLast3d);
  torch::Tensor X_hat;
  for (auto _ : state)
    X_hat = torch::nn::functional::avg_pool3d(
        qx_ndhwc,
        torch::nn::AvgPool3dOptions({5, 5, 5})
            .ceil_mode(true)
            .count_include_pad(true));
}

BENCHMARK(BM_TORCH_QAVG_POOL2D_NCHW)->RangeMultiplier(8)->Range(4, 256);
BENCHMARK(BM_TORCH_QAVG_POOL2D_NHWC)->RangeMultiplier(8)->Range(4, 256);
BENCHMARK(BM_TORCH_QAVG_POOL3D_NCDHW)->RangeMultiplier(8)->Range(4, 256);
BENCHMARK(BM_TORCH_QAVG_POOL3D_NDHWC)->RangeMultiplier(8)->Range(4, 256);
BENCHMARK(BM_TORCH_QAVG_POOL2D_NCHW_SINGLE_THREADED)
    ->RangeMultiplier(8)
    ->Range(4, 256);
BENCHMARK(BM_TORCH_QAVG_POOL2D_NHWC_SINGLE_THREADED)
    ->RangeMultiplier(8)
    ->Range(4, 256);
BENCHMARK(BM_TORCH_QAVG_POOL3D_NCDHW_SINGLE_THREADED)
    ->RangeMultiplier(8)
    ->Range(4, 256);
BENCHMARK(BM_TORCH_QAVG_POOL3D_NDHWC_SINGLE_THREADED)
    ->RangeMultiplier(8)
    ->Range(4, 256);
BENCHMARK_MAIN();
```

3. `mkdir build && cd build`
4. ```cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` .. ```
5. `cmake --build . --config Release`
6. `./time_average_pool`

# Further notes
- I've used `istrideB, istrideD, istrideH, strideW, strideC` to match `_qadaptive_avg_pool_kernel` since there's some code duplication there as mentioned in https://github.com/pytorch/pytorch/issues/40316.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42009

Reviewed By: pbelevich

Differential Revision: D22794441

Pulled By: z-a-f

fbshipit-source-id: 16710202811a1fbe1c99ea4d9b45876d6d28a8da
2020-08-05 19:44:42 -07:00
9add11ffc1 Fix IS_SPMM_AVAILABLE macro definition (#42643)
Summary:
This should fix the CUDA 11 on Windows build issue.

`defined` is a preprocessor operator, not a function, and the C/C++ standards leave the behavior undefined when `defined` appears as a result of macro substitution, so it cannot be used reliably that way.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42643

Reviewed By: pbelevich, xw285cornell

Differential Revision: D22963420

Pulled By: malfet

fbshipit-source-id: cccf7db0d03cd62b655beeb154db9e628aa749f0
2020-08-05 18:56:23 -07:00
509fb77b70 Adjust bound_shape_inferencer to take 4 inputs for FCs (#41934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41934

The model exported from the online training workflow with int8 quantization contains FCs with four inputs; the extra input is the quant_param blob. This diff adjusts the bound_shape_inferencer and the int8 op schema to infer shape info for the quant_param input.

Test Plan:
```
buck test caffe2/caffe2/opt:bound_shape_inference_test
```

Reviewed By: yinghai

Differential Revision: D22683554

fbshipit-source-id: 684d1433212a528120aba1c37d27e26b6a31b403
2020-08-05 18:44:48 -07:00
9ea9d1b52e [fbs][2/n] Remove .python3 markers
Test Plan:
`xbgr '\.python3'` shows only one (dead) usage of this file:
https://www.internalfb.com/intern/diffusion/FBS/browse/master/fbcode/python/repo_stats/buck.py?commit=9a8dd3243207819325d520c208218f6ab69e4e49&lines=854

Reviewed By: lisroach

Differential Revision: D22955631

fbshipit-source-id: e686d9157c08c347d0ce4acdd05bd7ab29ff7df5
2020-08-05 18:25:50 -07:00
5d7c3f92b9 Issue warning instead of error when parsing Enum while enum support is not enabled (#42623)
Summary:
Returning None rather than raising an error better matches the previous behavior.

Fixes https://fburl.com/yrrvtes3

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42623

Reviewed By: ajaech

Differential Revision: D22957498

Pulled By: gmagogsfm

fbshipit-source-id: 61dabc6d23ad44e75bd35d837768bdb6fe71eece
2020-08-05 17:55:29 -07:00
50f0d2b97d quant: add q_batchnorm_1d op (#42491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42491

Hooks up quantized batchnorm_1d to the quantized_bn kernel. Eager mode
hookup will be in a future PR, and graph mode should work after this PR.

Note: currently the implementation is ~2x slower on the benchmark than q_batch_norm2d
because we convert back to contiguous memory format at the end, since
channels_last is only defined for rank >= 4. If further optimization is
needed, that can be a separate PR (we will need the NHWC folks to check whether
there is a workaround). Meanwhile, having this is better than having nothing.
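
For illustration, a hedged standalone sketch of the rank constraint mentioned above (not the kernel code; the exact error text may differ): channels_last applies only to rank-4 tensors, which is why a rank-3 batchnorm_1d input needs a layout round-trip.
```
import torch

x3d = torch.randn(2, 4, 8)  # (N, C, L): rank 3
try:
    x3d.contiguous(memory_format=torch.channels_last)
except RuntimeError as e:
    print("rank-3 rejected:", e)  # channels_last requires rank >= 4

# promoting to rank 4 makes the format usable, at the cost of extra copies
x4d = x3d.unsqueeze(-1).contiguous(memory_format=torch.channels_last)
out = x4d.squeeze(-1).contiguous()  # back to the default layout
```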

Context: There have been both internal and external requests for various
quantized BN1d use cases.

Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_batch_norm_1d_2d_3d
python test/test_quantization.py TestQuantizedOps.test_batch_norm_1d_2d_3d_relu
python test/test_quantization.py TestQuantizeJitOps.test_qbatch_norm

// performance:
// https://gist.github.com/vkuzo/73a07c0f24c05f5804990d9ebfaecf5e

```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D22926254

fbshipit-source-id: 2780e6a81cd13a7455f6ab6e5118c22850a97a12
2020-08-05 17:20:18 -07:00
54ffb05eff better error message between C2 and glow (#41603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41603

Pull Request resolved: https://github.com/pytorch/glow/pull/4704

Previously, in the glow onnxifi path, when an error was encountered we logged it to stderr and just returned ONNXIFI_STATUS_INTERNAL_ERROR to C2; C2 then did CAFFE2_ENFORCE_EQUAL(return_code, ONNXIFI_STATUS_SUCCESS). The error message that eventually reached the user was something like

   [enforce fail at onnxifi_op.cc:545] eventStatus == ONNXIFI_STATUS_SUCCESS. 1030 vs 0

This diff adds plumbing to get human readable error message out of glow into C2.

Test Plan:
Run the ads replayer and overload it with traffic. The error message sent back to the client used to be

  E0707 00:57:45.697196 3709559 Caffe2DisaggAcceleratorTask.cpp:493] During running REMOTE_OTHER net: [enforce fail at onnxifi_op.cc:545] eventStatus == ONNXIFI_STATUS_SUCCESS. 1030 vs 0 (Error from operator:....

Now it's

```
E0707 16:46:48.366263 1532943 Client.cpp:966] Exception when calling caffe2_run_disagg_accelerator on remote predictor for model 190081310_0 : apache::thrift::TApplicationException: c10::Error: [enforce fail at onnxifi_op.cc:556] .
Error code: RUNTIME_REQUEST_REFUSED
Error message: The number of allowed queued requests has been exceeded. queued requests: 100 allowed requests: 100
Error return stack:
glow/glow/lib/Runtime/HostManager/HostManager.cpp:673
glow/glow/lib/Onnxifi/HostMana (Error from operator:...
```

Reviewed By: gcatron, yinghai

Differential Revision: D22416857

fbshipit-source-id: 564bc7644d9666eb660725c2dca5637affae9b73
2020-08-05 16:25:13 -07:00
aa4e91a6dc Fix TestSparse.test_bmm_windows_error when CUDA is not available (#42626)
Summary:
Refactor the common pattern of `(torch.version.cuda and [int(x) for x in torch.version.cuda.split(".")] >= [a, b])` into a `_get_torch_cuda_version()` helper function.
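
A hedged sketch of what such a helper might look like (the real one lives in the test utilities; everything here beyond the function name is illustrative):
```
import torch

def _get_torch_cuda_version():
    # torch.version.cuda is None on CPU-only builds
    if torch.version.cuda is None:
        return (0, 0)
    return tuple(int(x) for x in torch.version.cuda.split("."))

# call sites compare tuples instead of re-parsing the version string
if _get_torch_cuda_version() >= (11, 0):
    pass  # e.g., enable or skip a CUDA-11-specific test
```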

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42626

Reviewed By: seemethere

Differential Revision: D22956149

Pulled By: malfet

fbshipit-source-id: 897c55965e53b477cd20f69e8da15d90489035de
2020-08-05 16:07:35 -07:00
5023995292 fix output size adjustment for onnxifi_op
Summary: This breaks if we cut the net at certain int8 op boundaries.

Test Plan: Used net_runner to lower a single Int8Quantize op; it used to break and now works.

Reviewed By: yinghai

Differential Revision: D22912178

fbshipit-source-id: ca306068c9768df84c1cfa8b34226a1330e19912
2020-08-05 15:55:46 -07:00
102abb877c Reland D22939119: "[TensorExpr] Fix a way we were creating np arrays in tests." (#42608)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42608

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D22952745

Pulled By: ZolotukhinM

fbshipit-source-id: fd6a3efbfcaa876a2f4d27b507fe0ccdcb55a002
2020-08-05 15:14:23 -07:00
2501e2b12d [RPC tests] Run DdpUnderDistAutogradTest and DdpComparisonTest with fork too (#42528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42528

It seems it was an oversight that they weren't run. This allows us to simplify our auto-generation logic, as now all test suites are run in both modes.
ghstack-source-id: 109229969

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D22922151

fbshipit-source-id: 0766a6970c927efb04eee4894b73d4bcaf60b97f
2020-08-05 15:10:29 -07:00
4da602b004 [RPC tests] Generate test classes automatically (#42527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42527

ghstack-source-id: 109229468

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D22864698

fbshipit-source-id: 6a55f3201c544f0173493b38699a2c7e95ac1bbc
2020-08-05 15:10:26 -07:00
d7516ccfac [RPC tests] Enroll TensorPipe in missing test suites (#40823)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40823

Summary of the entire stack:
--

This diff is part of an attempt to refactor the RPC tests. They currently suffer from several problems:
- Several ways to specify the agent to use: there exists one "generic" fixture that uses the global variable TEST_CONFIG to look up the agent name, and is used for process group and Thrift, and then there are separate fixtures for the flaky agent and the TensorPipe one.
- These two ways lead to having two separate decorators (`requires_process_group_agent` and `@_skip_if_tensorpipe_agent`) which must both be specified, making it unclear what the effect of each of them is and what happens if only one is given.
- Thrift must override the TEST_CONFIG global variable before any other import (in order for the `requires_process_group_agent` decorator to work correctly) and for that it must use a "trap" file, which makes it even harder to track which agent is being used, and which is specific to Buck, and thus cannot be used in OSS by other agents.
- Even if the TensorPipe fixture doesn't use TEST_CONFIG, it still needs to set it to the right value for other parts of the code to work. (This is done in `dist_init`).
- There are a few functions in dist_utils.py that return some properties of the agent (e.g., a regexp to match against the error it returns in case of shutdown). These functions are effectively chained if/elses on the various agents, which has the effect of "leaking" some part of the Thrift agent into OSS.
- Each test suite (RPC, dist autograd/dist optimizer, their JIT versions, remote module, ...) must be run on each agent (or almost; the faulty one is an exception) in both fork and spawn mode. Each of these combinations is a separate file, which leads to a proliferation of scripts.
- There is no "master list" of what combinations make sense and should be run. Therefore it has happened that when adding new tests or new agents we forgot to enroll them into the right tests. (TensorPipe is still missing a few tests, it turns out).
- All of these tiny "entry point" files contain almost the same duplicated boilerplate. This makes it very easy to get the wrong content into one of them due to a bad copy-paste.

This refactoring aims to address these problems by:
- Avoiding global state, defaults/override, traps, if/elses, ... and having a single way to specify the agent, based on an abstract base class and several concrete subclasses which can be "mixed in" to any test suite (see the sketch below).
- Instead of enabling/disabling tests using decorators, the tests that are specific to a certain agent are now in a separate class (which is a subclass of the "generic" test suite) so that they are only picked up by the agent they apply to.
- Instead of having one separate entry point script for each combination, it uses one entry point for each agent, and in that script it provides a list of all the test suites it wants to run on that agent. And it does that by trying to deduplicate the boilerplate as much as possible. (In fact, the various agent-suite combinations could be grouped in any way, not necessarily by agent as I did here).

It provides further advantages:
- It puts all the agents on equal standing, by not having any of them be the default, making it thus easier to migrate from process group to TensorPipe.
- It will make it easier to add more versions of the TensorPipe tests (e.g., one that disables the same-machine backends in order to test the TCP-based ones) without a further duplication of entry points, of boilerplate, ...
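
A hedged, self-contained sketch of the mixin design described above (class and method names are illustrative, not the actual PyTorch test fixtures):
```
import abc
import unittest

class RpcAgentFixture(abc.ABC):
    """Abstract fixture: each agent supplies its own backend options."""
    @property
    @abc.abstractmethod
    def rpc_backend_options(self):
        ...

class TensorPipeFixture(RpcAgentFixture):
    @property
    def rpc_backend_options(self):
        return {"backend": "tensorpipe"}  # placeholder options

class GenericRpcTest:
    """Agent-agnostic suite; relies on a fixture being mixed in."""
    def test_options_present(self):
        self.assertIsNotNone(self.rpc_backend_options)

# one small entry-point class per (agent, suite) combination:
class TensorPipeRpcTest(TensorPipeFixture, GenericRpcTest, unittest.TestCase):
    pass
```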

Summary of this commit
--
As it is now easier to spot that the TensorPipe agent wasn't being run on some test suite, we fix that. We keep this change for last so that if those tests turn out to be flaky and must be reverted this won't affect the rest of the stack.
ghstack-source-id: 109229469

Test Plan: Sandcastle and CircleCI

Reviewed By: pritamdamania87

Differential Revision: D22309432

fbshipit-source-id: c433a6a49a7b6737e0df4cd953f3dfde290f20b8
2020-08-05 15:10:23 -07:00
2e7b464c43 [RPC tests] Remove global TEST_CONFIG (#40822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40822

Summary of the entire stack:
--

(Identical to the stack summary in #40823 above; see that commit for the full rationale behind this refactoring stack.)

Summary of this commit
--
This is the last step of removing TEST_CONFIG. As there was no one left using it, there is really not much to it.
ghstack-source-id: 109229471

Test Plan: Sandcastle and CircleCI

Reviewed By: pritamdamania87

Differential Revision: D22307778

fbshipit-source-id: 0d9498d9367eec671e0a964ce693015f73c5638c
2020-08-05 15:10:20 -07:00
e7c7eaab82 [RPC tests] Move some functions to methods of fixture (#40821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40821

Summary of the entire stack:
--

(Identical to the stack summary in #40823 above; see that commit for the full rationale behind this refactoring stack.)

Summary of this commit
--
This change continues the work towards removing TEST_CONFIG, by taking a few functions that were accepting the agent name (as obtained from TEST_CONFIG) and then did a bunch of if/elses on it, and replace them by new abstract methods on the fixtures, so that these functions become "decentralized".
ghstack-source-id: 109229472

Test Plan: Sandcastle and CircleCI

Reviewed By: pritamdamania87

Differential Revision: D22307776

fbshipit-source-id: 9e1f6edca79aacf0bcf9d83d50ce9e0d2beec0dd
2020-08-05 15:10:17 -07:00
2acef69ce3 [RPC tests] Make generic fixture an abstract base class (#40820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40820

Summary of the entire stack:
--

(Identical to the stack summary in #40823 above; see that commit for the full rationale behind this refactoring stack.)

Summary of this commit
--
Now that no one is using the generic fixture anymore (i.e., the fixture that looks up the agent's name in the global TEST_CONFIG) we can make it abstract, i.e., have its methods become no-ops and add decorators that will require all subclasses to provide new implementations of those methods. This is a first step towards removing TEST_CONFIG.
ghstack-source-id: 109229475

Test Plan: Sandcastle and CircleCI

Reviewed By: pritamdamania87

Differential Revision: D22307777

fbshipit-source-id: e52abd915c37894933545eebdfdca3ecb9559926
2020-08-05 15:10:14 -07:00
a94039fce5 [RPC tests] Avoid decorators to skip tests (#40819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40819

Summary of the entire stack:
--

(Identical to the stack summary in #40823 above; see that commit for the full rationale behind this refactoring stack.)

Summary of this commit
--
This diff removes the two decorators (`requires_process_group_agent` and `@_skip_if_tensorpipe_agent`) which were used to skip tests. They were only used to prevent the TensorPipe agent from running tests that were using the process group agent's options. The converse (preventing the PG agent from using the TP options) is achieved by having those tests live in a `TensorPipeAgentRpcTest` class. So here we're doing the same for process group, by moving those tests to a `ProcessGroupAgentRpcTest` class.
ghstack-source-id: 109229473

Test Plan: Sandcastle and CircleCI

Reviewed By: pritamdamania87

Differential Revision: D22283179

fbshipit-source-id: b9315f9fd67f35e88fe1843faa161fc53a4133c4
2020-08-05 15:10:11 -07:00
935fcc9580 [RPC tests] Merge process group tests into single entry point (#40818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40818

Summary of the entire stack:
--

(Identical to the stack summary in #40823 above; see that commit for the full rationale behind this refactoring stack.)

Summary of this commit
--
This diff does the changes described above for the process group agent. It defines a fixture for it (instead of using the generic fixture in its default behavior) and then merges all the entry points into a single script. Note that after this change there won't be anymore a "vanilla" RPC test: all test scripts now specify what agent they are using. This puts all agents on equal standing.
ghstack-source-id: 109229474

Test Plan: Sandcastle and CircleCI

Reviewed By: pritamdamania87

Differential Revision: D22283182

fbshipit-source-id: 7e3626bbbf37d88b892077a03725f0598576b370
2020-08-05 15:10:07 -07:00
b93c7c54eb [RPC tests] Merge tests for faulty agent into single script (#40817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40817

Summary of the entire stack:
--

(Identical to the stack summary in #40823 above; see that commit for the full rationale behind this refactoring stack.)

Summary of this commit
--
This diff does the changes described above for the faulty agent, which is its own strange beast. It merges all the test entry points (i.e., the combinations of agent, suite and fork/spawn) into a single file. It also modifies the test suites that are intended to be run only on the faulty agent, which used to inherit from its fixture, to inherit from the generic fixture, as they will be mixed in with the faulty fixture at the very end, inside the entry point script.
ghstack-source-id: 109229477

Test Plan: Sandcastle and CircleCI

Reviewed By: pritamdamania87

Differential Revision: D22283178

fbshipit-source-id: 72659efe6652dac8450473642a578933030f2c74
2020-08-05 15:10:04 -07:00
edf6c4bc4d [RPC tests] Merge TensorPipe tests into single entry point (#40816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40816

Summary of the entire stack:
--

(Identical to the stack summary in #40823 above; see that commit for the full rationale behind this refactoring stack.)

Summary of this commit
--
This diff does the changes described above for the TensorPipe agent. It fixes its fixture (making it inherit from the generic fixture) and merges all the entry point scripts into a single one, so that it's easier to have a clear overview of all the test suites which we run on TensorPipe (you'll notice that many are missing: the JIT ones, the remote module one, ...).
ghstack-source-id: 109229476

Test Plan: Sandcastle and CircleCI

Reviewed By: pritamdamania87

Differential Revision: D22283180

fbshipit-source-id: d5e9f9f4e6d4bfd6fbcae7ae56eed63d2567a02f
2020-08-05 15:08:32 -07:00
73351ee91d [TensorExpr] Disallow fallback to JIT interpreter from TensorExprKernel (flip the default). (#42568)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42568

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D22936175

Pulled By: ZolotukhinM

fbshipit-source-id: 62cb505acb77789ed9f483842a8b31eb245697b3
2020-08-05 14:13:49 -07:00
ef50694d44 [TensorExpr] Apply GenericIntrinsicExpander recursively. (#42567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42567

Before this change we didn't expand arguments, and thus in an expr
`sigmoid(sigmoid(x))` only the outer call was expanded.
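
A self-contained Python sketch of why the expansion must recurse (`Call` and `expand_intrinsic` are illustrative stand-ins, not the TensorExpr API): arguments are expanded before the enclosing call is rewritten.
```
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Call:
    name: str
    args: List[Any]

def expand_intrinsic(name: str, args: List[Any]) -> Any:
    return ("lowered_" + name, args)  # stand-in for lowering one intrinsic

def expand(expr: Any) -> Any:
    if isinstance(expr, Call):
        args = [expand(a) for a in expr.args]  # recurse into arguments first
        return expand_intrinsic(expr.name, args)
    return expr

print(expand(Call("sigmoid", [Call("sigmoid", ["x"])])))
# ('lowered_sigmoid', [('lowered_sigmoid', ['x'])])
```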

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D22936177

Pulled By: ZolotukhinM

fbshipit-source-id: 9c05dc96561225bab9a90a407d7bcf9a89b078a1
2020-08-05 14:13:46 -07:00
ea9053b86d [TensorExpr] Handle constant nodes in shape inference. (#42566)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42566

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D22936176

Pulled By: ZolotukhinM

fbshipit-source-id: 69d0f9907de0e98f1fbd56407df235774cb5b788
2020-08-05 14:13:44 -07:00
b9c49f0e69 [TensorExpr] Support shape inference in TE for aten::cat. (#42387)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42387

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D22879281

Pulled By: ZolotukhinM

fbshipit-source-id: 775e46a4cfd91c63196b378ee587cc4434672c89
2020-08-05 14:11:24 -07:00
feeb515ad5 add Quantizer support to IValue (#42438)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42438

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D22894190

Pulled By: bhosmer

fbshipit-source-id: b2d08abd6f582f29daa6cc7ebf05bb1a99f7514b
2020-08-05 12:56:18 -07:00
24e2a8a171 Revert D22780307: Fix illegal memory access issue for CUDA version of SplitByLengths operator.
Test Plan: revert-hammer

Differential Revision:
D22780307 (76905527fe)

Original commit changeset: c5ca60ae16b2

fbshipit-source-id: f3c99eec5f05121e2bed606fe2ba84a0be0cdf16
2020-08-05 12:47:56 -07:00
df7c059428 Throw error if torch.set_deterministic(True) is called with nondeterministic CuBLAS config (#41377)
Summary:
For CUDA >= 10.2, the `CUBLAS_WORKSPACE_CONFIG` environment variable must be set to either `:4096:8` or `:16:8` to ensure deterministic CUDA stream usage. This PR adds some logic inside `torch.set_deterministic()` to raise an error if this environment variable is not set properly and CUDA >= 10.2.
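
A minimal usage sketch of the behavior this adds (assuming a CUDA >= 10.2 build; in practice the variable should be set before any cuBLAS work happens):
```
import os
import torch

# without a valid CUBLAS_WORKSPACE_CONFIG, the call below now raises on CUDA >= 10.2
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # or ":16:8"
torch.set_deterministic(True)  # passes the new environment check
```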

Issue https://github.com/pytorch/pytorch/issues/15359

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41377

Reviewed By: malfet

Differential Revision: D22758459

Pulled By: ezyang

fbshipit-source-id: 4b96f1e9abf85d94ba79140fd927bbd0c05c4522
2020-08-05 12:42:24 -07:00
7221a3d1aa enable torch.optim.swa_utils.SWALR (#42574)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42435
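
For context, a hedged usage sketch of SWALR per the torch.optim.swa_utils API (hyperparameter values are arbitrary):
```
import torch
from torch.optim.swa_utils import SWALR

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
swa_scheduler = SWALR(optimizer, swa_lr=0.05, anneal_epochs=5, anneal_strategy="cos")

for epoch in range(10):
    optimizer.step()      # training step elided
    swa_scheduler.step()  # anneals the lr toward swa_lr over anneal_epochs
```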

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42574

Reviewed By: zou3519

Differential Revision: D22949369

Pulled By: vincentqb

fbshipit-source-id: f2f319ec94a97e0afe4d4327c866504ae632a986
2020-08-05 12:37:45 -07:00
18a32b807b Add API to collect output_col_minmax_histogram
Summary:
Add an API to collect output_col_minmax_histogram; this is used to implement input_equalization.

Rolled back the revised collect_single_histogram in the new version to make sure it does not affect the product.
The newly added API can collect the activation histogram and the output column min/max histogram at the same time.

Test Plan:
Add a unit test, and pass it.
https://our.intern.facebook.com/intern/testinfra/testrun/2251799847601374
After updating the dump API, it passed the updated unit test
https://our.intern.facebook.com/intern/testinfra/testrun/844425097716401

Integrated output_col_minmax_histogram into collect_single_histogram and made it backward compatible:
https://our.intern.facebook.com/intern/testinfra/testrun/8162774342207893

I added different cases to test the newly added function; it passed the unit test: https://our.intern.facebook.com/intern/testinfra/testrun/4503599658969000

Tested after new revision: https://our.intern.facebook.com/intern/testinfra/testrun/5348024589078557

Reviewed By: hx89

Differential Revision: D22919913

fbshipit-source-id: c9cb05e0cf14af0dfde3d22921abb42f97a61df2
2020-08-05 12:33:10 -07:00
7c33225c72 Add strict mypy type checking and update code_template.py (#42322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42322

Our current type checking rules are rather lax, and for
example don't force users to make sure they annotate all functions
with types.  For code generation code, it would be better to force
100% typing.  This PR introduces a new mypy configuration
mypy-strict.ini which applies rules from --strict.  We extend
test_type_hints.py to test for this case.  It only covers
code_template.py, which I have made strict clean in this PR.
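
A minimal sketch of what `--strict` enforces (the functions below are hypothetical, not from code_template.py):

```python
from typing import Dict

# Fully annotated: accepted under mypy --strict.
def render(template: str, env: Dict[str, str]) -> str:
    return template.format(**env)

# Missing annotations: rejected under --strict with
# "Function is missing a type annotation".
def render_loose(template, env):
    return template.format(**env)
```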

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D22846120

Pulled By: ezyang

fbshipit-source-id: 8d253829223bfa0d811b6add53b7bc2d3a4356b0
2020-08-05 12:28:15 -07:00
5c5d7a9dca Freeze dynamic (re)quantization ops into standard ones (#42591)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42591

We don't support lowering with 2-input Int8Quantize and 4-input Int8FC. Just do a conversion to absorb the quantization params into the op itself.

Test Plan:
```
buck test caffe2/caffe2/quantization/server:quantize_dnnlowp_op_test
```

Reviewed By: benjibc

Differential Revision: D22942673

fbshipit-source-id: a392ba2afdfa39c05c5adcb6c4dc5f814c95e449
2020-08-05 11:53:09 -07:00
6d1e43c5a6 Release the GIL before invokeOperator (#42341)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41865

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42341

Reviewed By: ezyang

Differential Revision: D22928622

Pulled By: wconstab

fbshipit-source-id: 8fa41277c9465f816342db6ec0e6cd4b30095c5c
2020-08-05 11:51:39 -07:00
76905527fe Fix illegal memory access issue for CUDA version of SplitByLengths operator.
Summary:
1. Fix illegal memory access issue for SplitByLengths operator in the CUDA context.
2. Add support to scaling lengths vector for SplitByLengths operator.
3. Add support to test SplitByLengths operator in the CUDA context.

Example of the SplitByLengths operator processing a scaling lengths vector:
value vector A = [1, 2, 3, 4, 5, 6]
length vector B = [1, 2]
After execution of the SplitByLengths operator,
the output should be [1, 2] and [3, 4, 5, 6].

Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:concat_split_op_test

Reviewed By: kennyhorror

Differential Revision: D22780307

fbshipit-source-id: c5ca60ae16b24032cedfa045a421503b713daa6c
2020-08-05 11:46:00 -07:00
06d978a9ad [c10/cuda] Reorganize device_count() and robustly surface ASAN warnings (#42249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42249

Main change is to bring Caffe2's superior error messages for cuda initialization into c10 and use them in all code paths.

Basic logic:

| Case | Call to device_count() | init_cuda, e.g. allocating tensor |
| -- | -- | -- |
| all good | non-zero | just works |
| no gpus | 0, no warning | throw exception with good message |
| driver issues | 0, produce warning | throw exception with good message |
| out of memory with ASAN | 0, produce warning| throw exception with ASAN message |

Previously, the error thrown from init_cuda was very generic and the ASAN warning (if any) was buried in the logs.

Other clean up changes:
* cache device_count() always in a static variable
* move all asan macros in c10

Test Plan:
Hard to unittest because of build modes. Verified manually that the behavior from the table above holds by running the following script in different modes (ASAN/no-ASAN, CUDA_VISIBLE_DEVICES=):

```
print('before import')
import torch
print('after import')
print('devices: ', torch.cuda.device_count())
x = torch.tensor([1,2,3])
print('tensor creation')
x = x.cuda()
print('moved to cuda')
```

Reviewed By: ngimel

Differential Revision: D22824329

fbshipit-source-id: 5314007313a3897fc955b02f8b21b661ae35fdf5
2020-08-05 11:39:31 -07:00
27e8dc78ca [vulkan] VulkanTensor lazy buffer allocation (#42569)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42569

We do not need to allocate buffers for Vulkan tensors if they are not the forward input or output.
allocate_storage() is no longer called by default for the outputs of operations; their image representation will hold the result.
A buffer is allocated only if an operation requests it (for some ops like concatenate or transpose) or on copy to host.

If the buffer was not allocated, `VulkanTensor.image()` just allocates the texture, skipping the copy from buffer to texture.
Since allocate storage was previously done for all operations, we save a buffer allocation and a buffer_to_image call.

MobileNetV2 on my Pixel 4:
```
flame:/data/local/tmp $ ./speed_benchmark_torch  --model=mnfp32-vopt.pt --input_type=float --input_dims=1,3,224,224 --warmup=3 --iter=20 --vulkan=true
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Microseconds per iter: 305818. Iters per second: 3.26991
Segmentation fault
```
```
139|flame:/data/local/tmp $ ./speed_benchmark_torch_noas  --model=mnfp32-vopt.pt --input_type=float --input_dims=1,3,224,224 --warmup=3 --iter=20 --vulkan=true
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Microseconds per iter: 236768. Iters per second: 4.22355
Segmentation fault
```

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D22946552

Pulled By: IvanKobzarev

fbshipit-source-id: ac0743bb316847632a22cf9aafb8938e50b2fb7b
2020-08-05 10:54:41 -07:00
dae94ed022 Keep manual_kernel_registration only effective in aten codegen. (#42386)
Summary:
This PR removes manual registration in the aten/native codebase,
and it separates manual device/catchall kernel registration from manual VariableType kernel registration.
The first one remains as manual_kernel_registration in native_functions.yaml.
The second one is moved to tools/ codegen.

Difference in generated TypeDefault.cpp: https://gist.github.com/ailzhang/897ef9fdf0c834279cd358febba07734
No difference in generated VariableType_X.cpp

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42386

Reviewed By: agolynski

Differential Revision: D22915649

Pulled By: ailzhang

fbshipit-source-id: ce93784b9b081234f05f3343e8de3c7a704a5783
2020-08-05 10:31:35 -07:00
b08347fd7b Add CUDA 11 builds for Windows CI (#42420)
Summary:
Stacked on https://github.com/pytorch/pytorch/pull/42410.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42420

Reviewed By: seemethere

Differential Revision: D22917230

Pulled By: malfet

fbshipit-source-id: 6ad394f7f8c430c587e0b0d9c5a5e7b7bcd85bfe
2020-08-05 09:40:33 -07:00
db52cd7322 .circleci: Hardcode rocm image to previous tag (#42603)
Summary:
There were some inconsistencies with the newer docker images, so it'd be
best to stick with something that works without reverting the entire
docker builder PR.

This was made after the previous efforts to disable the tests that were failing:
* https://github.com/pytorch/pytorch/pull/42583
* https://github.com/pytorch/pytorch/pull/42561

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42603

Reviewed By: ezyang

Differential Revision: D22948743

Pulled By: seemethere

fbshipit-source-id: cc8b834e0c8a6a4763f5ba07ce220a9c192ea6eb
2020-08-05 09:23:21 -07:00
eb8a5fed38 Automated submodule update: FBGEMM (#42584)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 4abc34af1a

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42584

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D22941475

fbshipit-source-id: 29863cad7f77939edb44d337918693879b35cfaa
2020-08-05 09:19:27 -07:00
924a1dbe9b Revert D22939119: [TensorExpr] Fix a way we were creating np arrays in tests.
Test Plan: revert-hammer

Differential Revision:
D22939119 (882ad117cf)

Original commit changeset: 3388270af8ea

fbshipit-source-id: 7c8d159586ce2c4c21184fd84aa6da5183bc71ea
2020-08-05 08:25:47 -07:00
0cf71eb547 Unconditionally use typing extensions in jit_internal (#42538)
Summary:
Since https://github.com/pytorch/pytorch/issues/38221 is closed now, the `typing_extensions` module should always be available

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42538

Reviewed By: ezyang

Differential Revision: D22942153

Pulled By: malfet

fbshipit-source-id: edabbadde13800a3412d14c19ca55ef206ada5e1
2020-08-05 08:22:59 -07:00
b85216887b [vulkan] max_pool2d (#41379)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41379

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D22754944

Pulled By: IvanKobzarev

fbshipit-source-id: 5261337bb731a207a1532e6423c0d33f1307e413
2020-08-05 01:53:52 -07:00
0f358fab6b Hide cudnn symbols in libtorch_cuda.so when statically linking cudnn (#41986)
Summary:
This PR intends to fix https://github.com/pytorch/pytorch/issues/32983.

The initial (one-line) diff causes statically linked cudnn symbols in `libtorch_cuda.so` to have local linkage (such that they shouldn't be visible to external libraries during dynamic linking at load time), at least in my source build on Ubuntu 20.04.

Procedure I used to verify:
```
export USE_STATIC_CUDNN=ON
python3 setup.py install
...
```
then
```
mcarilli@mcarilli-desktop:~/Desktop/mcarilli_github/pytorch/torch/lib$ nm libtorch_cuda.so | grep cudnnCreate
00000000031ff540 t cudnnCreate
00000000031fbe70 t cudnnCreateActivationDescriptor
```
Before the diff they were marked with capital `T`s indicating external linkage.

Caveats:
- The fix is gcc-specific afaik.  I have no idea how to enable it for Windows or other compilers.
- Hiding the cudnn symbols will break external C++ applications that rely on linking `libtorch.so` to supply cudnn symbol definitions.  IMO this is "off menu" usage so I don't think it's a major concern.  Hiding the symbols _won't_ break applications that call cudnn indirectly through torch functions, which IMO is the "on menu" way.
- I know _very little_ about the build system.  The diff's intent is to add a link option that applies to any Pytorch `.so`s that statically link cudnn, and does so on Linux only.  I'm blindly following soumith's recommendation https://github.com/pytorch/pytorch/issues/32983#issuecomment-662056151, and post-checking the built libs (I also added `set(CMAKE_VERBOSE_MAKEFILE ON)` to the top-level CMakeLists.txt at one point to confirm `-Wl,--exclude-libs,libcudnn_static.a` was picked up by the command that linked `libtorch_cuda.so`).
- https://github.com/pytorch/pytorch/issues/32983 (which used a Pytorch 1.4 binary build) complained about `libtorch.so`, not `libtorch_cuda.so`:
    ```
    nvpohanh@ubuntu:~$ nm /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch.so | grep ' cudnnCreate'
    000000000f479c30 T cudnnCreate
    000000000f475ff0 T cudnnCreateActivationDescriptor
    ```
  In my source build, `libtorch.so` ends up small, containing no cudnn symbols (this is true with or without the PR's diff), which contradicts https://github.com/pytorch/pytorch/issues/32983.  Maybe the symbol organization (what goes in `libtorch.so` vs `libtorch_cuda/cpu/whatever.so`) changed since 1.4.  Or maybe the symbol organization is different for source vs binary builds, in which case I have no idea if this PR's diff has the same effect for a binary build.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41986

Reviewed By: glaringlee

Differential Revision: D22934926

Pulled By: malfet

fbshipit-source-id: 711475834e0f8148f0e5f2fe28fca5f138ef494b
2020-08-04 22:59:40 -07:00
882ad117cf [TensorExpr] Fix a way we were creating np arrays in tests. (#42575)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42575

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D22939119

Pulled By: ZolotukhinM

fbshipit-source-id: 3388270af8eae9fd4747f06202f366887aaf5f36
2020-08-04 21:24:25 -07:00
3c7fccc1c2 Reenable cusparse SpMM on cuda 10.2 (#42556)
Summary:
This fixes feature regression introduced by https://github.com/pytorch/pytorch/issues/42412 which limited all the use of the API to CUDA-11.0+

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42556

Reviewed By: ngimel

Differential Revision: D22932129

Pulled By: malfet

fbshipit-source-id: 2756e0587456678fa1bc7deaa09d0ea482dfd19f
2020-08-04 19:02:34 -07:00
78f4cff8fe handle multiple returns properly in boxing wrappers (#42437)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42437

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D22894191

Pulled By: bhosmer

fbshipit-source-id: fd4c7bc605a4b20bb3882f71e3b8874150671324
2020-08-04 18:27:25 -07:00
d45e2d3ef9 Reduce the output overhead of OutputColumnMaxHistogramObserver by enabling changing bin_nums; update observer_test.py
Summary: Currently OutputColumnMaxHistogramObserver outputs 2048 bins for each column, so the dumped file is extremely large and the dumping time is quite long, even though in the end we only use the min and max. This diff makes bin_nums configurable by adding an argument, with the default value set to 16 to reduce dumping overhead. When more bins are needed to analyze the results, only this argument has to be changed.

Test Plan:
buck run caffe2/caffe2/quantization/server:observer_test

Reviewed By: hx89

Differential Revision: D22918202

fbshipit-source-id: bda34449355b269b24c55802012450ebaa4d280c
2020-08-04 17:07:25 -07:00
61027a1a59 Install typing_extensions in PyTorch CI (#42551)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42551

Reviewed By: seemethere

Differential Revision: D22929256

Pulled By: malfet

fbshipit-source-id: 9a6f8c56ca1c0fb8a8569614a34a12f2769755f3
2020-08-04 17:03:44 -07:00
29700c0092 [JIT] Fix torch.jit.is_tracing() (#42486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42486

**Summary**
This commit fixes a small bug in which `torch.jit.is_tracing()` returns
`torch._C.is_tracing`, the function object, instead of calling the
function and returning the result.

**Test Plan**
Continuous integration?

**Fixes**
This commit fixes #42448.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D22911062

Pulled By: SplitInfinity

fbshipit-source-id: b94eca0c1c65ca6f22acc6c5542af397f2dc37f0
2020-08-04 16:57:36 -07:00
afa489dea9 [ONNX] Enable lower_tuple pass for custom layer (#41548)
Summary:
A custom layer created via `torch.autograd.Function` appears in lower_tuple as `prim::PythonOp`. Adding this op type to the allowed list enables the lower_tuple pass, which helps with exporting custom layers with tuple outputs.

E.g.
```python
import torch
class CustomFunction(torch.autograd.Function):
    @staticmethod
    def symbolic(g, input):
        return g.op('CustomNamespace::Custom', input, outputs=2)
    @staticmethod
    def forward(ctx, input):
        return input, input
class Custom(torch.nn.Module):
    def forward(self, input):
        return CustomFunction.apply(input)

model = Custom()
batch = torch.FloatTensor(1, 3)
torch.onnx.export(model, batch, "test.onnx", verbose=True)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41548

Reviewed By: glaringlee

Differential Revision: D22926143

Pulled By: bzinodev

fbshipit-source-id: ce14d1d3c70a920154a8235d635ab31ddf0c46f3
2020-08-04 16:22:39 -07:00
ccc831ae35 test: Disable test_strided_grad_layout on ROCM (#42561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42561

Regression was introduced as part of 5939d8a3e0, logs: https://app.circleci.com/pipelines/github/pytorch/pytorch/196558/workflows/9a2dd56e-86af-4d0f-9fb9-b205dcd12f93/jobs/6502042

Going to go ahead and disable the test to give rocm folks time to investigate what's going on

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D22932615

Pulled By: seemethere

fbshipit-source-id: 41150f3085f848cce75990716362261fea9391a0
2020-08-04 16:20:44 -07:00
c3e2ee725f Automated submodule update: FBGEMM (#42496)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 87c378172a

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42496

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D22911638

fbshipit-source-id: f20c83908b51ff56d8bf1d8b46961f70d023c81a
2020-08-04 16:15:26 -07:00
b9e68e03c4 Fix the bug in THCTensor_(baddbmm) and ATen's addmm_cuda for strided views input (#42425)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42418.

The problem was that the non-contiguous batched matrices were passed to `gemmStridedBatched`.

The following code fails on master and works with the proposed patch:
```python
import torch
x = torch.tensor([[1., 2, 3], [4., 5, 6]], device='cuda:0')
c = torch.as_strided(x, size=[2, 2, 2], stride=[3, 1, 1])
torch.einsum('...ab,...bc->...ac', c, c)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42425

Reviewed By: glaringlee

Differential Revision: D22925266

Pulled By: ngimel

fbshipit-source-id: a72d56d26c7381b7793a047d76bcc5bd45a9602c
2020-08-04 16:11:07 -07:00
317b9d3bfc Implement sort for string in aten (#42398)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42375

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42398

Reviewed By: ailzhang

Differential Revision: D22884849

Pulled By: gmagogsfm

fbshipit-source-id: e53386949f0a5e166f3d1c2aa695294340bd1440
2020-08-04 15:25:35 -07:00
56fc7d0345 Fix doc build (#42559)
Summary:
Add space between double back quotes and left curly bracket

Otherwise doc generation failed with `Inline literal start-string without end-string.`

This regression was introduced by b56db305cf

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42559

Reviewed By: glaringlee

Differential Revision: D22931527

Pulled By: malfet

fbshipit-source-id: 11c04a92dbba48592505f704d77222cf92a81055
2020-08-04 15:15:15 -07:00
e995c3d21e Add private API to support tensor lists: _foreach_add(TensorList tensors, Scalar scalar) (#41554)
Summary:
Initial PR for the Tensor List functionality.

**Motivation**
[GitHub issue](https://github.com/pytorch/pytorch/issues/38655)
Current PyTorch optimizer implementations are not efficient in cases when we work with a lot of small feature tensors. Starting a lot of kernels slows down the whole process. We need to reduce the number of kernels that we start.
As an example, we should be looking at [NVIDIA's Apex](https://github.com/NVIDIA/apex).
In order to track progress, we will pick PyTorch's DCGAN model with the Adam optimizer and, once the optimizer is reimplemented with tensor lists, benchmark the model performance against the original model version, Apex's version with the original Adam optimizer, and its FusedAdam optimizer.

**In this PR**
- Adding `multi_tensor_apply` mechanism which will help to efficiently apply passed functor on a given list of tensors on CUDA.
- Adding a first private API - `std::vector<Tensor> _foreach_add(TensorList tensors, Scalar scalar)`
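
A minimal usage sketch of the new private API, assuming a CUDA device is available:

```python
import torch

# Ten small tensors that would otherwise require ten kernel launches
# to increment one by one.
tensors = [torch.randn(2, 2, device="cuda") for _ in range(10)]

# multi_tensor_apply processes the whole list with far fewer launches.
results = torch._foreach_add(tensors, 1.0)
```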

**Tests**
Tested via unit tests

**Plan for the next PRs**

1. Cover these ops with `multi_tensor_apply` support
- exponent
- division
- mul_
- add_
- addcmul_
- addcdiv_
- Sqrt

2. Rewrite PyTorch optimizers to use for-each operators in order to get performance gains.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41554

Reviewed By: cpuhrsch

Differential Revision: D22829724

Pulled By: izdeby

fbshipit-source-id: 47febdbf7845cf931958a638567b7428a24782b1
2020-08-04 15:01:09 -07:00
a0695b34cd .circleci: Have python docs always push to site (#42552)
Summary:
Was getting an error when attempting to push to master for
pytorch/pytorch.github.io since the main branch on that repository is
actually site and not master.

Get rid of the loop too, since it wasn't going to work with a
conditional, and conditionals on a two-variable loop just aren't worth the
readability concerns.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42552

Reviewed By: malfet

Differential Revision: D22929503

Pulled By: seemethere

fbshipit-source-id: acdd26b86718304eac9dcfc81761de0b3e609004
2020-08-04 14:44:42 -07:00
91d87292a6 [vulkan][asan] Fix Invalid Memory ops (#41224)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41224

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D22754940

Pulled By: IvanKobzarev

fbshipit-source-id: f012b78a57f5f88897b2b6b91713090c8984a0bc
2020-08-04 14:33:49 -07:00
0d1a689764 [vulkan] reshape op (#41223)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41223

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D22754942

Pulled By: IvanKobzarev

fbshipit-source-id: 99fc5888803d6afe2a73bb5bbed6651d2ea98313
2020-08-04 14:32:06 -07:00
e97e87368e Clean up CUDA Sleep and Tensor Initialization in ProcessGroupNCCLTest (#42211)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42211

Helper functions for launching CUDA Sleep and Tensor Value Initialization for the collective test functions.

This is more of a code cleanup fix compared to the previous diffs.
ghstack-source-id: 109097243

Test Plan: working on devGPU and devvm

Reviewed By: jiayisuse

Differential Revision: D22782671

fbshipit-source-id: 7d88f568a4e08feae778669affe69c8d638973db
2020-08-04 12:36:27 -07:00
3ca361791f TearDown function for ProcessGroupNCCLTest Initializer (#42209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42209

This PR adds a TearDown function to the testing superclass to ensure that the NCCL_BLOCKING_WAIT environment variable is reset after each test case.
ghstack-source-id: 109097247

Test Plan: Working on devGPU and devvm.

Reviewed By: jiayisuse

Differential Revision: D22782672

fbshipit-source-id: 8f919a96d7112f9f167e90ce3df59886c88f3514
2020-08-04 12:36:24 -07:00
2b8e7e2f2d Moving ProcessGroupNCCLTest to Gtest (#42208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42208

ProcessGroupNCCLTest is currently written without any testing framework, and all tests are simply called from the main function and throw exceptions upon failure. As a result, it is hard to debug and pinpoint which tests have succeeded/failed.

This PR moves ProcessGroupNCCLTest to gtest with appropriate setup and skipping functionality in the test superclass.
ghstack-source-id: 109097246

Test Plan: Working Correctly on devGPU and devvm.

Reviewed By: jiayisuse

Differential Revision: D22782673

fbshipit-source-id: 85bd407f4534f3d339ddcdd65ef3d2022aeb7064
2020-08-04 12:34:09 -07:00
b3ffebda7a [TensorExpr] Properly handle all dtypes of the condition in evaluation of IfThenElse exprs. (#42495)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42495

Test Plan: Imported from OSS

Reviewed By: nickgg

Differential Revision: D22910753

Pulled By: ZolotukhinM

fbshipit-source-id: f9ffd3dc4c50fb3fb84ce6d6916c1fbfd3201c8f
2020-08-04 12:25:56 -07:00
c334ebf1aa [TensorExpr] Properly handle all dtypes in evaluation of Intrinsics exprs. (#42494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42494

Note that we're currently assuming that the dtypes of all the arguments and
the return value are the same.

Test Plan: Imported from OSS

Reviewed By: nickgg

Differential Revision: D22910755

Pulled By: ZolotukhinM

fbshipit-source-id: 7f899692065428fbf2ad05d22b4ca39cab788ae5
2020-08-04 12:25:54 -07:00
38a9984451 [TensorExpr] Properly handle all dtypes in evaluation of CompareSelect exprs. (#42493)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42493

Test Plan: Imported from OSS

Reviewed By: nickgg

Differential Revision: D22910754

Pulled By: ZolotukhinM

fbshipit-source-id: cf7073d6ea792998a9fa3989c7ec486419476de0
2020-08-04 12:24:03 -07:00
5939d8a3e0 Revert "Revert D22360735: .circleci: Build docker images as part of C… (#40950)
Summary:
…I workflow"

This reverts commit 3c6b8a64964b0275884359dd6a5bf484655d8c7c.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40950

Reviewed By: malfet

Differential Revision: D22909883

Pulled By: seemethere

fbshipit-source-id: 93c070400d7fbe1753f88c3291ab5eba4ab237fa
2020-08-04 12:12:17 -07:00
4b42a5b5a1 Remove redundant kernels calling TypeDefault in VariableType codegen. (#42031)
Summary:
We have code snippet like below in VariableType_X.cpp
```
Tensor __and___Scalar(const Tensor & self, Scalar other) {
  auto result = TypeDefault::__and___Scalar(self, other);
  return result;
}
TORCH_LIBRARY_IMPL(aten, Autograd, m) {
  m.impl("__and__.Scalar",
         c10::impl::hacky_wrapper_for_legacy_signatures(TORCH_FN(VariableType::__and___Scalar))
  );
}
```
We already register TypeDefault kernels as catchAll, so they don't need to be wrapped and registered to the Autograd key in VariableType.cpp. This PR removes the wrapper and registration in VariableType.cpp. (The ones in other files like TracedType.cpp remain the same.)
Here's a [diff in generated VariableTypeEverything.cpp](https://gist.github.com/ailzhang/18876edec4dad54e43a1db0c127c5707)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42031

Reviewed By: agolynski

Differential Revision: D22903507

Pulled By: ailzhang

fbshipit-source-id: 04e6672b6c79e079fc0dfd95c409ebca7f9d76fc
2020-08-04 11:56:15 -07:00
94e8676a70 Initialize uninitialized variable (#42419)
Summary:
Fixes internal T70924595

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42419

Reviewed By: allwu, Krovatkin

Differential Revision: D22889325

Pulled By: wconstab

fbshipit-source-id: 108b6a6c6bb7c98d77e22bae9974a6c00bc296f0
2020-08-04 11:35:54 -07:00
d2a2ac4eea Fix read/write bulk data (#42504)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42504

Reviewed By: glaringlee

Differential Revision: D22922750

Pulled By: mrshenli

fbshipit-source-id: 9008fa22c00513bd75c3cf88a3081184cd72b0e3
2020-08-04 11:30:53 -07:00
ec898b1ab5 fix discontiguous inputs/outputs for cummin/cummax (#42507)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42363

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42507

Reviewed By: mruberry

Differential Revision: D22917876

Pulled By: ngimel

fbshipit-source-id: 05f3f4a55bcddf6a853552184c9fafcef8d36270
2020-08-04 10:12:07 -07:00
ecb88c5d11 Add NCCL Alltoall to PT NCCL process group (#42514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42514

Add Alltoall and Alltoallv to PT NCCL process group using NCCL Send/Recv.
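
A minimal sketch of driving the new backend from Python, assuming a `nccl` process group is already initialized and `rank`/`world_size` come from the launcher:

```python
import torch
import torch.distributed as dist

def run_alltoall(rank: int, world_size: int) -> None:
    device = torch.device(f"cuda:{rank}")
    # Rank r sends chunk i to rank i and receives chunk r from each rank.
    inputs = [torch.full((2,), float(rank), device=device) for _ in range(world_size)]
    outputs = [torch.empty(2, device=device) for _ in range(world_size)]
    dist.all_to_all(outputs, inputs)
```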

Reviewed By: mrshenli

Differential Revision: D22917967

fbshipit-source-id: 402f2870915bc237845864a4a27c97df4351d975
2020-08-04 08:39:28 -07:00
b56db305cf Improve the documentation of DistributedDataParallel (#42471)
Summary:
Fixes #{issue number}

The phrase 'gradients from each node are averaged' in the documentation of DistributedDataParallel is not clear on its own. Many people, including me, have had a totally wrong understanding of this part. I add a note to the documentation to make it more straightforward and more user friendly.

Here is some toy code to illustrate my point:

* non-DistributedDataParallel version
    ```python
    import torch
    import torch.nn as nn

    x = torch.tensor([-1, 2, -3, 4], dtype=torch.float).view(-1, 1)
    print("input:", x)

    model = nn.Linear(in_features=1, out_features=1, bias=False)
    model.weight.data.zero_()
    model.weight.data.add_(1.0)

    opti = torch.optim.SGD(model.parameters(), lr=0.001)
    opti.zero_grad()

    y = model(x)

    label = torch.zeros(4, 1, dtype=torch.float)
    loss = torch.sum((y - label)**2)

    loss.backward()
    opti.step()

    print("grad:", model.weight.grad)
    print("updated weight:\n", model.weight)

    # OUTPUT
    # $ python test.py
    # input: tensor([[-1.],
    #         [ 2.],
    #         [-3.],
    #         [ 4.]])
    # grad: tensor([[60.]])
    # updated weight:
    #  Parameter containing:
    # tensor([[0.9400]], requires_grad=True)
    ```

* DistributedDataParallel version
    ```python
    import os
    import torch
    import torch.nn as nn
    import torch.distributed as dist
    from torch.multiprocessing import Process

    def run(rank, size):
        x = torch.tensor([-(1 + 2 * rank), 2 + 2 * rank], dtype=torch.float).view(-1, 1)
        print("input:", x)

        model = nn.Linear(in_features=1, out_features=1, bias=False)
        model.weight.data.zero_()
        model.weight.data.add_(1.0)
        model = torch.nn.parallel.DistributedDataParallel(model)

        opti = torch.optim.SGD(model.parameters(), lr=0.001)
        opti.zero_grad()

        y = model(x)

        label = torch.zeros(2, 1, dtype=torch.float)
        loss = torch.sum((y.view(-1, 1) - label)**2)

        loss.backward()
        opti.step()

        if rank == 0:
            print("grad:", model.module.weight.grad)
            print("updated weight:\n", model.module.weight)

    def init_process(rank, size, fn, backend="gloo"):
        os.environ['MASTER_ADDR'] = '127.0.0.1'
        os.environ['MASTER_PORT'] = '29500'
        dist.init_process_group(backend, rank=rank, world_size=size)
        fn(rank, size)

    if __name__ == "__main__":
        size = 2
        process = []
        for rank in range(size):
            p = Process(target=init_process, args=(rank, size, run))
            p.start()
            process.append(p)

        for p in process:
            p.join()

    # OUTPUT
    # $ python test_d.py
    # input: tensor([[-3.],
    #         [ 4.]])input: tensor([[-1.],
    #         [ 2.]])

    # grad: tensor([[30.]])
    # updated weight:
    #  Parameter containing:
    # tensor([[0.9700]], requires_grad=True)
    ```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42471

Reviewed By: glaringlee

Differential Revision: D22923340

Pulled By: mrshenli

fbshipit-source-id: 40b8c8ba63a243f857cd5976badbf7377253ba82
2020-08-04 08:36:42 -07:00
f3e8fff0d2 Batching rules for: chunk, split, unbind (#42480)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42480

These are grouped together because they all return a tuple of multiple
tensors.

This PR implements batching rules for chunk, split, and unbind. It also
updates the testing logic. Previously, reference_vmap was not able to
handle multiple outputs; now it does.

Test Plan: - `pytest test/test_vmap.py -v -k "Operators"`

Reviewed By: ezyang

Differential Revision: D22905401

Pulled By: zou3519

fbshipit-source-id: 9963c943d035e9035c866be74dbdf7ab1989f8c4
2020-08-04 08:33:43 -07:00
f1d7f001b9 Batching rules for: torch.movedim, torch.narrow, Tensor.unfold (#42474)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42474

Test Plan: - `pytest test/test_vmap.py -v -k "Operators"`

Reviewed By: ezyang

Differential Revision: D22903513

Pulled By: zou3519

fbshipit-source-id: 06b3fb0c7d12b9a045c73a5c5a4f4e3207e07b02
2020-08-04 08:33:41 -07:00
01cd613e7e Batching rules for: T, view, view_as, reshape, reshape_as (#42458)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42458

Test Plan: - `pytest test/test_vmap.py -v -k "Operators"`

Reviewed By: ezyang

Differential Revision: D22898715

Pulled By: zou3519

fbshipit-source-id: 47f374962697dcae1d5aec80a41085679d016f92
2020-08-04 08:31:33 -07:00
0c48aa1e07 Add typing annotations to hub.py and _jit_internal.py (#42252)
Summary:
xref: https://github.com/pytorch/pytorch/wiki/Guide-for-adding-type-annotations-to-PyTorch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42252

Reviewed By: malfet

Differential Revision: D22916480

Pulled By: ezyang

fbshipit-source-id: 392ab805b0023640a3b5cdf600f70638b375f84f
2020-08-04 08:20:44 -07:00
d21e345ef0 Fix segfault in THPGenerator_dealloc (take 2) (#42510)
Summary:
A segfault happens when one tries to deallocate an uninitialized generator.
Make `THPGenerator_dealloc` UBSAN-safe by moving implicit cast in the struct definition to reinterpret_cast

Add `TestTorch.test_invalid_generator_raises` that validates that Generator created on invalid device is handled correctly
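
A minimal sketch of the scenario the test covers (the device string is illustrative):

```python
import torch

# Constructing a Generator on a nonexistent device should raise
# cleanly instead of leaving a half-initialized object that
# segfaults on deallocation.
try:
    g = torch.Generator(device="cuda:999")
except RuntimeError as e:
    print("raised as expected:", e)
```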

Fixes https://github.com/pytorch/pytorch/issues/42281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42510

Reviewed By: pbelevich

Differential Revision: D22917469

Pulled By: malfet

fbshipit-source-id: 5eaa68eef10d899ee3e210cb0e1e92f73be75712
2020-08-04 08:06:08 -07:00
8850fd1952 Add python inferface to create OfflineTensor (#42516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42516

As titled. We need it for some scripts.

Reviewed By: houseroad

Differential Revision: D22918112

fbshipit-source-id: 8a1696ceeeda67a34114bc57cb52c925711cfb4c
2020-08-04 01:31:34 -07:00
ae67f4c8b8 Revert D22845258: [pytorch][PR] [ONNX] Enable scripting tests and update jit passes
Test Plan: revert-hammer

Differential Revision:
D22845258 (04e55d69f9)

Original commit changeset: d57fd4086f27

fbshipit-source-id: 15aa5cdae496a5e8ce2d8739a06dd4a7edc2200c
2020-08-03 23:15:06 -07:00
842759591d [ONNX] Refactor ONNX fixup for Loop and If (#40943)
Summary:
* move both under new file `fixup_onnx_controlflow`
* move the fixup to where the ONNX loop/if node is created, as opposed to running the fixup as a post-pass. This will help with enabling ONNX shape inference later.
* move `fuseSequenceSplitConcat` to `Peephole`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40943

Reviewed By: mrshenli

Differential Revision: D22709999

Pulled By: bzinodev

fbshipit-source-id: 51d316991d25dc4bb4047a6bb46ad1e2401d3d2d
2020-08-03 22:33:17 -07:00
55d2a732cd Skip part of test_figure[_list] if Matplotlib-3.3.0 is installed (#42500)
Summary:
See https://github.com/matplotlib/matplotlib/issues/18163 for more details
Fixes https://github.com/pytorch/pytorch/issues/41680

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42500

Reviewed By: ezyang

Differential Revision: D22915857

Pulled By: malfet

fbshipit-source-id: 4f8858b7b0018c6958a49f908de81a13a29e6046
2020-08-03 21:43:22 -07:00
49e06e305f [ONNX] Updating input node removal in ONNX function_substitution pass. (#42146)
Summary:
ONNX pass `torch._C._jit_pass_onnx_function_substitution(graph)` inlines the function with the compiled torch graph. But while it removes all connections with the compiled function node (e.g. see below - `%6 : Function = prim::Constant[name="f"]()`), it does not remove the function node itself. For example, if the input graph is:
```
graph(%0 : Long(requires_grad=0, device=cpu),
      %1 : Long(requires_grad=0, device=cpu)):
  %6 : Function = prim::Constant[name="f"]()
  %7 : Tensor = prim::CallFunction(%6, %0, %1)
  return (%7)
```
The output graph is:
```
graph(%0 : Long(requires_grad=0, device=cpu),
      %1 : Long(requires_grad=0, device=cpu)):
  %6 : Function = prim::Constant[name="f"]()
  %8 : int = prim::Constant[value=1]()
  %z.1 : Tensor = aten::sub(%0, %1, %8) # test/onnx/test_utility_funs.py:790:20
  %10 : Tensor = aten::add(%0, %z.1, %8) # test/onnx/test_utility_funs.py:791:23
  return (%10)
```
Note that the `%6 : Function = prim::Constant[name="f"]()` has not been removed (though it is not being used).

This PR updates the pass to remove the function node completely. The updated graph looks as follows:
```
graph(%0 : Long(requires_grad=0, device=cpu),
      %1 : Long(requires_grad=0, device=cpu)):
  %8 : int = prim::Constant[value=1]()
  %z.1 : Tensor = aten::sub(%0, %1, %8) # test/onnx/test_utility_funs.py:790:20
  %10 : Tensor = aten::add(%0, %z.1, %8) # test/onnx/test_utility_funs.py:791:23
  return (%10)
```

A test point has also been added for this scenario.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42146

Reviewed By: VitalyFedyunin

Differential Revision: D22845314

Pulled By: bzinodev

fbshipit-source-id: 81fb351f0a36f47204e5327b60b84d7a91d3bcd9
2020-08-03 21:31:19 -07:00
0cb86afd72 Revert D22908795: [pytorch][PR] Fix segfault in THPGenerator_dealloc
Test Plan: revert-hammer

Differential Revision:
D22908795 (d3acfe3ba8)

Original commit changeset: c5b6a35db381

fbshipit-source-id: c7559c382fced23cef683c8c90cff2d6012801ec
2020-08-03 21:03:44 -07:00
dc1f87c254 Add typing_extensions as a dependency. (#42431)
Summary:
Closes gh-38221.

The related pytorch/builder PR: https://github.com/pytorch/builder/pull/475

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42431

Reviewed By: malfet

Differential Revision: D22916499

Pulled By: ezyang

fbshipit-source-id: c8fe9413b62fc7a6b829fc82aaf32531b55994d1
2020-08-03 20:06:16 -07:00
c8cb5e5bcb Relax cusparse windows guard on cuda 11 (#42412)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42406

### cusparse Xcsrmm2 API:

(https://github.com/pytorch/pytorch/issues/37202)

- new: https://docs.nvidia.com/cuda/cusparse/index.html#cusparse-generic-function-spmm
- old (deprecated in cuda 11): https://docs.nvidia.com/cuda/archive/10.2/cusparse/index.html#csrmm2

Before:

|cuda ver | windows | linux |
|--|--|--|
| 10.1 | old api | old api  |
| 10.2 | old api | new api |
| 11    | old api (build error claimed in https://github.com/pytorch/pytorch/issues/42406) | new api |

After:

|cuda ver | windows | linux |
|--|--|--|
| 10.1 | old api | old api  |
| 10.2 | old api | **old api** |
| 11    | **new api** | new api |

### cusparse bmm-sparse-dense API

<details><summary>reverted, will be revisited in the future</summary>
(cc kurtamohler https://github.com/pytorch/pytorch/issues/33430)

- new: https://docs.nvidia.com/cuda/cusparse/index.html#cusparse-generic-function-spmm

Before:

|cuda ver | windows | linux |
|--|--|--|
| 10.1 | not supported | new api  |
| 10.2 | not supported | new api |
| 11    | not supported | new api |

After:

|cuda ver | windows | linux |
|--|--|--|
| 10.1 | not supported | new api  |
| 10.2 | not supported | new api |
| 11    | **new api** | new api |

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42412

Reviewed By: agolynski

Differential Revision: D22892032

Pulled By: ezyang

fbshipit-source-id: cded614af970f0efdc79c74e18e1d9ea8a46d012
2020-08-03 19:59:59 -07:00
24199e0768 tuple_map / tuple_concat (#42326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42326

ghstack-source-id: 108868289

Test Plan: Unit tests

Reviewed By: smessmer

Differential Revision: D22846504

fbshipit-source-id: fa9539d16e21996bbd80db3e3c524b174b22069e
2020-08-03 19:19:47 -07:00
1b18adb7e8 [ONNX] Export static as_strided (#41569)
Summary:
`as_strided` creates a view of an existing tensor with specified `sizes`, `strides`, and `storage_offsets`. This PR supports the export of `as_strided` with static argument `strides`. The following scenarios will not be supported:
* Calling on a tensor of dynamic shape, i.e. where the tensor shape differs between model runs with different model inputs.
* In-place operations, i.e. updates to the original tensor that are expected to reflect in the `as_strided` output, and vice versa.
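
A minimal sketch of the supported static case (module and file names are illustrative):

```python
import torch

class StridedView(torch.nn.Module):
    def forward(self, x):
        # size and stride are constants, so the export is supported;
        # dynamic shapes or in-place updates through the view are not.
        return torch.as_strided(x, size=[2, 2], stride=[2, 1])

torch.onnx.export(StridedView(), torch.randn(6), "strided.onnx")
```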

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41569

Reviewed By: VitalyFedyunin

Differential Revision: D22845295

Pulled By: bzinodev

fbshipit-source-id: 7d1aa88a810e6728688491478dbf029f17ae7201
2020-08-03 18:56:40 -07:00
04e55d69f9 [ONNX] Enable scripting tests and update jit passes (#41413)
Summary:
This PR initiates the process of updating the TorchScript backend interface used by the ONNX exporter.

- Replace jit lower graph pass by freeze module pass

- Enable ScriptModule tests for ONNX operator tests (ORT backend) and model tests by default.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41413

Reviewed By: VitalyFedyunin

Differential Revision: D22845258

Pulled By: bzinodev

fbshipit-source-id: d57fd4086f27bd0c3bf5f70af7fd0daa39a2814a
2020-08-03 18:51:19 -07:00
c000b890a8 [ONNX] Export torch.eye to ONNX::EyeLike (#41357)
Summary:
Export dynamic torch.eye, i.e. commonly created from another tensor, where the shape for torch.eye is not known at export time.
Static torch.eye, where n and m are constants, is exported as a constant tensor directly.
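
A minimal sketch of the dynamic case (module and file names are illustrative):

```python
import torch

class DynamicEye(torch.nn.Module):
    def forward(self, x):
        # The size comes from the input tensor, so it is unknown at
        # export time and is exported as EyeLike rather than a constant.
        return torch.eye(x.size(1))

torch.onnx.export(DynamicEye(), torch.randn(2, 3), "eye.onnx")
```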

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41357

Reviewed By: VitalyFedyunin

Differential Revision: D22845220

Pulled By: bzinodev

fbshipit-source-id: 6e5c331fa28ca542022ea16f9c88c69995a393b2
2020-08-03 18:51:17 -07:00
fb56299d4a Fix check highlight in filecheck. (#42417)
Summary:
* It originally failed to check for cases where the highlight token appears more than once.
* Now it repeatedly tries to find the highlight token, if one doesn't seem correctly highlighted, until the end of the error message.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42417

Reviewed By: SplitInfinity

Differential Revision: D22889411

Pulled By: gmagogsfm

fbshipit-source-id: 994835db32849f3d7e98ab7f662bd5c6b8a1662e
2020-08-03 18:49:22 -07:00
7a5708832f fix masked_select for discontiguous outputs (#41841)
Summary:
This fixes https://github.com/pytorch/pytorch/issues/41473 for discontiguous input, mask and out. Tests to follow. Reverting https://github.com/pytorch/pytorch/issues/33269 is not a great solution because I'm told masked_select was needed for printing complex tensors.
cc gchanan, zou3519, ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41841

Reviewed By: mruberry

Differential Revision: D22706943

Pulled By: ngimel

fbshipit-source-id: 413d7fd3f3308b184de04fd56b8a9aaabcad22fc
2020-08-03 18:43:45 -07:00
d707d4bf6d Implement a light SGD optimizer (#42137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42137

This PR implements an SGD optimizer class similar to torch::optim::SGD, but it doesn't inherit from torch::optim::Optimizer, for use on mobile devices (or other lightweight use case).

Adding Martin's comment for visibility: "SGD may be the only optimizer used in near future. If more client optimizers are needed, refactoring the full optim codes and reusing the existing code would be an option."

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D22846514

Pulled By: ann-ss

fbshipit-source-id: f5f46804aa021e7ada7c0cd3f16e24404d10c7eb
2020-08-03 17:27:53 -07:00
934b68f866 ecr_gc: Iterate through all tags, reduce prints (#42492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42492

There's a potential for multiple tags to be created for the same digest
so we should iterate through all potential tags so that we're not
deleting digests that are associated with tags that we actually want.

Also, reduced the number of prints in this script to only the absolutely
necessary prints. (i.e. only the deleted images)

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D22909248

Pulled By: seemethere

fbshipit-source-id: 7f2e540d133485ed6464e413b01ef67aa73df432
2020-08-03 16:59:56 -07:00
d3acfe3ba8 Fix segfault in THPGenerator_dealloc (#42490)
Summary:
A segfault happens when one tries to deallocate an uninitialized generator

Add `TestTorch.test_invalid_generator_raises` that validates that Generator created on invalid device is handled correctly

Fixes https://github.com/pytorch/pytorch/issues/42281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42490

Reviewed By: seemethere

Differential Revision: D22908795

Pulled By: malfet

fbshipit-source-id: c5b6a35db381738c0fc984aa54e5cab5ef2cbb76
2020-08-03 16:28:34 -07:00
dbdd28207c Expose a generic shape info struct for ONNXIFI Python interface (#42421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42421

Previously, when we do onnxifi from Python, we could only feed shape info with float dtype and batch-based dim type. This diff removes this limitation and uses the TensorBoundShapes protobuf as a generic shape info struct. This will make the onnxifi interface in Python more flexible.

Reviewed By: ChunliF

Differential Revision: D22889781

fbshipit-source-id: 1a89f3a68c215a0409738c425b4e0d0617d58245
2020-08-03 16:10:05 -07:00
f0fd1cc873 Calculate inverse of output scale first. (#41342)
Summary:
This is to unify how the output scale calculation is done between
fbgemm and qnnpack (servers vs. mobile).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41342

Test Plan: Quantization tests.

Reviewed By: vkuzo

Differential Revision: D22506347

Pulled By: kimishpatel

fbshipit-source-id: e14d22f13c6e751cafa3e52617e76ecd9d39dad5
2020-08-03 14:45:08 -07:00
c3236b6649 [quant] Expose register activation post process hook function to user (#42342)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42342

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D22856711

fbshipit-source-id: d6ad080c82b744ae1147a656c321c448ac5e7f10
2020-08-03 12:28:42 -07:00
1b9cd747cf Revert "Conda build (#38796)" (#42472)
Summary:
This reverts commit 9c7ca89ae637a9cea52b4fee0877adc7485f4eb7.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42472

Reviewed By: ezyang, agolynski

Differential Revision: D22903382

Pulled By: seemethere

fbshipit-source-id: e2b01537bcdf6c50d967329833cb6450a75b8247
2020-08-03 12:08:13 -07:00
0eb513beef Set a proper type for a variable (#42453)
Summary:
The `ninputs` variable was always used as a `size_t` but declared as an `int32_t`.

Now some annoying warnings are fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42453

Reviewed By: agolynski

Differential Revision: D22898282

Pulled By: mrshenli

fbshipit-source-id: b62d6b07f0bc3717482906df6010d88762ae0ccd
2020-08-03 11:44:37 -07:00
34025eb826 Vectorize arange (#38697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38697

Benchmark (gcc 8.3, Debian Buster, turbo off, Release build, Intel(R)
Xeon(R) E-2136, Parallelization using OpenMP):

```python
import timeit
for dtype in ('torch.double', 'torch.float', 'torch.uint8', 'torch.int8', 'torch.int16', 'torch.int32', 'torch.int64'):
    for n, t in [(40_000, 50000),
                (400_000, 5000)]:
        print(f'torch.arange(0, {n}, dtype={dtype}) for {t} times')
        print(timeit.timeit(f'torch.arange(0, {n}, dtype={dtype})', setup=f'import torch', number=t))
```

Before:

```
torch.arange(0, 40000, dtype=torch.double) for 50000 times
1.587841397995362
torch.arange(0, 400000, dtype=torch.double) for 5000 times
0.47885190199303906
torch.arange(0, 40000, dtype=torch.float) for 50000 times
1.5519152240012772
torch.arange(0, 400000, dtype=torch.float) for 5000 times
0.4733216500026174
torch.arange(0, 40000, dtype=torch.uint8) for 50000 times
1.426058754004771
torch.arange(0, 400000, dtype=torch.uint8) for 5000 times
0.43596178699226584
torch.arange(0, 40000, dtype=torch.int8) for 50000 times
1.4289699140063021
torch.arange(0, 400000, dtype=torch.int8) for 5000 times
0.43451592899509706
torch.arange(0, 40000, dtype=torch.int16) for 50000 times
0.5714442400058033
torch.arange(0, 400000, dtype=torch.int16) for 5000 times
0.14837959500437137
torch.arange(0, 40000, dtype=torch.int32) for 50000 times
0.5964003179979045
torch.arange(0, 400000, dtype=torch.int32) for 5000 times
0.15676555599202402
torch.arange(0, 40000, dtype=torch.int64) for 50000 times
0.8390555799996946
torch.arange(0, 400000, dtype=torch.int64) for 5000 times
0.23184613398916554
```

After:

```
torch.arange(0, 40000, dtype=torch.double) for 50000 times
0.6895066159922862
torch.arange(0, 400000, dtype=torch.double) for 5000 times
0.16820953000569716
torch.arange(0, 40000, dtype=torch.float) for 50000 times
1.3640095089940587
torch.arange(0, 400000, dtype=torch.float) for 5000 times
0.39255041000433266
torch.arange(0, 40000, dtype=torch.uint8) for 50000 times
0.3422072059911443
torch.arange(0, 400000, dtype=torch.uint8) for 5000 times
0.0605111670010956
torch.arange(0, 40000, dtype=torch.int8) for 50000 times
0.3449254590086639
torch.arange(0, 400000, dtype=torch.int8) for 5000 times
0.06115841199061833
torch.arange(0, 40000, dtype=torch.int16) for 50000 times
0.7745441729930462
torch.arange(0, 400000, dtype=torch.int16) for 5000 times
0.22106765500211623
torch.arange(0, 40000, dtype=torch.int32) for 50000 times
0.720475220005028
torch.arange(0, 400000, dtype=torch.int32) for 5000 times
0.20230313099455088
torch.arange(0, 40000, dtype=torch.int64) for 50000 times
0.8144655400101328
torch.arange(0, 400000, dtype=torch.int64) for 5000 times
0.23762561299372464
```

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D22291236

Pulled By: VitalyFedyunin

fbshipit-source-id: 134dd08b77b11e631d914b5500ee4285b5d0591e
2020-08-03 11:14:57 -07:00
fa6e900e8c Let TensorIterator::nullary_op support check_mem_overlap option (#38693)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38693

Test Plan: Imported from OSS

Differential Revision: D22291237

Pulled By: VitalyFedyunin

fbshipit-source-id: 5bc96e617ed36ed076da73e3d019699f2efd6e4e
2020-08-03 11:13:04 -07:00
ed44269edc Add missing space after -> for topk.values (#42321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42321

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22846520

Pulled By: ezyang

fbshipit-source-id: 7c0ab0b019d05a13309c3b8d770582414795799f
2020-08-03 10:10:20 -07:00
326d777e53 Convert _wait_all_workers to _all_gather (#42276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42276

This commit converts `_wait_all_workers()` to `_all_gather()` by
allowing each worker to provide its own data object. The `_all_gather()`
function blocks and returns the gathered results. This API can be
converted to `rpc.barrier()` later.

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D22853480

Pulled By: mrshenli

fbshipit-source-id: 9d506813b9fd5b7c144885e2b76a863cbd19466a
2020-08-03 08:48:45 -07:00
ebde590864 Remove debug vestige (#42277)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42277

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D22853481

Pulled By: mrshenli

fbshipit-source-id: 74e58c532d8f872c1dd830573b2a4c4c86410de2
2020-08-03 08:46:38 -07:00
4cdbe5c495 Implement batching rules for some view ops (#42248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42248

Including:
- torch.diagonal
- torch.t
- torch.select
- Tensor.expand_as
- Tensor slicing.

Please let me know in the future if it would be easier to review these
separately (I put five operators into this PR because each
implementation is relatively simple).

Test Plan:
- new tests in `test/test_vmap.py`.
- I would like to have a more structured/automated way of testing but
my previous attempts at making something resulted in something very
complicated.

Reviewed By: ezyang

Differential Revision: D22846273

Pulled By: zou3519

fbshipit-source-id: 8e45ebe11174512110faf1ee0fdc317a25e8b7ac
2020-08-03 08:01:48 -07:00
2f8d5b68fa vmap fallback kernel (#41943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41943

If an operator doesn't have a batching rule implemented then we fallback
to this implementation. The fallback only works on out-of-place operators
that return only tensors with new memory. (e.g., no in-place operators,
no view operations).

The fallback effectively takes all of the BatchedTensors in `stack`,
slices them, and runs `op` on all of the corresponding slices to produce slices
of the outputs. The output slices then get `torch.stack`ed to create the
final returns.
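
In Python pseudocode, the strategy is roughly the following (a sketch for a unary op over a single batched input; the real fallback is a boxed C++ kernel):

```python
import torch

def fallback(op, batched: torch.Tensor) -> torch.Tensor:
    # Slice along the batch dimension and run the op per example...
    out_slices = [op(example) for example in batched.unbind(0)]
    # ...then stack the results; this stack is the extra copy that
    # makes the fallback slower than a dedicated batching rule.
    return torch.stack(out_slices)

print(fallback(torch.sin, torch.randn(4, 3)))
```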

The performance of the fallback is not very good because it introduces
an extra copy from stacking the sliced outputs. Because of this, we prefer
to write batching rules for operators whenever possible.

In the future, I'd like to disable the fallback kernel for random
functions until we have a better random story for vmap. I will probably
add a blocklist of operators to support that.

Test Plan: - `pytest test/test_vmap.py -v`

Reviewed By: ezyang

Differential Revision: D22764103

Pulled By: zou3519

fbshipit-source-id: b235833f7f27e11fb76a8513357ac3ca286a638b
2020-08-03 07:59:33 -07:00
192487d716 Update MAGMA to 2.5.3 for Windows (#42410)
Summary:
In order to introduce CUDA 11 build jobs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42410

Reviewed By: malfet

Differential Revision: D22892025

Pulled By: ezyang

fbshipit-source-id: 11bd7507f623d654a589ba00a138f6b947990f4c
2020-08-03 07:43:09 -07:00
ebfff31e19 [distributedhogwild] Introducing new tags for distributed hogwild. (#42381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42381

Introduce new tag to support distributed hogwild.

Reviewed By: boryiingsu

Differential Revision: D20484099

fbshipit-source-id: 5973495589e0a7ab185d3867b37437aa747f408a
2020-08-03 07:10:44 -07:00
bfa94487b9 Remove register_mobile_autograd.cpp. (#42397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42397

Since the autograd registration is unified to code-gen, we don't need to keep a manual registration file for mobile.
Remove it to avoid extra maintenance.

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D22883153

Pulled By: iseeyuan

fbshipit-source-id: 6db0bd89369beab9eed6e9a9692dd46f5bd1ff48
2020-08-02 14:14:33 -07:00
91c80d122a torch.gcd: Do not use std::abs() because it does not have an unsigned integer overload (#42254)
Summary:
`abs` doesn't have an unsigned overload across all compilers, so applying abs to a uint8_t can be ambiguous: https://en.cppreference.com/w/cpp/numeric/math/abs

This may cause unexpected issue when the input is uint8 and is greater
than 128. For example, on MSVC, applying `std::abs` on an unsigned char
variable

```c++
#include <cmath>

unsigned char a(unsigned char x) {
    return std::abs(x);
}
```

gives the following warning:

    warning C4244: 'return': conversion from 'int' to 'unsigned char',
    possible loss of data

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42254

Reviewed By: VitalyFedyunin

Differential Revision: D22860505

Pulled By: mruberry

fbshipit-source-id: 0076d327bb6141b2ee94917a1a21c22bd2b7f23a
2020-08-01 23:03:33 -07:00
4cbf18ccc3 Enables integer -> float type promotion in TensorIterator (#42359)
Summary:
Many ufuncs (mostly unary ufuncs) in NumPy promote integer inputs to float. This typically occurs when the results of the function are not representable as integers.

For example:

```
a = np.array([1, 2, 3], dtype=np.int64)
np.sin(a)
: array([0.84147098, 0.90929743, 0.14112001])
```

In PyTorch we only have one function, `torch.true_divide`, which exhibits this behavior today, and it did so by explicitly pre-casting its inputs to the default (float) scalar type where necessary before calling TensorIterator.

This PR lets TensorIterator understand and implement this behavior directly, and it updates `torch.true_divide` to verify the behavior is properly implemented. This will be convenient when implementing more integer->float promotions later (like with `torch.sin`), and also saves copies on CUDA, where the cast from one dtype to another is fused with the computation.
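
For instance, after this change the promotion happens inside TensorIterator itself:

```python
import torch

a = torch.tensor([1, 2, 3], dtype=torch.int64)
# Integer inputs are promoted to the default (float) scalar type,
# so the result is not truncated.
print(torch.true_divide(a, 2))  # tensor([0.5000, 1.0000, 1.5000])
```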

The mechanism for this change is simple. A new flag, `promote_integer_inputs_to_float_`, is added to TensorIteratorConfig. When the new flag is set, after the TensorIterator's "common dtype" (AKA "computation type") is computed it's checked for being an integral (boolean included) type and, if it is, changed to the default (float) scalar type, instead. Only `torch.true_divide` sets this flag (for now).

In the future we'll likely...
- provide helpers (`binary_float_op`, `unary_float_op`) to more easily construct functions that promote int->float instead of requiring they build their own TensorIteratorConfigs.
- update torch.atan2 to use `binary_float_op`
- update many unary ufuncs, like `torch.sin` to use `unary_float_op` and support unary ops having different input and result type (this will also require a small modification to some of the "loops" code)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42359

Reviewed By: ngimel

Differential Revision: D22878394

Pulled By: mruberry

fbshipit-source-id: b8de01e46be859321522da411aed655e2c40e5b9
2020-08-01 22:41:00 -07:00
d403983695 Support List[str].index (#39210) (#40348)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40348

Test Plan: Imported from OSS

Reviewed By: wanchaol

Differential Revision: D22757035

Pulled By: firstprayer

fbshipit-source-id: 4fadf8beabf8d5bdfa5b0a185075f7caf9ba8b02
2020-08-01 13:47:25 -07:00
bdcf320bed Support custom exception message (#41907)
Summary:
Raise and assert used to have a hard-coded error message, "Exception"; the user-provided error message was ignored. This PR adds support for representing the user's error message in TorchScript.

This breaks backward compatibility because now we actually need to script the user's error message, which can potentially contain unscriptable expressions. Such programs can break when scripted, but saved models will continue to work.

Increased an op count in test_mobile_optimizer.py because we now need aten::format to form the actual exception message.

This is built upon a WIP PR: https://github.com/pytorch/pytorch/pull/34112 by driazati
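
A minimal sketch of the now-supported pattern (the function is hypothetical, for illustration only):

```python
import torch

@torch.jit.script
def check_positive(x: int):
    if x <= 0:
        # The custom message is now scripted (str() and concatenation compile),
        # instead of being replaced by a generic "Exception"
        raise ValueError("expected a positive value, got: " + str(x))
```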

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41907

Reviewed By: ngimel

Differential Revision: D22778301

Pulled By: gmagogsfm

fbshipit-source-id: 2b94f0db4ae9fe70c4cd03f4048e519ea96323ad
2020-08-01 13:03:45 -07:00
5769b06ab5 [Caffe2] Remove explicitly divide by zero in SpatialBN training mode (#42380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42380

[Caffe2] Remove the explicit divide-by-zero in SpatialBN training mode

Test Plan: buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:spatial_bn_op_test

Reviewed By: houseroad

Differential Revision: D22873214

fbshipit-source-id: 70b505391b5db02b45fc46ecd7feb303e50c6280
2020-08-01 11:54:58 -07:00
115d226498 Pin NumPy version on MacOS testers to 1.18.5 (#42409)
Summary:
Otherwise numba linking by clang-9 fails with:
```
ld: in /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/numpy/core/lib/libnpymath.a(npy_math.o), could not parse object file /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/numpy/core/lib/libnpymath.a(npy_math.o): 'Unknown attribute kind (61) (Producer: 'LLVM10.0.0' Reader: 'LLVM APPLE_1_902.0.39.2_0')', using libLTO version 'LLVM version 9.1.0, (clang-902.0.39.2)' for architecture x86_64
```
Because conda's numpy-1.19.1 is compiled with clang-10.
This should fix the MacOS regressions in CircleCI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42409

Reviewed By: xw285cornell

Differential Revision: D22887683

Pulled By: malfet

fbshipit-source-id: d58ee9bf53772b57c59e18f71151916d4f0a3c7d
2020-08-01 09:22:23 -07:00
2912390662 Limits cpu scalar error message to where it's appropriate (#42360)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40986.

TensorIterator's test for a CUDA kernel getting too many CPU scalar inputs was too permissive. This update limits the check to not consider outputs and to only be performed if the kernel can support CPU scalars.

A test is added to verify the appropriate error message is thrown in a case where the old error message was thrown previously.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42360

Reviewed By: ngimel

Differential Revision: D22868536

Pulled By: mruberry

fbshipit-source-id: 2bc8227978f8f6c0a197444ff0c607aeb51b0671
2020-08-01 02:04:30 -07:00
206db5c127 Improve torch.norm functionality, errors, and tests (#41956)
Summary:
**BC-Breaking Note:**
BC-breaking changes apply in the case where keepdim=True. Before this change, when calling `torch.norm` with keepdim=True and p='fro' or p=number, leaving all other optional arguments at their default values, the keepdim argument was ignored. Also, any time `torch.norm` was called with p='nuc', the result had one fewer dimension than the input, and the dimensions could be out of order depending on which dimensions were being reduced. After the change, in each of these cases the result has the same number and order of dimensions as the input.
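
A short sketch of the corrected keepdim behavior (shapes are illustrative):

```python
import torch

a = torch.randn(3, 4)
out = torch.norm(a, p='fro', dim=1, keepdim=True)
print(out.shape)  # torch.Size([3, 1]); previously keepdim was silently ignored here
```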

**PR Summary:**

* Fix keepdim behavior
* Throw descriptive errors for unsupported sparse norm args
* Increase unit test coverage for these cases and for complex inputs

These changes were taken from part of PR https://github.com/pytorch/pytorch/issues/40924. That PR is not going to be merged because it overrides `torch.norm`'s interface, which we want to avoid. But these improvements are still useful.

Issue https://github.com/pytorch/pytorch/issues/24802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41956

Reviewed By: albanD

Differential Revision: D22837455

Pulled By: mruberry

fbshipit-source-id: 509ecabfa63b93737996f48a58c7188b005b7217
2020-08-01 01:55:12 -07:00
44b018ddeb Convert ProcessGroupNCCLTest.cpp to gtest unittest (#42365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42365

Converting the test

Reviewed By: malfet

Differential Revision: D22855087

fbshipit-source-id: dc917950dcf99ec7036e48aaa4264d2c455cb19e
2020-07-31 20:34:11 -07:00
f47e00bdc3 [NNC] Bounds Inference: make inferred bounds respect gaps (#42185)
Summary:
A heavy refactor of bounds inference to fix some issues and bugs blocking using it to analyze cross thread interactions:
* We were merging all accesses to a Buf into a single bounds info entry, even if they did not overlap. E.g. if we accessed a[0:2] and a[5:6] we would merge that into a bound of a[0:6]. I've changed this behaviour to merge only overlapping bounds.
* We were not separating bounds of different kinds (e.g. Load vs Store) and would merge a Store bounds into a Load bounds, losing the information about what kind of access it was. E.g. this loop would produce bounds: [{Load, 0, 10}] and now produces bounds [{Load, 0, 9}, {Store, 1, 10}]:
```
for i in 1 to 10...
  x[i] = x[i-1]
```
* Both ComputeAt and Rfactor relied on the overzealous merging and used only a single entry in the bounds list to determine the bounds of the temporary buffers they created, which could result in temporary buffers being allocated smaller than the accesses made to them. I've fixed Rfactor, but *not* ComputeAt - however, all ComputeAt tests still pass (triggering this issue may require loop fusion) - I will come back to it.

Being more precise about bounds is more complex, rather than taking the minimum of starts and maximum of stops we now need to determine if two bounds overlap or are adjacent. There are many edge cases and so I've added a bunch of test coverage of the merging method.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42185

Reviewed By: mruberry

Differential Revision: D22870391

Pulled By: nickgg

fbshipit-source-id: 3ee34fcbf0740a47259defeb44cba783b54d0baa
2020-07-31 20:22:04 -07:00
dcc4d11ffa [TensorExpr] Make tensorOrConstant non-templatized function. (#42202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42202

Currently we use a template in order to accept both
`std::vector<ExprHandle>` and `std::vector<VarHandle>`. However, the
semantics of this function dictate that the only allowed option should be
the former one: we're specifying indices for the tensor access we want
to generate. While it could be convenient to avoid converting a
vector of vars to a vector of exprs at the callsites, it makes the code
less explicit and thus more difficult to reason about.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D22806429

Pulled By: ZolotukhinM

fbshipit-source-id: 8403af5fe6947c27213050a033e79a09f7075d4c
2020-07-31 20:05:24 -07:00
2decccea2e [TensorExpr] Implement shape inference for TE. (#41451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41451

Since TE operates on a limited subset of ops with a well-defined
semantics, we can easily infer shapes of intermediate and output tensors
given shapes of the inputs.

There are a couple of ops that are not yet supported in the shape
inference; once we add them we can relax the shape-info requirements
in the TE fuser: currently it requires all values in the fusion group to
have known shapes, and we can change that to only the inputs.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D22543470

Pulled By: ZolotukhinM

fbshipit-source-id: 256bae921028cb6ec3af91977f12bb870c385f40
2020-07-31 20:05:21 -07:00
f41bb1f92b [TensorExpr] Explicitly cast to bool results of comparison ops in kernel.cpp. (#42201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42201

Previously, we've been using operators <, >, ==, et al. and relied on
the dtype being picked automatically. This led to a wrong dtype being
picked for the result, but that choice was overwritten by the type
explicitly specified in the JIT IR we were lowering. Now we are
moving towards using shape inference instead of relying on all types
being specified in the IR, and that made this issue immediately pop
up.

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D22806428

Pulled By: ZolotukhinM

fbshipit-source-id: 89d2726340efa2bb3da45d1603bedc53955e14b9
2020-07-31 20:05:19 -07:00
f8c5800bb5 [TensorExpr] Add debug dumps to kernel.cpp. (#42196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42196

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D22803676

Pulled By: ZolotukhinM

fbshipit-source-id: 109372ca45d86478826190b868d005d2fb2c9ba7
2020-07-31 20:02:21 -07:00
655f376460 Implement Enum sugared value and Enum constant support (#42085)
Summary:
[3/N] Implement Enum JIT support

* Add enum value as constant support
* Add sugared value for EnumClass

Supported:
* Enum-typed function arguments
* Using Enum types and comparing them
* Getting name/value attrs of enums
* Using Enum values as constants

TODO:
* Add Python sugared value for Enum
* Support Enum-typed return values
* Support serialization and deserialization
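
A minimal sketch of the supported usage described above (the enum and function are hypothetical, for illustration):

```python
from enum import Enum

import torch

class Color(Enum):
    RED = 1
    GREEN = 2

@torch.jit.script
def is_red(c: Color) -> bool:
    # Enum-typed arguments and comparisons now compile
    return c == Color.RED

print(is_red(Color.RED))  # True
```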

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42085

Reviewed By: eellison

Differential Revision: D22758042

Pulled By: gmagogsfm

fbshipit-source-id: 5c6e571686c0b60d7fbad59503f5f94b3b3cd125
2020-07-31 17:29:55 -07:00
ff91b169c7 Changes to match Fused Op: Dequantize->Swish->Quantize (#42255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42255

Changes to match Fused Op: Dequantize->Swish->Quantize
* Changes to scale handling

Results showing matching intermediate and final Swish_Int8 Op.
P137389801

Test Plan: test case test_deq_swish_quant_nnpi.py

Reviewed By: hyuen

Differential Revision: D22827499

fbshipit-source-id: b469470ca66f6405ccc89696694af372ce6ce89e
2020-07-31 16:54:39 -07:00
1542c41a67 Change C++ frontend to take optional<Tensor> arguments (#41947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41947

Previously, if an op took an optional `Tensor?` argument, the C++ frontend (i.e. `at::op()` and `Tensor::op()`)
were generated to take `Tensor`. A previous PR (https://github.com/pytorch/pytorch/pull/41610) changed the kernels
to be written with `c10::optional<Tensor>` instead of `Tensor`, but that did not touch the C++ frontend yet.

This PR changes the C++ frontend API to take `c10::optional<Tensor>` instead of `Tensor` as well.
This should be mostly BC-preserving. Since `Tensor` implicitly converts to `c10::optional<Tensor>`, any old code
calling an op with a `Tensor` would still work. There are likely corner cases that get broken though.
For example, C++ only ever does *one* implicit conversion. So if you call an op with a non-tensor object
that gets implicitly converted to a `Tensor`, then that previously worked since the API took a `Tensor` and
C++ allows one implicit conversion. Now it wouldn't work anymore because it would require two implicit conversions
(to `Tensor` and then to `c10::optional<Tensor>`) and C++ doesn't do that.

The main reasons for doing this are
- Make the C++ API more sane. Those arguments are optional and that should be visible from the signature.
- Allow easier integration for XLA and Autocast. Those backends generate code to wrap operators and forward
  operator arguments to calls to at::op(). After https://github.com/pytorch/pytorch/pull/41610, there was
  a mismatch because they had to implement operators with `optional<Tensor>` but call `at::op()` with `Tensor`,
  so they had to manually convert between those. After this PR, they can just forward the `optional<Tensor>`
  in their call to `at::op()`.
ghstack-source-id: 108873705

Test Plan: unit tests

Reviewed By: bhosmer

Differential Revision: D22704832

fbshipit-source-id: f4c00d457b178fbc124be9e884a538a3653aae1f
2020-07-31 16:11:55 -07:00
3a19af2427 Make operators with optional Tensor? arguments c10-full (#41610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41610

Previously, operators that have a `Tensor?` (i.e. optional tensor) in their schema implemented it using `Tensor` in C++ and filled in an undefined tensor for the None case.
The c10 operator library, however, expects `Tensor?` to be represented as `optional<Tensor>`, so those operators couldn't be c10-full yet and still had to use codegenerated unboxing instead of templated unboxing.

This PR changes that. It extends the `hacky_wrapper_for_legacy_signatures` to not only take care of TensorOptions, but now also map between signatures taking `Tensor` and `optional<Tensor>`.
For this, it requires an additional template parameter, the expected signature, and it uses that to go argument-by-argument and unwrap any optionals it finds.
ghstack-source-id: 108873701

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D22607879

fbshipit-source-id: 57b2fb01a294b804f82cd55cd70f0ef4a478e14f
2020-07-31 16:09:08 -07:00
f502290e91 [JIT] Make create autodiff subgraphs do in place updates to aliasDb (#42141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42141

Update the alias db in-place instead of constructing it from scratch on each change, which caused O(n^2) behavior.

Description from https://github.com/pytorch/pytorch/pull/37106 holds pretty well:
"""
Recomputing the aliasdb on every fusion iteration + in every subblock
is hugely expensive. Instead, update it in-place when doing fusion.

The graph fuser pass operates by pushing nodes into a fusion group. So
we start with

`x, y = f(a, b, c)`

and end with:
```
x_out, y_out = prim::fusionGroup(a, b, c)
   x_in, y_in = f(a_in, b_in, c_in)
   -> x_in, y_in
```

We destroy the x and y Value*s in the process. This operation is
easy to express as an update to the aliasDb--x_out just takes on all
the aliasing information x used to have. In particular, since we know
f and prim::fusionGroup are purely functional, we don't have to mess
with any write information.
"""

The one difficulty here is that mapping x, y to x_out, y_out is not trivial when merging nodes into the autodiff subgraph node.
There are a few options:
- attempt to make all subgraph utils & ir cloning logic update a map
- mirror the subgraph utils implementation in create_autodiff_subgraph
- uniquely map x, y and x_in, y_in so you can back out the correspondence.

I went with the third option.

This shouldn't affect the results of the pass at all. Let me know if you think there's anything else I should be doing to test; I was thinking about maybe exposing an option to run create autodiff subgraphs without the post processor and checking that the alias db was correctly updated.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D22798377

Pulled By: eellison

fbshipit-source-id: 9a133bcaa3b051c0fb565afb23a3eed56dbe71f9
2020-07-31 15:13:32 -07:00
2285a2fc11 refactor canonical ordering to also be able to do isAfter checks (#42140)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42140

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D22798378

Pulled By: eellison

fbshipit-source-id: d1a549f43b28fe927729597818a46674c58fe81d
2020-07-31 15:11:40 -07:00
4fc525e729 [Dper3] Implementation of squeezed input to DC++
Summary:
This diff provides an option for the DC++ module to use squeezed sparse-feature embeddings to generate attention weights, with the purpose of reducing the network size to achieve QPS gains. There are 3 squeeze options (sum, max, and mean) along the embedding dimension, provided for both the attention-weight and resnet generation.
Example workflow: f208474456

{F257199459}

Test Plan:
1. Test single ops
buck test dper3/dper3/modules/low_level_modules/tests:single_operators_test -- test_reduce_back_mean
buck test dper3/dper3/modules/low_level_modules/tests:single_operators_test -- test_reduce_back_max
2. Test DC++ module
buck test dper3/dper3/modules/tests:core_modules_test -- test_dc_pp_arch_one_layer_compressed_embeddings_only_squeeze_input
buck test dper3/dper3/modules/tests:core_modules_test -- test_dc_pp_arch_shared_input_squeeze_input
buck test dper3/dper3/modules/tests:core_modules_test -- test_dc_pp_input_compress_embeddings_squeeze_input
3. Test Arch
buck test dper3/dper3_models/ads_ranking/model_impl/sparse_nn/tests:sparse_nn_lib_test -- test_dense_sparse_interaction_compress_dot_arch_dot_compress_pp_squeezed_input
4. e2e test
buck test dper3/dper3_models/ads_ranking/tests:model_paradigm_e2e_tests -- test_sparse_nn_compress_dot_attention_fm_max_fc_size_squeeze_input

Reviewed By: taiqing

Differential Revision: D22825069

fbshipit-source-id: 29269ea22cb47d487a1c92a1f6daae1055f54cfc
2020-07-31 14:31:43 -07:00
a01e91e6b2 [pytorch] include all overloads for OSS custom build
Summary:
For mobile custom build, we only generate code for ops that are used by
specific models to reduce binary size.

There are multiple places where we apply the op filtering:
- generated_unboxing_wrappers_*.cpp
- autograd/VariableType*.cpp
- c10 op registration (in aten/gen.py)

For c10 op registration, we filter by the main op name - all overloads
that match the main op name part will be kept.

For generated_unboxing_wrappers_*, we filter by the full op name - only
those having exactly the same overload name will be kept.

This PR changes generated_unboxing_wrappers_* and autograd/VariableType*.cpp
codegen to also filter by the main op name.

The reasons are:
- keeping all overloads gives better backward compatibility;
- generated_unboxing_wrappers_* are relatively small, as they only contain
  thin wrappers for root ops;
- generated_unboxing_wrappers_* will be replaced by c10 op registration
  soon anyway;
- autograd/VariableType*.cpp are not included in the OSS build.

Why does it offer better backward compatibility? #40737 is an example:
it introduced a new `_convolution` overload and renamed the original one
to `_convolution.deprecated`. Before this PR, a model prepared by an
old version of PyTorch wouldn't be able to run on the custom mobile build
generated on that PR, because `_convolution.deprecated` wouldn't be kept in
the custom build under the full-op-name matching policy. By relaxing it to
a partial matching policy, the mobile custom build CI on the PR can pass.

Will test the size impact for FB production build before landing.

Differential Revision: D22809564

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Pulled By: ljk53

fbshipit-source-id: e2fc017da31f38b9430cc2113f33e6d21a0eaf0b
2020-07-31 12:43:31 -07:00
38bf5be24f [quant] Use PlaceholderObserver instead of Fp16Observer and NoopObserver (#42348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42348

Use the dtype info in PlaceholderObserver to decide what ops to insert in the graph.
In the next PR we can delete NoopObserver.

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D22859457

fbshipit-source-id: a5c618f22315534ebd9a2df77b14a0aece196989
2020-07-31 12:33:56 -07:00
6bd46b583e [quant][graph] Add support for FP16 dynamic quant (#42222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42222

This change adds the necessary passes to perform FP16 dynamic quantization.
We skip inserting observers for activations based on the dtype (torch.float16) and only insert the Fp16Observer for weights.
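
For context, a sketch of the eager-mode analogue of fp16 dynamic quantization (the graph-mode passes added here are the scripted counterpart; the model is an assumed toy example):

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 8))
# Dynamically quantize Linear weights to float16
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.float16)
print(quantized)
```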

Test Plan:
python test/test_quantization.py TestQuantizeJitOps

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D22849220

fbshipit-source-id: 2c53594ecd2485e9e3dd0b380eceaf7c5ab5fc50
2020-07-31 12:33:53 -07:00
8c5bf10264 [quant] Add FP16Observer for fp16 quant support (#42221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42221

Adds a new observer that emits a warning if the range of the tensor is beyond the fp16 range. This will be used later in graph-mode quantization to insert the fp16 cast ops into the graph.

Test Plan:
python test/test_quantization.py TestObserver.test_fp16_observer

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D22849222

fbshipit-source-id: a301281ce38ba4d4e7a009308400d34a08c113d2
2020-07-31 12:33:51 -07:00
a9eebaf693 [quant] Add saturate_to_fp16 op for FP16 quant support (#42147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42147

Op to check the range of a tensor and clamp the values to the fp16 range.
This operator will be inserted into the graph in subsequent diffs.

Test Plan:
python test/test_quantization.py TestQuantizedTensor.test_fp16_saturate_op

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D22849221

fbshipit-source-id: 0da3298e179750f6311e3a09596a7b8070509096
2020-07-31 12:31:07 -07:00
bdd9ef1981 Support RowWiseSparseAdam on GPU (#35404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35404

Implement RowWiseSparseAdam on CUDA

Reviewed By: xw285cornell

Differential Revision: D20650225

fbshipit-source-id: 5f871e2f259e362b713c9281b4d94534453995cf
2020-07-31 10:47:29 -07:00
a9e7e787f8 [jit] make clone works for interface type (#42121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42121

This PR changes the Module API to allow registering a module with a module
interface type, and therefore allows Module::clone to work in the case
where a module interface type is shared by two submodules.

The interface type will be shared by the new cloned instance in the same
compilation unit, because it only contains a list of FunctionSchemas and,
unlike a ClassType, does not involve any attributes.

fixes https://github.com/pytorch/pytorch/issues/41882

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D22781205

Pulled By: wanchaol

fbshipit-source-id: f97f4b75970f0b434e38b5a1f778eda2c4e5109b
2020-07-31 10:24:27 -07:00
352e15f1a2 Revert D22812445: Update TensorPipe submodule
Test Plan: revert-hammer

Differential Revision:
D22812445 (2335430086)

Original commit changeset: e6d824bb28f5

fbshipit-source-id: 606632a9aaf2513b5ac949e4d6687aa7563eae5d
2020-07-31 10:16:48 -07:00
832b1659e7 Fix missing attribute when loading model from older version (#42242) (#42290)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42242

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42290

Reviewed By: VitalyFedyunin

Differential Revision: D22844096

Pulled By: albanD

fbshipit-source-id: 707e552e0ed581fbe00f1527ab7426880edaed64
2020-07-31 09:03:07 -07:00
4c6878c97d [gloo] change ProcessGroupGlooAsyncTest to use gtest (#42313)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42313

Changes the tests in `ProcessGroupGlooAsyncTest.cpp` to use the Gtest testing framework.

Reviewed By: malfet

Differential Revision: D22821577

fbshipit-source-id: 326b24a334ae84a16434d0d5ef27d16ba4b90d5d
2020-07-31 08:54:50 -07:00
0adb584376 Make resize_ use normal device dispatch (#42240)
Summary:
`resize_` only requires manual registration to the `Autograd` key, and its device kernels can safely live together with our normal device dispatch in `native_functions.yaml`.
But currently we do manual registration for the `CPU/CUDA` kernels (leaving no dispatch in native_functions.yaml), which makes `resize_` non-overrideable from a backend's point of view. While it indeed should dispatch at the device level, this caused xla to whitelist `resize_` and register a lowering for the XLA key. This PR moves the device dispatch of `resize_` back to `native_functions.yaml` so that it properly shows up as an `abstract` method for downstream extensions.
Note that we also do manual registration for `copy_/detach_/resize_as_/etc.` in ATen, but they are slightly different from `resize_` since for them we only register `catchAll` kernels instead of device kernels. I'll investigate and send a follow-up PR for those ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42240

Reviewed By: VitalyFedyunin

Differential Revision: D22846311

Pulled By: ailzhang

fbshipit-source-id: 10b6cf99c4ed3d62fc4e1571f4a2a463d1b88c81
2020-07-31 02:15:27 -07:00
2f840b1662 Warns when TensorIterator would resize its output (#42079)
Summary:
See https://github.com/pytorch/pytorch/issues/41027.

This adds a helper for resizing output to ATen/native/Resize.* and updates TensorIterator to use it. The helper raises a warning if a tensor with one or more elements needs to be resized. This warning indicates that these resizes will become an error in a future PyTorch release.

 There are many functions in PyTorch that will resize their outputs and don't use TensorIterator. For example,

985fd970aa/aten/src/ATen/native/cuda/NaiveConvolutionTranspose2d.cu (L243)

And these functions will need to be updated to use this helper, too. This PR avoids their inclusion since the work is separable, and this should let us focus on the function and its behavior in review. A TODO appears in the code to reflect this.
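
A short sketch of a call that now triggers the warning (shapes are illustrative):

```python
import torch

a = torch.randn(2, 3)
out = torch.empty(5)       # wrong shape for the result
torch.add(a, a, out=out)   # emits a UserWarning that the non-empty output was resized
```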

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42079

Reviewed By: VitalyFedyunin

Differential Revision: D22846851

Pulled By: mruberry

fbshipit-source-id: d1a413efb97e30853923bce828513ba76e5a495d
2020-07-30 22:39:16 -07:00
e54f268a7a Enables torch.full bool and integer type inference (#41912)
Summary:
After being deprecated in 1.5 and throwing a runtime error in 1.6, we can now enable torch.full inferring its dtype when given bool and integer fill values. This PR enables that inference and updates the tests and docs to reflect this.
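
A quick sketch of the inference this enables:

```python
import torch

print(torch.full((2,), True).dtype)  # torch.bool
print(torch.full((2,), 7).dtype)     # torch.int64
print(torch.full((2,), 1.5).dtype)   # torch.float32 (the default dtype)
```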

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41912

Reviewed By: albanD

Differential Revision: D22836802

Pulled By: mruberry

fbshipit-source-id: 33dfbe4d4067800c418b314b1f60fab8adcab4e7
2020-07-30 22:39:13 -07:00
31d41f987a torch.where : Scalar Support (#40336)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/38349 #9190

TODO
* [x] Add Tests
* [x] Update Docs
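
A minimal sketch of the new scalar support (values are illustrative):

```python
import torch

cond = torch.tensor([True, False, True])
x = torch.tensor([1., 2., 3.])
# A Python scalar can now stand in for a tensor operand
print(torch.where(cond, x, 0.))  # tensor([1., 0., 3.])
```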

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40336

Reviewed By: albanD

Differential Revision: D22813834

Pulled By: mruberry

fbshipit-source-id: 67c1693c059a301b249213afee3c25cea9f64fec
2020-07-30 22:36:53 -07:00
1c8217a7a6 Abstract cuda calls made from torch_python (#42251)
Summary:
* Make c10::cuda functions regular non-inlined functions
* Add driver_version() and device_synchronize() functions

With this change I no longer see direct calls to the CUDA API when looking at Modules.cpp.obj.

FYI malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42251

Reviewed By: malfet

Differential Revision: D22826505

Pulled By: ziab

fbshipit-source-id: 8dc2f3e209d3710e2ce78411982a10e8c727573c
2020-07-30 19:18:33 -07:00
fbb052c2cc BlackList to BlockList (#42279)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41701: renames the blackList convention to the blockList convention.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42279

Reviewed By: VitalyFedyunin

Differential Revision: D22843178

Pulled By: malfet

fbshipit-source-id: c9be5a5f084dfd0e46545d4a3d1124ef59277604
2020-07-30 18:06:49 -07:00
27c22b9b3c Modify function to takes dtype as argument
Summary: To avoid repeating to() casts for every argument of the function

Test Plan: CI

Reviewed By: malfet

Differential Revision: D22833521

fbshipit-source-id: ae0a8f70339cd6adfeea2f552d35bbcd48b11cf7
2020-07-30 16:27:55 -07:00
b5fcd89479 Add tests to sigmoid_backward and fmod (#42289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42289

`sigmoid_backward` and `fmod` are covered neither in `test/cpp/api` nor in `ATen/test`. Add test functions to cover them.

Test Plan:
1. Test locally and check new lines are covered
2. CI

Reviewed By: malfet

Differential Revision: D22804912

fbshipit-source-id: ea50ef0ef3dcf3940ac950d74f6f1cb38d8547a7
2020-07-30 16:26:13 -07:00
7d6c4f62ef Remove 4 unused variables in lp_pool_op.cc (#42329)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42329

Reviewed By: VitalyFedyunin

Differential Revision: D22850894

Pulled By: mrshenli

fbshipit-source-id: 1e91380a432525b83c0bb0bfef0d5067c767cb67
2020-07-30 15:50:17 -07:00
153673c33b fix quantized elu benchmark (#42318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42318

We forgot to update this benchmark when quantized elu's signature
changed to require observation; this fixes that.

Test Plan:
```
cd benchmarks/operator_benchmark
python -m pt.qactivation_test
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D22845251

fbshipit-source-id: 1443f6f0deac695715b1f2bd47f0f22b96dc72ca
2020-07-30 14:57:12 -07:00
5ff54ff4ff import freeze (#42319)
Summary:
torch.jit.freeze was broken by https://github.com/pytorch/pytorch/pull/41154/files#diff-9084cd464651f7fa1ff030d2edd9eb55R1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42319

Reviewed By: ZolotukhinM

Differential Revision: D22845476

Pulled By: eellison

fbshipit-source-id: bc9e50678d0e0ffca4062854ccc71bbef2e1a97b
2020-07-30 13:00:11 -07:00
344defc973 Let bfloat16 support promotion with other types (#41698)
Summary:
Fix https://github.com/pytorch/pytorch/issues/40580
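
A quick sketch of the promotion this enables (values are illustrative):

```python
import torch

a = torch.tensor([1.0], dtype=torch.bfloat16)
b = torch.tensor([1.0])               # float32
print((a + b).dtype)                  # torch.float32
print((a + torch.tensor([1])).dtype)  # torch.bfloat16 (float kind wins over integer)
```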

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41698

Reviewed By: albanD

Differential Revision: D22824042

Pulled By: mruberry

fbshipit-source-id: 7dad9c12dc51d8f88c3ca963ae9c5f8aa2f72277
2020-07-30 12:28:09 -07:00
c489bbe122 Add typing support to torch._six (#42232)
Summary:
Also add a __prepare__ method to the metaclass created by `with_metaclass`, to conform with PEP 3115

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42232

Reviewed By: ezyang

Differential Revision: D22816936

Pulled By: malfet

fbshipit-source-id: a47d054b2f061985846d0db6b407f4e5df97b0d4
2020-07-30 12:12:46 -07:00
26d58503c2 Implementing NumPy-like function torch.signbit() (#41589)
Summary:
- Related to https://github.com/pytorch/pytorch/issues/38349
- Implements the NumPy-like function `torch.signbit()`.
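
A quick sketch of the new function:

```python
import torch

print(torch.signbit(torch.tensor([-2.0, 0.0, 3.0])))
# tensor([ True, False, False])
```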

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41589

Reviewed By: albanD

Differential Revision: D22835249

Pulled By: mruberry

fbshipit-source-id: 7988f7fa8f591ce4b6a23ac884ee7b3aa718bcfd
2020-07-30 11:21:15 -07:00
c35faae10d [pytorch][ci] install nightly instead of stable libtorch for mobile CIs (#42220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42220

Mobile custom build CI jobs need the desktop version of libtorch to prepare
models and dump root ops.

Ideally we should use the libtorch built on the PR so that backward
incompatible changes won't break this script - but it will significantly
slow down mobile CI jobs.

This PR changes it to install the nightly instead of the stable build, so that
we have an option to temporarily skip mobile CI jobs on BC-breaking PRs until
the changes are in the nightly.

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D22810484

Pulled By: ljk53

fbshipit-source-id: eb5f7b762a969d1cfeeac2648816be546bd291b6
2020-07-30 11:07:14 -07:00
ce546328a3 Const-correctness, variable initialization, and error checking. (#42124)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42124

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D22835543

Pulled By: AshkanAliabadi

fbshipit-source-id: 29b7619b7bc6dd346eec91b8a2b6cc6a76769bcf
2020-07-30 11:04:24 -07:00
d0ed1e303f Add missing header guards. (#42272)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42272

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D22835546

Pulled By: AshkanAliabadi

fbshipit-source-id: c880199acaf0ad11c3db4ac9f9f2d000038f98f1
2020-07-30 11:04:21 -07:00
ee2150370e Add Vulkan Test to ATen Mobile Tests. (#42123)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42123

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D22835544

Pulled By: AshkanAliabadi

fbshipit-source-id: 08bce5d94ed8c966d25707f69e51b16d5b45febd
2020-07-30 11:04:19 -07:00
7cd92aaa6b Disable validation layers in non-debug builds. (#42122)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42122

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D22835545

Pulled By: AshkanAliabadi

fbshipit-source-id: b0eee550c8d727c79b5d45a7e1d603379ae3af5c
2020-07-30 11:01:51 -07:00
8e3d1908b6 Fix minor typo in comment (#42184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42184

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D22809375

Pulled By: ezyang

fbshipit-source-id: 322a4c2059b612a10c6257013bbf2fd207e75df7
2020-07-30 09:48:22 -07:00
86b2faeb53 Automated submodule update: FBGEMM (#42302)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: e04b9ce034

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42302

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: efiks

Differential Revision: D22841424

fbshipit-source-id: 211463b0207da986fc5b451242ae99edf32b9f68
2020-07-30 08:56:34 -07:00
f15af2fe4f Remove unused variable "schema" (#42245)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42245

Reviewed By: albanD

Differential Revision: D22835223

Pulled By: mrshenli

fbshipit-source-id: 94f0cbddb36feefc8a136ef38b0a74d22b305680
2020-07-30 08:40:36 -07:00
547bbdac86 Add MSFT Owners to the Windows Maintainership (#42280)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42280

Reviewed By: albanD

Differential Revision: D22836782

Pulled By: soumith

fbshipit-source-id: a38f91e381abc0acf3ab41e05ff70611926091ac
2020-07-30 08:22:13 -07:00
269ec767ca [Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D22838806

fbshipit-source-id: 29039585c82bb214db860d582cc4e269ab990c85
2020-07-30 04:01:20 -07:00
2335430086 Update TensorPipe submodule (#42225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42225

Main changes:
- Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch.
- Changed the way the preprocessor flags are provided, and changed their name.

There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which however doesn't contain the auto-generated header we now added. We fix that by adding the `tensorpipe` CMake target as a dependency, so that the include paths defined by TensorPipe are used, which contain that auto-generated header.

I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one.

Test Plan: CircleCI is all green.

Reviewed By: beauby

Differential Revision: D22812445

fbshipit-source-id: e6d824bb28f5afe75fd765de0430968174f3531f
2020-07-30 02:32:52 -07:00
4f163df41a [caffe2] Special handling of If/AsyncIf op in RemoveOpsByType (#42286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42286

One more bug to fix. Operators such as If and AsyncIf need special treatment not just in `onnx::SsaRewrite`, but also in `RemoveOpsByType`. The solution needs two steps:
1) add external inputs/outputs of the subnets of If/AsyncIf op to the inputs/outputs of the op
2) if the inputs/outputs of the If/AsyncIf op need to be renamed as a result, the same inputs/outputs of the subnets need to be renamed as well.

I also added unit tests to cover this corner case.

Test Plan:
```
buck test //caffe2/caffe2/fb/predictor:black_box_predictor_test

mkdir /tmp/models
rm -rf /tmp/$USER/snntest
rm -rf /tmp/snntest
buck run mode/opt admarket/lib/ranking/prediction_replayer/snntest_replayer_test/tools:snntest_replay_test -- --serving_paradigm=USER_AD_PRECOMPUTATION_DSNN
```

Differential Revision: D22834028

fbshipit-source-id: c070707316cac694f452a96e5c80255abf4014bc
2020-07-30 02:02:20 -07:00
f30ac66e79 [caffe2] Fix a performance bug in Dedup SparseAdagrad op (#42287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42287

We shouldn't use block_size for the thread dimensions in linear_index_weight_offsets_dedup_kernel, since the kernel doesn't iterate over the embedding dimensions.
ghstack-source-id: 108834058

Test Plan:
```
buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_sum_gradient \(caffe2\.caffe2\.fb\.net_transforms\.tests\.fuse_sparse_ops_test\.TestFuseSparseOps\)' --print-passing-details
```

Reviewed By: jspark1105

Differential Revision: D22800959

fbshipit-source-id: 641d52a51070715c04f9fd286e7e22ac62001f61
2020-07-30 01:00:59 -07:00
0444bac940 Add test to cross function
Summary: The function `cross_kernel_scalar` is not covered in `ATen/native/cpu/CrossKernel.cpp`; add tests to cover it.

Test Plan:
1. Test locally to check new lines are covered
2. CI

https://pxl.cl/1fZjG

Reviewed By: malfet

Differential Revision: D22834122

fbshipit-source-id: 0d50f3a3e6aee52cb6fdee2b9f5883f542c7b6e2
2020-07-29 22:48:52 -07:00
9ea7476d9c Add test to lerp function (#42266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42266

The functions `lerp_kernel_scalar` and `lerp_kernel_tensor` are not covered in `ATen/native/cpu/LerpKernel.cpp`; add tests to cover them.

Test Plan:
1. Test locally to check new lines are covered
2. CI

https://pxl.cl/1fXPd

Reviewed By: malfet

Differential Revision: D22832164

fbshipit-source-id: b1eaabbf8bfa08b4dedc1a468abfdfb619a50e3c
2020-07-29 22:47:37 -07:00
7459da268e Add typing annotations to torch.random (#42234)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42234

Reviewed By: ezyang

Differential Revision: D22816933

Pulled By: malfet

fbshipit-source-id: 9e2124ad16fed339abd507f6e474cb63feb7eada
2020-07-29 22:16:08 -07:00
872237c1f2 Output to stderr in distributed tests. (#42139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42139

A bunch of tests were failing with buck since we would output to
stdout, and buck would fail to parse stdout in some cases.

Moving these print statements to stderr fixes the issue.
ghstack-source-id: 108606579

Test Plan: Run the offending unit tests.

Reviewed By: mrshenli

Differential Revision: D22779135

fbshipit-source-id: 789af3b16a03b68a6cb12377ed852e5b5091bbad
2020-07-29 19:23:34 -07:00
fe4f19e164 [CUDA] max_pool2d NCHW performance improvement (#42182)
Summary:
Fix the regression introduced in https://github.com/pytorch/pytorch/issues/38953.

Please see https://github.com/xwang233/code-snippet/blob/master/max-pool2d-nchw-perf/max-pool2d.ipynb for detailed before & after performance comparisons.

Performance improvement for backward max_pool2d before and after this PR (negative value means speed up)

![image](https://user-images.githubusercontent.com/24860335/88712204-363c8e00-d0ce-11ea-8586-057e09b16103.png)

The forward kernel doesn't seem to benefit much from a similar change, so I did not change the forward pass. 1718f0ccfd

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42182

Reviewed By: albanD

Differential Revision: D22829498

Pulled By: ngimel

fbshipit-source-id: 4c81968fe072f4e264e70c70ade4c32d760a3af4
2020-07-29 19:01:31 -07:00
c18223f9ef add Dimname support to IValue (#42054)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42054

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D22750398

Pulled By: bhosmer

fbshipit-source-id: 7028268093f86b33c4117868b0edcb9e1ca6f7ee
2020-07-29 16:30:26 -07:00
6c251f74b2 replace black_list/blacklist with blocklist/block_list (#42089)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41734

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42089

Reviewed By: pbelevich

Differential Revision: D22794556

Pulled By: SplitInfinity

fbshipit-source-id: 4404845b6293b076b3c8cc02b135b20c91397a79
2020-07-29 16:26:02 -07:00
27b03d62de [HT] Clear the device placement tag for the auto gen sum so that we could break the component for FC sharing the same input (#42219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42219

Introduce a new extra_info tag that is applied to operators in the forward net that share the same input. The effect is that the auto-generated gradient sum for that input will not follow the device tags of the operators in the forward net. This allows more flexible device allocation.

Test Plan:
# unit test
`./buck-out/gen/caffe2/caffe2/python/core_gradients_test#binary.par -r  testMultiUseInputAutoGenSumDevice`

Reviewed By: xianjiec, boryiingsu

Differential Revision: D22609080

fbshipit-source-id: d558145e5eb36295580a70e1ee3a822504dd439a
2020-07-29 15:21:27 -07:00
7cdf786a07 fix typo in GradScaler docstring (#42236)
Summary:
Closes https://github.com/pytorch/pytorch/issues/42226.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42236

Reviewed By: albanD

Differential Revision: D22817980

Pulled By: ngimel

fbshipit-source-id: 4326fe028dba1dbeed454edc4e4d4fffa56f51d6
2020-07-29 13:14:57 -07:00
79cfd85987 grad detach_ only when it has grad_fn in zero_grad call (#41283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41283

In optimizer.zero_grad(), detach_ is only useful for avoiding a memory leak when the grad has a grad_fn, so add a check that calls grad.detach_ only when the grad has a grad_fn.
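
A simplified sketch of the resulting per-parameter logic (assuming an existing `optimizer`; this follows the commit description, not an exact copy of the source):

```python
for group in optimizer.param_groups:
    for p in group['params']:
        if p.grad is not None:
            if p.grad.grad_fn is not None:
                p.grad.detach_()              # only detach when grad has a grad_fn
            else:
                p.grad.requires_grad_(False)
            p.grad.zero_()
```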
ghstack-source-id: 108702289

Test Plan: unit test

Reviewed By: mrshenli

Differential Revision: D22487315

fbshipit-source-id: 861909b15c8497f1da57f092d8963d4920c85e38
2020-07-29 11:40:13 -07:00
4b6e5f42a4 Creates spectral ops test suite (#42157)
Summary:
In preparation for creating the new torch.fft namespace and NumPy-like fft functions, as well as supporting our goal of refactoring and reducing the size of test_torch.py, this PR creates a test suite for our spectral ops.

The existing spectral op tests from test_torch.py and test_cuda.py are moved to test_spectral_ops.py and updated to run under the device generic test framework.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42157

Reviewed By: albanD

Differential Revision: D22811096

Pulled By: mruberry

fbshipit-source-id: e5c50f0016ea6bb8b093cd6df2dbcef6db9bb6b6
2020-07-29 11:36:18 -07:00
029007c8b6 Improved coverage for unboxed->boxed kernel wrappers (#38999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38999

Adds boxing for inplace and outplace kernels, itemizes
remaining unsupported cases, and fails compilation when
new unsupported types are introduced in op signatures.

Test Plan: Imported from OSS

Differential Revision: D21718547

Pulled By: bhosmer

fbshipit-source-id: 03295128b21d1843e86789fb474f38411b26a8b6
2020-07-29 11:31:16 -07:00
60f51542dc [Caffe2] Fix spatial_bn bug for computing running_var on CPU or on CUDA without CuDNN (#42151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42151

Previously our Caffe2 SpatialBN op implementation computed running_var incorrectly: it was missing the unbias coefficient. This should have failed the test, because the output differs from CuDNN's output, but our tests were too weak to catch the bug. This diff fixes all of them.

Test Plan: buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:spatial_bn_op_test

Reviewed By: houseroad

Differential Revision: D22786127

fbshipit-source-id: db80becb67d60c44faae180c7e4257cb136a266d
2020-07-29 11:20:03 -07:00
91546a4b0f Environment variable for controlling type verbosity in debug output (#41906)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41906

Fixes #41770

Test Plan:
Example:
```
import torch
def bar():
    def test(a):
        return a
    x = torch.ones(10,10, device='cpu')
    print(torch.jit.trace(test, (x)).graph)
bar()
```

Bash:
```
for i in 0 1 2 3; do
  PYTORCH_JIT_TYPE_VERBOSITY=$i python test.py
done
```

Output:
```
graph(%0):
  return (%0)

graph(%0 : Float(10, 10)):
  return (%0)

graph(%0 : Float(10:10, 10:1)):
  return (%0)

graph(%0 : Float(10:10, 10:1, requires_grad=0, device=cpu)):
  return (%0)
```

Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D22687966

fbshipit-source-id: cd395257d79a4baa35245c778a74a55d1ea2a842
2020-07-29 11:17:24 -07:00
01b794f169 Operator-level Benchmark Test for Per Tensor and Per Channel Fake Quantization (#41974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41974

In this diff, 2 new sets of benchmark tests are added to the `quantization` benchmark suite, where operator-level benchmarking is conducted for the learnable Python operators, the learnable C++ kernels, and the original non-backprop C++ kernels.

Test Plan:
Inside the path `torch/benchmarks/operator_benchmark` (The root directory will be `caffe2` inside `fbcode` if working on a devvm):
- On a devvm, run the command `buck run pt:fake_quantize_learnable_test`
- On a personal laptop, run the command `python3 -m pt.fake_quantize_learnable_test`

Benchmark Results (On devGPU with 0% volatile utilization -- all GPUs are free):
Each sample has dimensions **3x256x256**;

### In **microseconds** (`1e-6` second),

|                           | Python Module | C++ Kernel | Non-backprop C++ Kernel |
|---------------------------|---------------|------------|-------------------------|
| Per Tensor CPU Forward    | 3112.666      | 3270.740   | 3596.864                |
| Per Tensor Cuda Forward   | 797.258       | 258.961    | 133.953                 |
| Per Channel CPU Forward   | 6587.693      | 6931.461   | 6352.417                |
| Per Channel Cuda Forward  | 1579.576      | 555.723    | 479.016                 |
| Per Tensor CPU Backward   | 72278.390     | 22466.648  | 12922.195               |
| Per Tensor Cuda Backward  | 6512.280      | 1546.218   | 652.942                 |
| Per Channel CPU Backward  | 74138.545     | 41212.777  | 14131.576               |
| Per Channel Cuda Backward | 6795.173      | 4321.351   | 1052.066                |

Reviewed By: z-a-f

Differential Revision: D22715683

fbshipit-source-id: 8be528b790663413cbeeabd4f68bbca00be052dd
2020-07-29 11:12:17 -07:00
48acdfd505 add tests to BinaryOpsKernel -- max/min kernel (#42198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42198

1. add tests to max/min kernel

Test Plan:
1. Run locally to check cover the corresponding code part in BinaryOpsKernel.cpp.
2. CI

Reviewed By: malfet

Differential Revision: D22796019

fbshipit-source-id: 84c8d7df509de453c4ec3c5e38977733b0ef3457
2020-07-29 10:35:40 -07:00
382781221d Extending Learnable Fake Quantize module to support gradient scaling and factory (partial) construction (#41969)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41969

In this diff, the `_LearnableFakeQuantize` module is extended to support gradient scaling, where the gradients for both scale and zero point are multiplied by a constant `g` (which in some cases can help with quicker convergence). In addition, it is augmented with a factory method via `_with_args`, so that a partial constructor of the module can be built.

Test Plan:
For correctness of the fake quantizer operators, on a devvm, enter the following command:
```
buck test //caffe2/torch:quantization -- learnable_py_module
```

Reviewed By: z-a-f

Differential Revision: D22715629

fbshipit-source-id: ff8e5764f81ca7264bf9333789f57e0b0cec7a72
2020-07-29 10:22:26 -07:00
0a64f99162 [JIT] Dont include view ops in autodiff graphs (#42027)
Summary:
View ops as outputs of differentiable subgraphs can cause incorrect differentiation. For now, do not include them in the subgraph. This was observed with our autograd tests for MultiheadAttention and nn.Transformer, which currently fail with the legacy executor. This commit fixes those test failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42027

Reviewed By: pbelevich

Differential Revision: D22798133

Pulled By: eellison

fbshipit-source-id: 2f6c08953317bbe013933c6faaad20100376c039
2020-07-29 10:17:33 -07:00
b45b82b006 Fix type annotation for DistributedDataParallel (#42231)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42231

Reviewed By: albanD

Differential Revision: D22816589

Pulled By: mrshenli

fbshipit-source-id: a355f7e2fa895617bf81ef681b051f074d39ab8c
2020-07-29 10:12:20 -07:00
c8e15842aa Automated submodule update: FBGEMM (#42205)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: cad1c21404

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42205

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D22806731

Pulled By: efiks

fbshipit-source-id: 779a9f7f00645e7e65f183e2832dc79117eae5fd
2020-07-29 09:26:18 -07:00
460970483d Revert D22790718: [pytorch][PR] Enables torch.full bool and integer type inference
Test Plan: revert-hammer

Differential Revision:
D22790718 (6b3f335641)

Original commit changeset: 8d1eb01574b1

fbshipit-source-id: c321177cce129a6c83f1a7b26bd5ed94a343ac0f
2020-07-29 07:52:04 -07:00
90074bbfa6 implement numpy-like functionality isposinf, isneginf (#41588)
Summary:
Related https://github.com/pytorch/pytorch/issues/38349

The NumPy-like functions `isposinf` and `isneginf` are implemented.
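
A quick sketch of the new functions:

```python
import torch

t = torch.tensor([float('inf'), float('-inf'), 1.0])
print(torch.isposinf(t))  # tensor([ True, False, False])
print(torch.isneginf(t))  # tensor([False,  True, False])
```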

Test-Plan:
- pytest test/test_torch.py -k "test_isposinf_isneginf"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41588

Reviewed By: ngimel

Differential Revision: D22770732

Pulled By: mruberry

fbshipit-source-id: 7448653e8fb8df6b9cd4604a4739fe18a1135578
2020-07-29 03:29:31 -07:00
1c5c289b62 [pt] Add incude_last_offset option to EmbeddingBag mean and max (#42215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42215

Specifically on https://github.com/pytorch/pytorch/pull/27477#discussion_r371402079

We would like include_last_offset=True to be supported for the other reduction types, like mean and max, as well. The current gap causes further code fragmentation in DPER (https://www.internalfb.com/intern/diff/D22794469/).

More details: https://www.internalfb.com/intern/diff/D22794469/?dest_fbid=309597093427021&transaction_id=631457624153457
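
A minimal sketch of the include_last_offset semantics being extended to mean/max (sizes are illustrative):

```python
import torch

bag = torch.nn.EmbeddingBag(10, 3, mode='mean', include_last_offset=True)
inp = torch.tensor([0, 1, 2, 3])
# offsets carries a trailing entry equal to len(input):
# two bags here, over indices [0, 1] and [2, 3]
offsets = torch.tensor([0, 2, 4])
print(bag(inp, offsets).shape)  # torch.Size([2, 3])
```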

ghstack-source-id: 108733009

Test Plan:
```
buck test mode/dev-nosan //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu"
```

```
(base) [jianyuhuang@devbig281.ftw3.facebook.com: ~/fbsource/fbcode/caffe2/test] $ TORCH_SHOW_CPP_STACKTRACES=1 buck test mode/dev-nosan //caffe2/test:
nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu" --print-passing-details
Parsing buck files: finished in 1.2 sec
Building: finished in 5.5 sec (100%) 10130/10130 jobs, 2 updated
  Total time: 6.7 sec
More details at https://www.internalfb.com/intern/buck/build/dbdc2063-69d8-45cb-9146-308a9e8505ef
First unknown argument: --print-passing-details.
Falling back to TestPilot classic.
Trace available for this run at /tmp/testpilot.20200728-195414.1422748.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision cd2638f1f47250eac058b8c36561760027d16add fbpkg f88726c8ebde4ba288e1172a348c7f46 at Mon Jul 27 18:11:43 2020 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/887/t.par
Discovering tests
Running 1 test
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/844425097242375
      ✓ caffe2/test:nn - test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu (test_nn.TestNNDeviceTypeCPU) 0.162 1/1 (passed)
Test output:
> /data/users/jianyuhuang/fbsource/fbcode/buck-out/dev/gen/caffe2/test/nn#binary,link-tree/torch/_utils_internal.py:103: DeprecationWarning: This is a NOOP in python >= 3.7, its just too dangerous with how we write code at facebook. Instead we patch os.fork and multiprocessing which can raise exceptions if a deadlock would happen.
>   threadSafeForkRegisterAtFork()
> /usr/local/fbcode/platform007/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can't resolve package from __spec__ or __package__, falling back on __name__
and __path__
>   return f(*args, **kwds)
> test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu (test_nn.TestNNDeviceTypeCPU) ... Couldn't download test skip set, leaving all tests enabled...
> ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.162s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/844425097242375
Summary (total time 5.54s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
Did _not_ run with tpx. See https://fburl.com/tpx for details.
```

Reviewed By: dzhulgakov

Differential Revision: D22801881

fbshipit-source-id: 80a624465727081bb9bf55c28419695a3d79c6e5
2020-07-29 01:20:00 -07:00
6b3f335641 Enables torch.full bool and integer type inference (#41912)
Summary:
After being deprecated in 1.5 and throwing a runtime error in 1.6, we can now enable torch.full inferring its dtype when given bool and integer fill values. This PR enables that inference and updates the tests and docs to reflect this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41912

Reviewed By: pbelevich

Differential Revision: D22790718

Pulled By: mruberry

fbshipit-source-id: 8d1eb01574b1977f00bc0696974ac38ffdd40d9e
2020-07-28 23:11:08 -07:00
8c653e05ff DOC: fail to build if there are warnings (#41335)
Summary:
Merge after gh-41334 and gh-41321 (EDIT: both are merged).
Closes gh-38011

This is the last in a series of PRs to build documentation without warnings. It adds `-WT --keep-going` to the sphinx build, which will [fail the build if there are warnings](https://www.sphinx-doc.org/en/master/man/sphinx-build.html#cmdoption-sphinx-build-W), print a [traceback on error](https://www.sphinx-doc.org/en/master/man/sphinx-build.html#cmdoption-sphinx-build-T) and [finish the build](https://www.sphinx-doc.org/en/master/man/sphinx-build.html#cmdoption-sphinx-build-keep-going) even when there are warnings.

It should fail now, but pass once the PRs mentioned at the top are merged.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41335

Reviewed By: pbelevich

Differential Revision: D22794425

Pulled By: mruberry

fbshipit-source-id: eb2903e50759d1d4f66346ee2ceebeecfac7b094
2020-07-28 22:33:44 -07:00
4b108ca763 refactor save_data as non member function (#42045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42045

This PR changes the save_data() member function of torch::jit::mobile::Module, which was introduced in #41403, to be the non-member function torch::jit::mobile::_save_parameters() (taking a mobile Module as its first argument).

In addition, this PR:
* adds a getter function _ivalue() for the mobile::Module object
* renames torch::jit::mobile::_load_mobile_data() to torch::jit::mobile::_load_parameters()
* refactors the import.h header file into import.h and import_data.h

Test Plan: Imported from OSS

Reviewed By: kwanmacher, iseeyuan

Differential Revision: D22766781

Pulled By: ann-ss

fbshipit-source-id: 5cabae31927187753a958feede5e9a28d71d9e92
2020-07-28 21:52:32 -07:00
8fc5adc88e Remove dead named_tensors_unsupported_error definitions. (#42171)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42171

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D22794980

Pulled By: ezyang

fbshipit-source-id: 250b6566270e19240361d758db55101d6fcb33e9
2020-07-28 21:40:28 -07:00
8deb4fe809 Fix flaky NCCL error handling tests. (#42149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42149

Some of these tests were flaky since we could kill the process in some
way without cleaning up the ProcessGroup. This resulted in issues where the
FileStore didn't clean up appropriately, causing other processes in the
group to crash.

Fixed this by explicitly deleting the process_group before we bring a process
down forcibly.
ghstack-source-id: 108629057

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D22785042

fbshipit-source-id: c31d0f723badbc23b7258e322f75b57e0a1a42cf
2020-07-28 18:38:26 -07:00
b6a9f42758 Add appropriate error messages for ProcessGroupNCCLTest (#42143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42143

Replaces the original makeshift error messages in ProcessGroupNCCLTest
with more appropriate ones.
ghstack-source-id: 108711579

Test Plan: Ran the tests on DevGPU

Reviewed By: mrshenli

Differential Revision: D22778505

fbshipit-source-id: 27109874f0b474a74b09f588cf6e7528d2069702
2020-07-28 18:31:23 -07:00
e4c3f526c8 Fixed Skipping Logic in ProcessGroupNCCLErrors tests (#42192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42192

This PR fixes the complicated skipping logic for ProcessGroupNCCLErrors Tests - it correctly logs the reason for skipping tests when GPUs are not available or the NCCL version is too old.

This is part of a broader effort to improve the testing of the ProcessGroup and Collectives tests.
ghstack-source-id: 108620568

Test Plan: Tested on devGPU and devvm. Tests are run correctly on GPU and skipped on CPU as expected.

Reviewed By: mrshenli

Differential Revision: D22782856

fbshipit-source-id: 6071dfdd9743f45e59295e5cee09e89c8eb299c9
2020-07-28 16:59:40 -07:00
b2ef7fa359 Add a flag to enforce fp32 to fp16 conversion for all inputs of the onnxifi net. (#39931)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39931

ATT.

Reviewed By: yinghai, ChunliF

Differential Revision: D21993492

fbshipit-source-id: ff386e6e9b95a783906fc1ae6a62462e6559a20b
2020-07-28 16:48:43 -07:00
8a644f0c13 [Shape Inference] Fix InferFC
Summary: Sometimes the first dim of X in FC is BATCH_OF_FEATURE_MAX instead of BATCH. This caused an issue in f207899183 (where the first dim of X is 64 but is set to 1 in inferFC). Change the check from `!= BATCH` to `== UNKNOWN`.

Test Plan: unit test

Reviewed By: yinghai

Differential Revision: D22784691

fbshipit-source-id: eb66ba361d6fe75672b13edbac2fbd269a7e7a00
2020-07-28 16:43:19 -07:00
30eacb5fb6 [quant][graphmode] Support stack (#42187)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42187

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22801229

fbshipit-source-id: 7d1758c4fb1c8f742a275c3a631605f0f0d08e44
2020-07-28 16:35:34 -07:00
deac621ae2 Stop building PyTorch for VS2017 (#42144)
Summary:
And since CUDA-9.2 is incompatible with VS2019, disable CUDA-9.2 for Windows as well

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42144

Reviewed By: pbelevich

Differential Revision: D22794475

Pulled By: malfet

fbshipit-source-id: 24fc980e6fc75240664b9de8a4a63b1153f8d8ee
2020-07-28 16:09:21 -07:00
3c084fd358 Dequant => Swish => Quant Test case. (#41976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41976

Dequant => Swish => Quant Test case.

(Note: this ignores all push blocking failures!)

Test Plan: test_deq_swish_quant_nnpi.py.

Reviewed By: hyuen

Differential Revision: D22718593

fbshipit-source-id: 1cee503a27e339af6d89c819007511b90bb6610c
2020-07-28 16:05:12 -07:00
e2344db886 Use Python3.7 when running OSX builds/tests (#42191)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42191

Reviewed By: seemethere

Differential Revision: D22801091

Pulled By: malfet

fbshipit-source-id: b589343ef1bc6896d3d6d8d863f75aa3a102d985
2020-07-28 16:00:54 -07:00
4c7fb8c2b6 make FusionCallback refer to specified GraphFuser context (#41560)
Summary:
Fixes issue where
 - top level fuser's block_ was captured by callback due to [&] capture,
 - recursive/nested fusers would compare erroneously to top-level block_ instead of own block_

Closes (https://github.com/pytorch/pytorch/issues/39810)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41560

Reviewed By: Krovatkin

Differential Revision: D22583196

Pulled By: wconstab

fbshipit-source-id: 8f543cd9ea00e116cf3e776ab168cdd9fed69632
2020-07-28 15:01:24 -07:00
8ddd2c4e1b [pytorch] fix code analyzer for LLVM 9 & 10 (#42135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42135

Tested the code analyzer with LLVM 9 & 10 and fixed a couple issues:
- Rename local demangle() which is available as public API since LLVM 9;
- Fix falsely associated op registrations due to the `phi` instruction;

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D22795508

Pulled By: ljk53

fbshipit-source-id: 2d47af088acd3312a7ea5fd9361cdccd48940fe6
2020-07-28 14:57:07 -07:00
fd9205e14b Enable caffe2 tests for RocM jobs (#41604)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41604

Reviewed By: ezyang

Differential Revision: D22603703

Pulled By: malfet

fbshipit-source-id: 789ccf2bb79668a5a68006bb877b2d88fb569809
2020-07-28 14:21:42 -07:00
4d17ecb071 Changed Blacklisted to Blocklisted (#42100)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41703

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42100

Reviewed By: ngimel

Differential Revision: D22780380

Pulled By: SplitInfinity

fbshipit-source-id: d465c41f1d4951ab6de55cb827c7ef53975209af
2020-07-28 13:21:26 -07:00
030ab2bda5 Replaced whitelist reference with allowlist (#42071)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41741

Replaced whitelist reference with allowlist.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42071

Reviewed By: pbelevich

Differential Revision: D22795176

Pulled By: SplitInfinity

fbshipit-source-id: bcf1b8afe516b9684ce0298bc257ef81152ba20c
2020-07-28 12:29:33 -07:00
64965c4572 Replaced blacklist with blocklist (#42097)
Summary:
Closes https://github.com/pytorch/pytorch/issues/41726

Fixes https://github.com/pytorch/pytorch/issues/41726

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42097

Reviewed By: ngimel

Differential Revision: D22779535

Pulled By: SplitInfinity

fbshipit-source-id: 1d414af22a1b3e856a11d64cff4b4d33160d957b
2020-07-28 12:08:54 -07:00
5ed7cd0025 Allow drop_last option in DistributedSampler (#41171)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41171

DistributedSampler allows data to be split evenly across workers in
DDP, but it has always added additional samples in order for the data to be
evenly split in the case that the # of samples is not evenly divisible by the
number of workers. This can cause issues, such as when computing distributed
validation accuracy, where some samples could be counted twice.

This PR adds a drop_last option where the tail of the data is dropped such that
the effective dataset size is still evenly divisible across the workers. This
ensures that DDP can train fine (there are no uneven inputs) and each replica
gets an equal number of data indices.
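
A usage sketch (assuming the option lands as a `drop_last` keyword on `DistributedSampler`, per this summary):

```python
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(10))  # 10 samples, 4 replicas
sampler = DistributedSampler(dataset, num_replicas=4, rank=0, drop_last=True)
print(len(sampler))  # 2 per replica; the 2 tail samples are dropped
```
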
ghstack-source-id: 108617516

Test Plan: Added unittest

Reviewed By: mrshenli

Differential Revision: D22449974

fbshipit-source-id: e3156b751f5262cc66437b9191818b78aee8ddea
2020-07-28 11:33:08 -07:00
48ae5945de Skip TestExtractPredictorNet if compiled without OpenCV (#42168)
Summary:
Found while trying to get RocM Caffe2 CI green

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42168

Reviewed By: seemethere

Differential Revision: D22791879

Pulled By: malfet

fbshipit-source-id: 8f7ef9711bdc5941b2836e4c8943bb95c72ef8af
2020-07-28 11:26:55 -07:00
f666be7bc1 [vulkan] support add for dim < 4 (#41222)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41222

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D22754937

Pulled By: IvanKobzarev

fbshipit-source-id: f8c5e55c965c0a805e75c63b21f410fb0c323515
2020-07-28 11:15:37 -07:00
b3a9e21a29 [vulkan] mm op through addmm (#41221)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41221

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D22754938

Pulled By: IvanKobzarev

fbshipit-source-id: f9a0f48d7943a85b7dbb3fc9edf9e214ba07543b
2020-07-28 11:13:48 -07:00
b0424a895c Raise RuntimeError for zero stride pooling (#41819)
Summary:
Close https://github.com/pytorch/pytorch/issues/41767

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41819

Reviewed By: mrshenli

Differential Revision: D22780634

Pulled By: ngimel

fbshipit-source-id: 376ce5229ad5bd60804d839340d2c6505cf3288d
2020-07-28 11:07:12 -07:00
5aa2b572ff replace black list with block (#42091)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41729

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42091

Reviewed By: pbelevich

Differential Revision: D22792096

Pulled By: ezyang

fbshipit-source-id: caafa42d12cbad377b67ddbaba8f84a2b8c98066
2020-07-28 10:23:51 -07:00
2f61aca17b Skip DataIO tests relying on LevelDB if compiled without it (#42169)
Summary:
Found while trying to get RocM Caffe2 job green

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42169

Reviewed By: seemethere

Differential Revision: D22791896

Pulled By: malfet

fbshipit-source-id: 9df6233876aec5ead056365499bab970aa7e8bdc
2020-07-28 10:18:26 -07:00
73ff252913 Back out "[NCCL] DDP communication hook: getFuture()" (#42152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42152

Original commit changeset: 8c059745261d

Test Plan: .

Reviewed By: ajtulloch, jianyuh

Differential Revision: D22786183

fbshipit-source-id: 51155389d37dc82ccb4d2fa20d350f9d14abeaca
2020-07-28 10:05:35 -07:00
2de549518e Make fmod work with zero divisors consistently (#41948)
Summary:
Currently `torch.tensor(1, dtype=torch.int).fmod(0)` crashes (floating point exception).

This PR should fix this issue.
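
A minimal reproducer (the exact post-fix behavior, an error versus a defined value, is whatever the PR specifies; the try/except below covers either outcome):

```python
import torch

t = torch.tensor(1, dtype=torch.int)
try:
    print(t.fmod(0))  # previously crashed the whole process with SIGFPE
except RuntimeError as e:
    print("raised:", e)
```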

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41948

Reviewed By: ngimel

Differential Revision: D22771081

Pulled By: ezyang

fbshipit-source-id: a94dd35d6cd85daa2d51cae8362004e31f97989e
2020-07-28 08:58:39 -07:00
e7ed0b3fae Avoid zero division in _cubic_interpolate (#42093)
Summary:
I encountered a zero division problem when using LBFGS:

```
File "/home/yshen/anaconda3/lib/python3.7/site-packages/torch/optim/lbfgs.py", line 118, in _strong_wolfe
    bracket[1], bracket_f[1], bracket_gtd[1])
File "/home/yshen/anaconda3/lib/python3.7/site-packages/torch/optim/lbfgs.py", line 21, in _cubic_interpolate
    d1 = g1 + g2 - 3 * (f1 - f2) / (x1 - x2)
ZeroDivisionError: float division by zero
```

My solution is to check whether the line-search bracket is too small before calling _cubic_interpolate; a sketch follows below.
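
A sketch of the guarded interpolation (assumed form, mirroring the formula from the traceback; `eps` is a hypothetical tolerance):

```python
import math

def cubic_interpolate_safe(x1, f1, g1, x2, f2, g2, eps=1e-10):
    # If the bracket is degenerate, bisect instead of dividing by (x1 - x2).
    if abs(x1 - x2) < eps:
        return 0.5 * (x1 + x2)
    d1 = g1 + g2 - 3 * (f1 - f2) / (x1 - x2)  # safe: denominator is nonzero
    d2_square = d1 ** 2 - g1 * g2
    if d2_square < 0:
        return 0.5 * (x1 + x2)  # no real cubic minimizer: bisect
    d2 = math.sqrt(d2_square)
    return x2 - (x2 - x1) * ((g2 + d2 - d1) / (g2 - g1 + 2 * d2))

print(cubic_interpolate_safe(0.0, 1.0, -1.0, 0.0, 1.0, -1.0))  # degenerate -> 0.0
```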

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42093

Reviewed By: pbelevich

Differential Revision: D22770667

Pulled By: mrshenli

fbshipit-source-id: f8fdfcbd3fd530235901d255208fef8005bf898c
2020-07-28 08:32:00 -07:00
f0c46878c6 Fix the issue GPU skip message(#41378) (#41973)
Summary:
Related https://github.com/pytorch/pytorch/issues/41378

Fix the GPU skip message issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41973

Reviewed By: pbelevich

Differential Revision: D22753459

Pulled By: mrshenli

fbshipit-source-id: d24b531926e28b860ae90b9ae07e8ca3438d21db
2020-07-28 08:28:31 -07:00
3acd6b7359 Document formatting (#42065)
Summary:
Apply syntax highlighting to the command in `README.md`. This makes `README.md` easier to read.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42065

Reviewed By: pbelevich

Differential Revision: D22753418

Pulled By: mrshenli

fbshipit-source-id: ebfa90fdf60478c34bc8a7284d163e0254cfbe3b
2020-07-28 08:27:42 -07:00
14e75fbdb9 Remove py2 specific code from test_utils.py (#42105)
Summary:
As https://github.com/pytorch/pytorch/issues/23795 mentions, Python 2 support has been dropped. cc albanD
Fixes https://github.com/pytorch/pytorch/issues/31796

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42105

Reviewed By: ngimel

Differential Revision: D22765768

Pulled By: mrshenli

fbshipit-source-id: bae114a21cd5598004c7f92d313938ad826b4a24
2020-07-28 08:25:40 -07:00
86492410bc Don't run tests with custom arguments with pytest (#41397)
Summary:
This patch basically removes the `-m pytest` parameters when `extra_unittest_args` is used (e.g. `--subprocess`)

Fixes https://github.com/pytorch/pytorch/issues/41393

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41397

Reviewed By: pbelevich

Differential Revision: D22792133

Pulled By: ezyang

fbshipit-source-id: 29930d703666f4ecc0d727356bbab4a5f7ed4860
2020-07-28 08:17:36 -07:00
672ed3c06b replace onnx producer_version when updating results (#41910)
Summary:
xref gh-39002 which handled the reading but not the writing of the onnx expect files, and the last comment in that PR which points out `XXX` was suboptimal.
xref [this comment](https://github.com/pytorch/pytorch/pull/37091#discussion_r456460168) which pointed out the problem.

This PR:
- replaces `XXX` with `CURRENT_VERSION` in the stored files
- ensures that updating the results with the `--accept` flag will maintain the change

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41910

Reviewed By: pbelevich

Differential Revision: D22758671

Pulled By: ezyang

fbshipit-source-id: 47c345c66740edfc8f0fb9ff358047a41e19b554
2020-07-28 08:15:01 -07:00
b282297559 Replace whitelist with allowlist (#42067)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41757

I've replaced all the whitelist with allowlist for this issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42067

Reviewed By: pbelevich

Differential Revision: D22791690

Pulled By: malfet

fbshipit-source-id: 638c13cf49915f5c83bd79c7f4a39b8390cc15b4
2020-07-28 08:01:16 -07:00
1a8269a566 Replace blacklist with blocklist in test/run_test.py file. (#42011)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41716
test/run_test.py file updated with an appropriate replacement for blacklist and whitelist.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42011

Reviewed By: pbelevich

Differential Revision: D22791836

Pulled By: malfet

fbshipit-source-id: 8139649c5b70c876b711e25c33f3051ea8461063
2020-07-28 07:56:01 -07:00
e179966248 [caffe2][tpx] log to stderr (#42162)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42162

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D22791440

fbshipit-source-id: 14f16cd7a94a57161c5724177b518527f486232d
2020-07-28 07:50:27 -07:00
0571cfd875 Implement MultiBatchVmapTransform::logicalToPhysical(TensorList) (#41942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41942

This function:
- permutes all batch dims to the front of the tensors
- aligns all the batch dims to the collective levels of all the tensors
- expands all of the batch dims such that they are present in each of
the result tensors

This function is useful for the next diff up on the stack (which is
implementing a fallback kernel for BatchedTensor). It's also useful in
general for implementing batching rules on operators that take in
multiple batch dimensions at the front of each tensor (but we don't have
too many of those in PyTorch).

Test Plan: - `./build/bin/vmap_test`

Reviewed By: ezyang

Differential Revision: D22764104

Pulled By: zou3519

fbshipit-source-id: d42cc8824a1bcf258687de164b7853af52852f53
2020-07-28 07:45:25 -07:00
1994ab1473 Optimize alignBatchDimsAtFront (#41941)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41941

If we know that the tensor already has the desired aligned size, we
don't need to put in the effort to align it.

Test Plan: - `./build/bin/vmap_test`, `pytest test/test_vmap.py -v`

Reviewed By: albanD

Differential Revision: D22764101

Pulled By: zou3519

fbshipit-source-id: a2ab7ce7b98d405ae905f7fd98db097210bfad65
2020-07-28 07:45:23 -07:00
5124436af4 Fix const correctness for VmapPhysicalView struct methods (#41940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41940

See title. I marked methods that don't mutate the VmapPhysicalView as
`const`.

Test Plan: - wait for tests

Reviewed By: albanD

Differential Revision: D22764102

Pulled By: zou3519

fbshipit-source-id: 40f957ad61c85f0e5684357562a541a2712b1f38
2020-07-28 07:43:09 -07:00
2bc7dae2fc Use new sccache for RocM builds (#42134)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42134

Reviewed By: seemethere

Differential Revision: D22782146

Pulled By: malfet

fbshipit-source-id: 85ba69a705600e30ae0eddbf654298b3dc6f96ed
2020-07-28 07:15:56 -07:00
6bd88f581a Revert D22790238: [caffe2][tpx] Use logger instead of print
Test Plan: revert-hammer

Differential Revision:
D22790238 (3c6fae6567)

Original commit changeset: c0a801cdf7f0

fbshipit-source-id: cadfbd22f7d3ce656624483c9a19062f7c9a5b61
2020-07-28 06:11:30 -07:00
3c6fae6567 [caffe2][tpx] Use logger instead of print
Test Plan: CI?

Differential Revision: D22790238

fbshipit-source-id: c0a801cdf7f0da489c67708a0eb1b498ff104c64
2020-07-28 04:26:51 -07:00
5336ccc1b2 [BugFix] Fix bug in onnx::SsaRewrite (#42148)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42148

Differential Revision: D22687388

fbshipit-source-id: facf7a186dd48d6f919d0ff5d42f756977c3f9f4
2020-07-28 01:44:47 -07:00
4f723825b4 [vulkan] adaptive_avg_pool2d (#41220)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41220

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D22754943

Pulled By: IvanKobzarev

fbshipit-source-id: 91a94f32db005ebb693384f4d27efe66e2c33a14
2020-07-27 23:24:14 -07:00
0a0960126c If we don't collect tracing, always free the trace data (#42118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42118

We toggle tracing on with a certain probability. In the case of 3 inferences with trace on/off/on, we leak the trace from the first inference. Always cleaning up the trace fixes it.

Test Plan:
predictor

I created a tiny repro here: D22786551

With this fix, this issue is gone.

Reviewed By: gcatron

Differential Revision: D22768382

fbshipit-source-id: 9ee0bbcb2bc5f76107dae385759fe578909a683d
2020-07-27 21:49:30 -07:00
83762844e5 Make run_binary_ops_test function generic and Add tests to add_kernel function (#42101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42101

1. Add test fixture `atest class` to store global variables
2. Make `run_binary_ops_test` function generic: can dispose different dtypes and different numbers of parameters
3. add test to `add_kernel`

Test Plan:
Run locally to check cover the corresponding code part in `BinaryOpsKernel.cpp`.
CI

Reviewed By: malfet

Differential Revision: D22760015

fbshipit-source-id: 95b47732f661124615c0856efa827445dd714125
2020-07-27 21:03:00 -07:00
c062cdbd90 Log the net if blob doesn't exist when setting output record (#41971)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41971

Reviewed By: wx1988

Differential Revision: D22490309

fbshipit-source-id: d967ee211b610f5523a307b5266b9fcb0277a21c
2020-07-27 19:13:50 -07:00
f805184165 onnxifi: make it work with AsyncIf
Summary:
The onnxifi path didn't handle the input/output name rewrite for SSA correctly for the AsyncIf op. Add support for it.

Also fixed a place where we lose the net type while doing the onnxifi transform.

Test Plan: Load 163357582_593 which is a multi feed model that uses AsyncIf. This used to fail with c2 not finding some blobs in workspace. Now it works.

Reviewed By: dhe95

Differential Revision: D21268230

fbshipit-source-id: ce7ec0e952513d0f251df1bfcfb2b0250f51fd94
2020-07-27 18:27:35 -07:00
c76fada4a8 Let DDP.train() return self to stay consistent with nn.Module (#42131)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42131

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D22775311

Pulled By: mrshenli

fbshipit-source-id: ac9e6cf8b2381036a2b6064bd029dca361a81777
2020-07-27 18:22:13 -07:00
bcd75bd683 [ModelLints] Refine dropout lint message. (#42046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42046

Refine dropout lint message as we have enabled dropout operator removal in optimize_for_mobile method.
ghstack-source-id: 108607182

Test Plan: buck test ai_infra/ai_mobile_infra/tests:mobile_model_util_tests

Reviewed By: kimishpatel

Differential Revision: D22741132

fbshipit-source-id: 8f87356aae2bd9c89d1cad0d7be7286278bb14ad
2020-07-27 18:15:30 -07:00
d5de616a4a Enable c10d Store tests in CI (#42128)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42128

Reviewed By: pritamdamania87

Differential Revision: D22774445

Pulled By: mrshenli

fbshipit-source-id: 6e5e56f42833414ef375b6cd23fdb3260cb07be9
2020-07-27 18:12:37 -07:00
509c18a096 Documentation for torch.optim.swa_utils (#41228)
Summary:
This PR adds a description of `torch.optim.swa_utils` added in https://github.com/pytorch/pytorch/pull/35032 to the docs at `docs/source/optim.rst`. Please let me know what you think!

vincentqb andrewgordonwilson

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41228

Reviewed By: ngimel

Differential Revision: D22609451

Pulled By: vincentqb

fbshipit-source-id: 8dd98102c865ae4a074a601b047072de8cc5a5e3
2020-07-27 17:52:16 -07:00
646042e0fb Add suggestion to enumerate ModuleDict in error message (#41946)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41946

Reviewed By: ngimel

Differential Revision: D22774243

Pulled By: wconstab

fbshipit-source-id: 5cfbe52b5b1c540f824593e67ae6ba4973458bb5
2020-07-27 16:24:00 -07:00
1df35ba61e Back out "Support aarch32 neon backend for Vec256"
Summary: Original commit changeset: 1c22cf67ec35

Test Plan: sandcastle, testing on Portal

Reviewed By: currybeef

Differential Revision: D22774614

fbshipit-source-id: 8897aec5df32092c4df86c0d54b0d2fe58d66e66
2020-07-27 16:09:05 -07:00
d198fb3efe changed white-allowlisted (#41796)
Summary:
closes https://github.com/pytorch/pytorch/issues/41749

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41796

Reviewed By: gmagogsfm

Differential Revision: D22718991

Pulled By: SplitInfinity

fbshipit-source-id: 6c2d2b0e3b1e79fd515f9bdd395335a32f525a26
2020-07-27 16:01:45 -07:00
cb9c2049cd replace blacklist in aten/src/ATen/native/cudnn/Conv.cpp (#41627)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41700.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41627

Reviewed By: gmagogsfm

Differential Revision: D22678492

Pulled By: SplitInfinity

fbshipit-source-id: 75b82bd10059754d8e6c25fc20e9dde775d54698
2020-07-27 15:56:36 -07:00
6ca5421a8f Enable non-synchronizing cub scan for cum* operations (#42036)
Summary:
This uses cub for cum* operations because, unlike thrust, cub is non-synchronizing.
Cub does not support more than `2**31`-element tensors out of the box (in fact, due to cub bugs the cutoff point is even smaller),
so to support that I split the tensor into `2**30`-element chunks and modify the first value of the second and subsequent chunks to contain the cumsum result of the previous chunks. Since the modification is done in place on the source tensor, if something goes wrong and we error out before the source tensor is reverted to its original state, the source tensor will be corrupted, but in most cases errors will invalidate the full CUDA context anyway.
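
For intuition, a toy CPU sketch of the chunk-carry scheme (tiny chunk size purely for illustration; the real kernel scans `2**30`-element CUDA chunks):

```python
import torch

def chunked_cumsum(x, chunk=4):
    # Scan each chunk, then fold the running total into the first element of
    # the next chunk, mirroring the in-place trick described above.
    out = x.clone()
    carry = torch.tensor(0, dtype=x.dtype)
    for start in range(0, out.numel(), chunk):
        out[start] += carry
        out[start:start + chunk] = torch.cumsum(out[start:start + chunk], dim=0)
        carry = out[min(start + chunk, out.numel()) - 1].clone()
    return out

x = torch.arange(1, 11)
print(torch.equal(chunked_cumsum(x), torch.cumsum(x, dim=0)))  # True
```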

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42036

Reviewed By: ajtulloch

Differential Revision: D22749945

Pulled By: ngimel

fbshipit-source-id: 9fc9b54d466df9c8885e79c4f4f8af81e3f224ef
2020-07-27 15:44:03 -07:00
330a107199 Refactor lite serializer dependencies from full jit (#42127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42127

This diff renames core_autograd_sources to core_trainer_sources and moves/adds dependencies for the lite trainer in order to build the serializer functionality internally.
ghstack-source-id: 108589416

Test Plan: Manually tested serializer functionality from the internal lite trainer and verified that data is written correctly.

Reviewed By: iseeyuan

Differential Revision: D22738293

fbshipit-source-id: 992beb0c4368b2395f5bd5563fb2bc12ddde39a1
2020-07-27 15:38:54 -07:00
f7d50f50b9 .circleci: Prefer netrc for docs push (#42136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42136

Expect was giving weird issues so let's just use netrc since it doesn't
rely on janky expect behavior

Another follow up for: https://github.com/pytorch/pytorch/pull/41964

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: yns88

Differential Revision: D22778940

Pulled By: seemethere

fbshipit-source-id: 1bdf879a5cfbf68a7d2d34b6966c20f95bd0a3b5
2020-07-27 15:28:46 -07:00
ed822de0fc change 2 instances of blacklist to blocklist in tools/pyi/gen_pyi.py (#41979)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41722

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41979

Reviewed By: ngimel

Differential Revision: D22764112

Pulled By: zou3519

fbshipit-source-id: 3f8580c96cf45078a9df3cd9ca6fdb10d58e143f
2020-07-27 14:12:32 -07:00
5246bc4e87 register parameters correctly in c++ MultiheadAttention (#42037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42037

This is to fix #41951

Test Plan: Imported from OSS

Reviewed By: yf225

Differential Revision: D22764717

Pulled By: glaringlee

fbshipit-source-id: e6da0aeb05a2356f52446e6d5fad391f2cd1cf6f
2020-07-27 13:58:11 -07:00
e59db43313 Find hip properly (#42064)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41886

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42064

Reviewed By: seemethere

Differential Revision: D22757115

Pulled By: malfet

fbshipit-source-id: 9c8805e6eb0b7d7defe0ecb08c1e45dcc775a237
2020-07-27 13:47:01 -07:00
d6f1346c37 Add a new op for converting the dense feature to sparse representation
Summary: We need this op to avoid splicing a dense tensor and then using the Mergesinglescaler op.

Test Plan: integrated test with dper2

Differential Revision: D22677523

fbshipit-source-id: f4f9a1f06841b0906ec8cbb435482ae0a89e1721
2020-07-27 12:45:37 -07:00
4281240cb5 Raise error for duplicate params in param group #40967 (#41597)
Summary:
This PR fixes an issue in https://github.com/pytorch/pytorch/issues/40967 where duplicate parameters across different parameter groups are not allowed, but duplicates inside the same parameter group are accepted. After this PR, both cases are treated equally and raise `ValueError`.
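
An illustration of the now-rejected case (per the summary above, this raises `ValueError` after this PR):

```python
import torch

p = torch.nn.Parameter(torch.randn(2))
try:
    torch.optim.SGD([{'params': [p, p]}], lr=0.1)  # duplicate within one group
except ValueError as e:
    print("raised:", e)
```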

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41597

Reviewed By: zou3519

Differential Revision: D22608019

Pulled By: vincentqb

fbshipit-source-id: 6df41dac62b80db042cfefa6e53fb021b49f4399
2020-07-27 12:25:52 -07:00
6367a9d2b0 [vulkan] Shaders caching (#39384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39384

Introducing `ComputeUnitFactory`, which is responsible for providing and caching `ComputeUnit`s (shaders),
using shader name (glsl file name) + workGroupSize as the cache key, stored in a plain `std::map<string, std::shared_ptr>`.

The GLSL_SPV macro changed to take a literal name for the cache key as its first argument.

All constructors of ComputeUnit are changed to use `ComputeUnitFactory`.

Ownership model:
ComputeUnitFactory also owns `vkPipelineCache`, Vulkan's internal cache object ( https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkPipelineCache.html ).

`VContext` (a global object) owns ComputeUnitFactory, which owns the ComputeUnits and vkPipelineCache. Destroying these requires a valid VkDevice, so they must be destructed before `vkDestroyDevice` in `~VContext`. Since class members are only destructed after the destructor body runs, we force destruction of ComputeUnitFactory before `vkDestroyDevice` via `unique_ptr<ComputeUnitFactory>.reset()`.

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D21962430

Pulled By: IvanKobzarev

fbshipit-source-id: effe60538308805f317c11448b31dbcf670487e8
2020-07-27 11:57:07 -07:00
d4735ff490 Avoid refcount bump in IValue::toStringRef() (#42019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42019

According to benchmarks, this makes IValue::toStringRef() 3-4x as fast.
ghstack-source-id: 108451154

Test Plan: unit tests

Reviewed By: ezyang

Differential Revision: D22731354

fbshipit-source-id: 3ca3822ea7310d8593e38b1d3e6014d6d80963db
2020-07-27 11:44:27 -07:00
5a6d88d503 Updates to Scale and Zero Point Gradient Calculation (#42034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42034

In this diff, scale and zero point gradient calculations are updated to correctly reflect the actual backpropagation equation (instead of `dScale * dX`, the near-final output should be `dScale * dY`; the same applies to zero point).

Test Plan:
To execute the unit tests for all affected learnable fake quantize modules and kernels, on a devvm, execute the following command:

`buck test //caffe2/test:quantization -- learnable`

To enable the `cuda` tests, execute the following command:

`buck test mode/dev-nosan //caffe2/test:quantization -- learnable`

Reviewed By: jerryzh168

Differential Revision: D22735668

fbshipit-source-id: 45c1e0fd38cbb2d8d5e60be4711e1e989e9743b4
2020-07-27 11:18:49 -07:00
c261a894d1 Updates to Python Module for Calculation of dX and Addition of Unit Tests (#42033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42033

In this diff, the Python `_LearnableFakeQuantize` module is updated where the gradient with respect to the input `x` is actually computed instead of passed through. Argument naming is also updated for better clarity; and unit tests on the `PerTensor` and `PerChannel` operators are added for asserting correctness.

Test Plan:
On a devvm, execute the command:

`buck test //caffe2/test:quantization -- learnable_py_module`

To include `cuda` tests as well, run:

`buck test mode/dev-nosan //caffe2/test:quantization -- learnable_py_module`

Reviewed By: jerryzh168

Differential Revision: D22735580

fbshipit-source-id: 66bea7e9f8cb6422936e653500f917aa597c86de
2020-07-27 11:18:47 -07:00
e62bf89273 Renaming variables from dX to dY in Learnable Fake Quantize kernels for Better Clarity (#42032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42032

In this diff, the arguments named `dX` within the C++ kernels are renamed to `dY` for clarity, to avoid confusion since they don't represent the gradient with respect to the input.

Test Plan:
To test all related fake quantize kernel operators, on a devvm, run the command:

`buck test //caffe2/test:quantization -- learnable`

Reviewed By: z-a-f, jerryzh168

Differential Revision: D22735429

fbshipit-source-id: 9d6d967f08b98a720eca39a4d2280ca8109dcdd6
2020-07-27 11:17:26 -07:00
3e121d9688 Amend docstring and add test for Flatten module (#42084)
Summary:
I've noticed when PR https://github.com/pytorch/pytorch/issues/22245 introduced `nn.Flatten`, the docstring had a bug where it wouldn't render properly on the web, and this PR addresses that. Additionally, it adds a unit test for this module.

**Actual**
![image](https://user-images.githubusercontent.com/13088001/88483672-cf896a00-cf3f-11ea-8b1b-a30d152e1368.png)

**Expected**
![image](https://user-images.githubusercontent.com/13088001/88483642-86391a80-cf3f-11ea-8333-0964a027a172.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42084

Reviewed By: mrshenli

Differential Revision: D22756662

Pulled By: ngimel

fbshipit-source-id: 60c58c18c9a68854533196ed6b9e9fb0d4f83520
2020-07-27 11:04:28 -07:00
4290d0be60 Remove settings for the logit test case. (#42114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42114

Remove settings for the logit test case.

(Note: this ignores all push blocking failures!)

Test Plan: test_op_nnpi_fp16.py test case.

Reviewed By: hyuen

Differential Revision: D22766728

fbshipit-source-id: 2fe8404b103c613524cf1beddf1a0eb9068caf8a
2020-07-27 10:59:23 -07:00
11e5174926 Added support for Huber Loss (#37599)
Summary:
Current losses in PyTorch only include a (partial) implementation of Huber loss through `smooth l1` based on Fast RCNN - which essentially uses a delta value of 1. Changing/Renaming the [`_smooth_l1_loss()`](3e1859959a/torch/nn/functional.py (L2487)) and refactoring to include delta, enables to use the actual function.

Supplementary to this, I have also made a functional and criterion versions for anyone that wants to set the delta explicitly - based on the functional `smooth_l1_loss()` and the criterion `Smooth_L1_Loss()`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37599

Differential Revision: D21559311

Pulled By: vincentqb

fbshipit-source-id: 34b2a5a237462e119920d6f55ba5ab9b8e086a8c
2020-07-27 10:42:30 -07:00
fbdaa555a2 Enable ProcessGroupGlooTest in CI (take 2) (#42086)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42073

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42086

Reviewed By: ngimel

Differential Revision: D22765777

Pulled By: malfet

fbshipit-source-id: ebbcd44f448a1e7f9a3d18fa9967461129dd1dcd
2020-07-27 10:21:59 -07:00
96aaa311c0 Grammar Changes (#42076)
Summary:
Small grammatical updates.
![Screenshot (188)](https://user-images.githubusercontent.com/56619747/88471271-02723480-cf25-11ea-8fd1-ae98d5ebcc86.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42076

Reviewed By: mrshenli

Differential Revision: D22756651

Pulled By: ngimel

fbshipit-source-id: e810eb7397a5831d801348c8fff072854658830e
2020-07-26 13:53:41 -07:00
b7bda236d1 DOC: split quantization.rst into smaller pieces (#41321)
Summary:
xref gh-38010 and gh-38011.

After this PR, there should be only two warnings:
```
pytorch/docs/source/index.rst:65: WARNING: toctree contains reference to nonexisting \
      document 'torchvision/index'
WARNING: autodoc: failed to import class 'tensorboard.writer.SummaryWriter' from module \
     'torch.utils'; the following exception was raised:
No module named 'tensorboard'
```

If tensorboard and torchvision are prerequisites to building docs, they should be added to the `requirements.txt`.

As for breaking up quantization into smaller pieces: I split out the list of supported operations and the list of modules to separate documents. I think this makes the page flow better, makes it much "lighter" in terms of page cost, and also removes some warnings since the same class names appear in multiple sub-modules.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41321

Reviewed By: ngimel

Differential Revision: D22753099

Pulled By: mruberry

fbshipit-source-id: d504787fcf1104a0b6e3d1c12747ec53450841da
2020-07-25 23:59:40 -07:00
6af659629a DOC: fix two build warnings (#41334)
Summary:
xref gh-38011.

Fixes two warnings when building documentation by
- using the external link to torchvision
- install tensorboard before building documentation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41334

Reviewed By: ngimel

Differential Revision: D22753083

Pulled By: mruberry

fbshipit-source-id: 876377e9bd09750437fbfab0378664b85701f827
2020-07-25 23:38:33 -07:00
47e6d4b3c8 Revert D22741514: [pytorch][PR] Enable ProcessGroupGlooTest in CI
Test Plan: revert-hammer

Differential Revision:
D22741514 (45e6f2d600)

Original commit changeset: 738d2e27f523

fbshipit-source-id: 0381105ed0ab676b0abd1927f602a35b1b264a6a
2020-07-25 18:19:17 -07:00
b00c05c86c update cub submodule (#42042)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42042

Reviewed By: mruberry

Differential Revision: D22752345

Pulled By: ngimel

fbshipit-source-id: 363735bfe3d49bab12fedef43b68c9dc9e372815
2020-07-25 17:52:45 -07:00
c5b4f60fc2 Move qconfig removal into convert() (#41930)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41930

As title
ghstack-source-id: 108517079

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D22698386

fbshipit-source-id: 4f748c9bae4a0b615aa69c7cc8d8e451e5d26863
2020-07-25 13:27:13 -07:00
12cd083fd7 Updates torch.tensor, torch.as_tensor, and sparse ctors to use the device of inputs tensors they're given, by default (#41984)
Summary:
**BC-Breaking Note**

This PR changes the behavior of the torch.tensor, torch.as_tensor, and sparse constructors. When given a tensor as input and a device is not explicitly specified, these constructors now always infer their device from the tensor. Historically, if the optional dtype kwarg was provided then these constructors would not infer their device from tensor inputs. Additionally, for the sparse ctor a runtime error is now thrown if the indices and values tensors are on different devices and the device kwarg is not specified.

**PR Summary**
This PR's functional change is a single line:

```
auto device = device_opt.has_value() ? *device_opt : (type_inference ? var.device() : at::Device(computeDeviceType(dispatch_key)));
```
=>
```
auto device = device_opt.has_value() ? *device_opt : var.device();
```

in `internal_new_from_data`. This line entangled whether the function was performing type inference with whether it inferred its device from an input tensor, and in practice meant that

```
t = torch.tensor((1, 2, 3), device='cuda')
torch.tensor(t, dtype=torch.float64)
```

would return a tensor on the CPU, not the default CUDA device, while

```
t = torch.tensor((1, 2, 3), device='cuda')
torch.tensor(t)
```

would return a tensor on the device of `t`!

This behavior is niche and odd, but came up while aocsa was fixing https://github.com/pytorch/pytorch/issues/40648.

An additional side effect of this change is that the indices and values tensors given to a sparse constructor must be on the same device, or the sparse ctor must specify the device kwarg. The tests in test_sparse.py have been updated to reflect this behavior.
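
A quick sketch of the behavior change (requires a CUDA build; outputs follow the new inference rule described above):

```python
import torch

t = torch.tensor((1, 2, 3), device='cuda')
# Device is now inferred from the input tensor even when dtype is given:
print(torch.tensor(t, dtype=torch.float64).device)  # cuda:0 (was cpu before)
print(torch.tensor(t).device)                       # cuda:0, as before
```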

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41984

Reviewed By: ngimel

Differential Revision: D22721426

Pulled By: mruberry

fbshipit-source-id: 909645124837fcdf3d339d7db539367209eccd48
2020-07-25 02:49:45 -07:00
366c014a77 [Resubmit #41318] NCCL backend support for torch bool (#41959)
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/41318 pushed to ci-all branch.

Original description:
Closes https://github.com/pytorch/pytorch/issues/24137.
This PR adds support for the torch.bool tensor type to ProcessGroupNCCL. For most types we use the existing mapping, but since bool is not supported as a native ncclDataType_t, we add the following logic:

Map at::kBool to ncclUint8
During reduction (allreduce, for example), if the operation is SUM, we instead override it to a MAX to avoid overflow issues. The rest of the operations work with no changes. In the boolean case, changing sum to max makes no correctness difference since they both function as a bitwise OR.
The reduction logic (for example for reduce/allreduce) is as follows:
sum, max = bitwise or
product, min = bitwise and

Note that this PR doesn't add support for BAND/BOR/BXOR. That is because these reduction ops currently are not supported by NCCL backend, see https://github.com/pytorch/pytorch/issues/41362
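
A toy, NCCL-free illustration of why SUM is rewritten to MAX for bools: a uint8 SUM can wrap past 255 participants, while MAX is a faithful bitwise OR:

```python
flags = [1] * 300        # one bool flag per rank, all True, stored as uint8 0/1
print(sum(flags) % 256)  # 44: a uint8 SUM wraps and could even land on 0
print(max(flags))        # 1: MAX acts as bitwise OR, always correct
```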

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41959

Reviewed By: mrshenli

Differential Revision: D22719665

Pulled By: rohan-varma

fbshipit-source-id: 8bc4194a8d1268589640242277124f277d2ec9f1
2020-07-24 23:44:29 -07:00
38580422bb Allow specifying PYTHON executable to build_android (#41927)
Summary:
build_android.sh should check the PYTHON environment variable before trying to use the default python executable.
Even in that case, it tries to pick python3 over python2 when available.

Closes https://github.com/pytorch/pytorch/issues/41795

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41927

Reviewed By: seemethere

Differential Revision: D22696850

Pulled By: malfet

fbshipit-source-id: be236c2baf54a1cd111e55ee7743cdc93cb6b9d7
2020-07-24 18:34:42 -07:00
8e03c38a4f Add prim::EnumName and prim::EnumValue ops (#41965)
Summary:
[2/N] Implement Enum JIT support

Add prim::EnumName and prim::EnumValue and their lowerings to support getting the `name` and `value` attributes of Python enums.

Supported:
- Enum-typed function arguments
- Using the Enum type and comparing enum values
- Getting name/value attrs of enums

TODO (a usage sketch of the intended end state follows below):
- Add Python sugared value for Enum
- Support Enum-typed return values
- Support enum values of different types in the same Enum class
- Support serialization and deserialization
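
A sketch of that end state in TorchScript (hypothetical usage based on this summary; it may not fully work until the later PRs in the stack land):

```python
from enum import Enum
import torch

class Color(Enum):
    RED = 1
    GREEN = 2

@torch.jit.script
def describe(c: Color) -> str:
    return c.name  # lowered via prim::EnumName; c.value uses prim::EnumValue

print(describe(Color.RED))  # 'RED'
```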

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41965

Reviewed By: eellison

Differential Revision: D22714446

Pulled By: gmagogsfm

fbshipit-source-id: db8c4e26b657e7782dbfc2b58a141add1263f76e
2020-07-24 18:33:18 -07:00
6287f9ed65 Remove AllGatherTestWithTimeout (#41945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41945

This test previously did a thread sleep before launching the allgather operation, and then waited on the work object. Since the sleep was done before the work object was created, it did not affect the allgather call, and thus, did not test work-level timeouts as intended.

I am removing this test for now. In the future we can add this test back, but would need to somehow inject a `cudaSleep` call before the  allgather (so the collective operation itself is delayed). This may require overriding the `ProcessGroupNCCL::collective`, so it's a bit more heavy-weight.

In the meantime, we can remove this test - work-level timeouts are still thoroughly tested with Gloo.
ghstack-source-id: 108370178

Test Plan: Ran ProcessGroupNCCL tests on devGPU

Reviewed By: jiayisuse

Differential Revision: D22702291

fbshipit-source-id: a36ac3d83abfab6351c0476046a2f3b04a80c44d
2020-07-24 18:17:48 -07:00
45e6f2d600 Enable ProcessGroupGlooTest in CI (#41985)
Summary:
Partially addresses https://github.com/pytorch/pytorch/issues/41143

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41985

Reviewed By: rohan-varma

Differential Revision: D22741514

Pulled By: malfet

fbshipit-source-id: 738d2e27f52334e402b65b724b8ba3b0b41372ee
2020-07-24 17:44:00 -07:00
cf7e7909d5 NCCL must depend on librt (#41978)
Summary:
Since NCCL makes calls to shm_open/shm_close it must depend on librt on Linux

This should fix `DSO missing from command line` error on some platforms

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41978

Reviewed By: colesbury

Differential Revision: D22721430

Pulled By: malfet

fbshipit-source-id: d2ae08ce9da3979daaae599e677d5e4519b080f0
2020-07-24 16:47:19 -07:00
dede71d6e3 Support aarch32 neon backend for Vec256 (#41267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41267

Due to an LLVM bug and some unsupported intrinsics, we could not directly
use intrinsics to implement the aarch32 NEON backend for Vec256.
Instead we resort to inline assembly.

Test Plan:
vec256_test run on android phone.

Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D22482196

fbshipit-source-id: 1c22cf67ec352942c465552031e9329550b27b3e
2020-07-24 15:49:26 -07:00
976e614915 caffe2: add PIPELINE tag (#41482)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41482

This adds a new tag for use with pipeline parallelism.

Test Plan: CI

Reviewed By: heslami

Differential Revision: D22551487

fbshipit-source-id: 90910f458a9bce68f7ef684773322a49aa24494a
2020-07-24 15:25:14 -07:00
0c0864c6be update tests to run back-compat check using new binary (#41949)
Summary:
Instead of exporting schemas using the current binary under test, install the nightly build and export its schemas, then use those in a back-compat test run by the current binary under test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41949

Reviewed By: houseroad

Differential Revision: D22731054

Pulled By: bradleyhd

fbshipit-source-id: 68a7e7637b9be2604c0ffcde2a40dd208057ba72
2020-07-24 15:20:05 -07:00
42a0b51f71 Easier english updated tech docs (#42016)
Summary:
Just added an easier-to-understand wording to the tech docs.

![Screenshot from 2020-07-24 21-48-07](https://user-images.githubusercontent.com/55920093/88412562-6991cb00-cdf7-11ea-9612-5f69146ea233.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42016

Reviewed By: colesbury

Differential Revision: D22735752

Pulled By: mrshenli

fbshipit-source-id: 8e3dfb721f51ee0869b0df66bf856d9949553453
2020-07-24 14:36:17 -07:00
becc1b26dd updated white list/allow list (#41789)
Summary:
closes https://github.com/pytorch/pytorch/issues/41758

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41789

Reviewed By: izdeby

Differential Revision: D22648038

Pulled By: SplitInfinity

fbshipit-source-id: 5abc895789d8803ca542dfc0c62069350c6977c4
2020-07-24 14:26:16 -07:00
7e84913233 .circleci: Make sure to install expect for docs push (#41964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41964

Since we're not executing this in a docker container we should go ahead
and install expect explicitly

This is a follow up PR to #41871

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D22736738

Pulled By: seemethere

fbshipit-source-id: a56e19c1ee13c2f6e2750c2483202c1eea3b558a
2020-07-24 14:19:23 -07:00
d4736ef95f Add done() API to Future (#42013)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42013

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D22729596

Pulled By: mrshenli

fbshipit-source-id: ed31021a35af6e2c3393b9b14e4572cf51013bc0
2020-07-24 14:13:41 -07:00
890b52e09f Reduce instability in runCleanUpPasses by reordering passes. (#41891)
Summary:
Currently constant pooling runs before const propagation, which can create more constants that need pooling. This can get in the way of serialization/deserialization stability because each time a user serializes and deserializes a module, runCleanUpPasses is called on it. Doing so multiple times would lead to a different saved module each time.

This PR moves constant pooling after const propagation, which may slow down const propagation a little bit, but would otherwise side-step the aforementioned problem.

test_constant_insertion in test_jit.py is also updated because, after fixing the pass ordering, the number of constants is no longer constant, and it is extremely difficult to get the exact number with the current convoluted test structure. So for now, I changed the test to check only that CSE doesn't change the number of "prim::constant" nodes rather than comparing against a known number. Also left a TODO to improve this test.

The ConstantPropagation pass is replaced by ConstantPropagationImmutableTypes because the latter is used in runCleanUpPasses. If not replaced, the former would create new CSE opportunities by folding more constants, which would defeat the purpose of the test case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41891

Reviewed By: colesbury

Differential Revision: D22701540

Pulled By: gmagogsfm

fbshipit-source-id: 8e60dbdcc54a93dac111d81b8d88fb39387224f5
2020-07-24 11:39:20 -07:00
d904ea5972 [NCCL] DDP communication hook: getFuture() (#41596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41596

We've modified the previous design of `convert_dist_work_to_future` API in the GH Issue [#39272](https://github.com/pytorch/pytorch/issues/39272).

1. Whenever we create a `WorkNCCL` object, create a `Future` associated with `WorkNCCL` and store it with the object.
2. Add an API `c10::intrusive_ptr<c10::ivalue::Future> getFuture()` to `c10d::ProcessGroup::Work`.
3. This API will only be supported by NCCL in the first version, the default implementation will throw UnsupportedOperation.
4. To mark the future associated with WorkNCCL completed, implement a `cudaStreamCallback` function.

`cudaStreamAddCallback` is marked as deprecated. An alternative is `cudaLaunchHostFunc`, but it is supported for CUDA > 10 and may not be deprecated until there's a reasonable alternative available according to [this discussion](https://stackoverflow.com/questions/56448390/how-to-recover-from-cuda-errors-when-using-cudalaunchhostfunc-instead-of-cudastr).
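
A Python-side sketch of the API shape described above (method and callback names are assumed from the C++ signature; illustrative only, and requires an initialized NCCL process group):

```python
import torch
import torch.distributed as dist

def allreduce_async(pg, t):
    # `pg` is an initialized NCCL ProcessGroup, `t` a CUDA tensor.
    work = pg.allreduce([t])
    fut = work.get_future()  # new: c10d Work -> Future
    return fut.then(lambda f: print("allreduce done:", f.value()))
```
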
ghstack-source-id: 108409748

Test Plan:
Run old  python test/distributed/test_c10d.py.
Some additional tests:
`test_ddp_comm_hook_allreduce_hook_nccl`: This unit test verifies whether a DDP communication hook that just calls allreduce gives the same result as the case of no hook registered. Without the then callback, the future_value in reducer is no longer a PyObject, and this unit test verifies future_value is properly checked.
`test_ddp_comm_hook_allreduce_then_mult_ten_hook_nccl`: This unit test verifies whether a DDP communication hook that calls allreduce and then multiplies the result by ten gives the expected result.

As of v10:
```
........................s.....s.....................................................s...............................
----------------------------------------------------------------------
Ran 116 tests

OK (skipped=3)
```
`flow-cli` performance validation using a stacked diff where `bucket.work` is completely replaced with `bucket.future_work` in `reducer`. See PR [#41840](https://github.com/pytorch/pytorch/pull/41840) [D22660198](https://www.internalfb.com/intern/diff/D22660198/).

Reviewed By: izdeby

Differential Revision: D22583690

fbshipit-source-id: 8c059745261d68d543eaf21a5700e64826e8d94a
2020-07-24 11:22:44 -07:00
2e95b29988 restore at::Half support for caffe2 SumOp (#41952)
Summary:
PR https://github.com/pytorch/pytorch/issues/40379 added long support but removed at::Half support.  Restore at::Half support.

CC ezyang xw285cornell neha26shah

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41952

Reviewed By: colesbury

Differential Revision: D22720656

Pulled By: xw285cornell

fbshipit-source-id: be83ca7fe51fc43d81bc0685a3b658353d42f8ea
2020-07-24 10:49:06 -07:00
e9e6cc8c83 Added Prehook option to prepare method (#41863)
Summary:
Added logic so that if a prehook is passed into the prepare method during quantization, the hook will be added as a prehook to all leaf nodes (and to modules specified in the non_leaf_module_list).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41863

Test Plan:
Small demo, made simple module then called prepare with prehook parameter set to the numeric suite logger, printed the results to verify its what we wanted
{F245156246}

Reviewed By: jerryzh168

Differential Revision: D22671288

Pulled By: edmundw314

fbshipit-source-id: ce65a00830ff03360a82c0a075b3b6d8cbc4362e
2020-07-24 10:26:39 -07:00
1b55e2b043 add prefetch_factor for multiprocessing prefetching process (#41130)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40604
Adds a parameter to DataLoader to configure the per-worker prefetch count.
Before this edit, the prefetching process always prefetched 2 * num_workers data items; this commit makes that configurable, e.g. you can specify prefetching 10 * num_workers data items.
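
A usage sketch (keyword name as added by this PR):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(100))
# Each of the 4 workers keeps 10 batches in flight instead of the default 2.
loader = DataLoader(ds, batch_size=8, num_workers=4, prefetch_factor=10)
```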

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41130

Reviewed By: izdeby

Differential Revision: D22705288

Pulled By: albanD

fbshipit-source-id: 2c483fce409735fef1351eb5aa0b033f8e596561
2020-07-24 08:38:13 -07:00
79cdd84c81 Downloading different sccache binary in case of ROCm build (#41958)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41958

Reviewed By: colesbury

Differential Revision: D22717509

Pulled By: malfet

fbshipit-source-id: 96c94512f12193fa549ec84cd51f17978f221bc6
2020-07-24 08:04:25 -07:00
c0bfa45f9d Enable typechecking for torch.futures (#41675)
Summary:
Add typing declarations for torch._C.Future and torch._C._collect_all

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41675

Reviewed By: izdeby

Differential Revision: D22627539

Pulled By: malfet

fbshipit-source-id: 29b87685d65dd24ee2094bae8a84a0fe3787e7f8
2020-07-23 23:06:45 -07:00
750d9dea49 move min/max tests to TestTorchDeviceType (#41908)
Summary:
so that testing _min_max on the different devices is easier, and min/max operations have better CUDA test coverage.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41908

Reviewed By: mruberry

Differential Revision: D22697032

Pulled By: ngimel

fbshipit-source-id: a796638fdbed8cda90a23f7ff4ee167f45530914
2020-07-23 22:49:30 -07:00
6a8c9f601f Removed whitelist references from test/backward_compatibility/check_b… (#41691)
Summary:
Removed whitelist reference
Fixes https://github.com/pytorch/pytorch/issues/41733.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41691

Reviewed By: houseroad

Differential Revision: D22641467

Pulled By: SplitInfinity

fbshipit-source-id: 72899b7410d4fc8454d87ca0c042f1ede7cf73de
2020-07-23 21:36:14 -07:00
e42eab4b1c Update PULL_REQUEST_TEMPLATE.md (#41812)
Summary:
**Summary**
This commit updates the repository's pull request template to remind contributors to tag the issue that their pull request addresses.

**Fixes**
This commit fixes https://github.com/pytorch/pytorch/issues/35319.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41812

Reviewed By: gmagogsfm

Differential Revision: D22667902

Pulled By: SplitInfinity

fbshipit-source-id: cda5ff7cbbbfeb89c589fd0dfd378bf73a59d77b
2020-07-23 21:30:43 -07:00
2da69081d7 Fix one error message format of torch.dot() (#41963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41963

The error message of dot (CUDA) was copied from dot (CPU); however, both are confusing.

Test Plan: wait for unittests

Reviewed By: ngimel

Differential Revision: D22710822

fbshipit-source-id: 565b51149ff4bee567ef0775e3f8828579565f8a
2020-07-23 20:47:11 -07:00
f00a37dd71 Make setup.py Python-2 syntactically correct (#41960)
Summary:
Import __future__ to make `print(*args)` a syntactically correct statement under Python-2
Otherwise, if one accidentally invokes setup.py using a Python-2 interpreter, they will be greeted by:
```
  File "setup.py", line 229
    print(*args)
          ^
SyntaxError: invalid syntax
```
instead of:
```
Python 2 has reached end-of-life and is no longer supported by PyTorch.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41960

Reviewed By: orionr, seemethere

Differential Revision: D22710174

Pulled By: malfet

fbshipit-source-id: ffde3ddd585707ba1d39e57e0c6bc9c4c53f8004
2020-07-23 19:10:20 -07:00
97ab33d47c Fix memory leak in XNNPACK/MaxPool2D. (#41874)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41874

Test Plan: Imported from OSS

Reviewed By: ann-ss

Differential Revision: D22699598

Pulled By: AshkanAliabadi

fbshipit-source-id: fec59ed3d5d23bd9197349057fcf2ce56a2b278b
2020-07-23 18:59:53 -07:00
36fb14b68b [quant] Add Graph Mode Passes to quantize EmbeddingBag operators (#41612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41612

This change adds preliminary support to quantize the EmbeddingBag operators. We currently support 4-bit and 8-bit quantization+packing of the weights.

To quantize these operators, specify the operator name in the `custom_op_name` field of the NoopObserver. Based on the op name (4bit or 8bit) we call the corresponding quantization functions.
Refer to the testplan for how to invoke the qconfig for the embedding_bag ops.

Future versions of this will support 4-bit and 2-bit qtensors with native support to observe and quantize it.

NB - This version assumes that the weights in the EmbeddingBag Module reside on the same device.

Test Plan:
python test/test_quantization.py TestQuantizeDynamicJitOps.test_embedding_bag

Imported from OSS

Reviewed By: vkuzo, jerryzh168

Differential Revision: D22609342

fbshipit-source-id: 23e33f44a451c26719e6e283e87fbf09b584c0e6
2020-07-23 18:54:59 -07:00
401ac2dd39 Replaced whitelisted with allowed (#41867)
Summary:
Closes https://github.com/pytorch/pytorch/issues/41746
Closes https://github.com/pytorch/pytorch/issues/41745

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41867

Reviewed By: izdeby

Differential Revision: D22703533

Pulled By: mrshenli

fbshipit-source-id: 915895463a92e18f36db93b8884d9fd432c0997d
2020-07-23 16:53:51 -07:00
a1cfcd4d22 Change whitelist to another context in binary_smoketest.py (#41822)
Summary:
Fix https://github.com/pytorch/pytorch/issues/41740

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41822

Reviewed By: izdeby

Differential Revision: D22703682

Pulled By: mrshenli

fbshipit-source-id: 1df82fd43890142dfd261eb7bf49dbd128295e03
2020-07-23 16:14:54 -07:00
b6690eb29a Might be good for newcomers to read what N means (#41851)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41851

Reviewed By: izdeby

Differential Revision: D22703602

Pulled By: mrshenli

fbshipit-source-id: 44905f43cdf53b38e383347e5002a28c9363a446
2020-07-23 16:10:38 -07:00
7646f3c77f Fix type annotation for CosineAnnealingLR (#41866)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41866

Reviewed By: izdeby

Differential Revision: D22703576

Pulled By: mrshenli

fbshipit-source-id: 10a0f593ffaaae82a2923a42815c36793a9043d5
2020-07-23 15:56:50 -07:00
cyy
c5fdcd85c7 check pruned attributes before deleting (#41913)
Summary:
I copied a pruned model after deleting the derived tensors. In order to be able to re-parameterize the model, we should check the existence of the tensors here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41913

Reviewed By: izdeby

Differential Revision: D22703248

Pulled By: mrshenli

fbshipit-source-id: f5274d2c634a4c9a038100d8a6e837f132eabd34
2020-07-23 15:56:48 -07:00
183b43f323 Clarify Python 3.5 is the minimum supported version in the installation section. (#41937)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41937

Reviewed By: izdeby

Differential Revision: D22702924

Pulled By: mrshenli

fbshipit-source-id: 67306435e80f80236b585f1d5406444daec782d6
2020-07-23 15:54:56 -07:00
a4b831a86a Replace if(NOT ${var}) by if(NOT var) (#41924)
Summary:
As explained in https://github.com/pytorch/pytorch/issues/41922, using `if(NOT ${var})` is usually wrong and can lead to issues like https://github.com/pytorch/pytorch/issues/41922, where the condition is wrongly evaluated to FALSE instead of TRUE. Instead, the unevaluated variable name should be used in all cases; see the CMake documentation for details.

This fixes the `NOT ${var}` cases using a simple regexp replacement. It seems `pybind11_PREFER_third_party` is the only variable really prone to causing an issue, as all the others are set. However, because CMake evaluates unquoted strings in `if` conditions as variable names, I recommend never using an unquoted `${var}` in an `if` condition. A similar regexp-based replacement could be done on the whole codebase, but as that makes a lot of changes I didn't include it now. Also, `if(${var})` will likely lead to a parser error if `var` is unset, instead of a wrong result.

Fixes https://github.com/pytorch/pytorch/issues/41922

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41924

Reviewed By: seemethere

Differential Revision: D22700229

Pulled By: mrshenli

fbshipit-source-id: e2b3466039e4312887543c2e988270547a91c439
2020-07-23 15:49:20 -07:00
dbe6bfbd7e Revert D22496604: NCCL Backend support for torch.bool
Test Plan: revert-hammer

Differential Revision:
D22496604 (3626473105)

Original commit changeset: a1a15381ec41

fbshipit-source-id: 693c2f9fd1df568508cbcf8c734c092cec3b0a72
2020-07-23 15:33:58 -07:00
b898bdd4d3 [JIT] Don't re run CSE on every block (#41479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41479

Previously we were re-running CSE every time we recursed into a new block, which in turn created a new Alias Db for the whole graph. This was O(# Nodes * # Blocks).

For graphs which don't have any autodiff opportunities, such as Densenet, create_autodiff_subgraphs is now linear in the number of nodes. For Densenet this pass was measured at ~0.1 seconds.

This pass is still non-linear for models which actually do create autodiff subgraphs, because in the
```
      bool any_changed = true;
      while (any_changed) {
        AliasDb aliasDb(graph_);
        any_changed = false;
        for (auto it = workblock.end()->reverseIterator();
             it != workblock.begin()->reverseIterator();) {
          bool changed;
          std::tie(it, changed) = scanNode(*it, aliasDb);
          any_changed |= changed;
        }
      }
```
loop we recreate the AliasDb (which is O(N)) every time we merge something and scanNode returns. I will make that linear in the next PR in the stack.

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D22600606

Pulled By: eellison

fbshipit-source-id: b08abfde2df474f168104c5b477352362e0b7b16
2020-07-23 14:50:04 -07:00
25b6e2e5ee [JIT] optimize autodiff subgraph slicing (#41437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41437

[copied from commented code]
The IR has many nodes which can never be reordered around, such as a
prim::Bailout. If a node N is surrounded by two nodes A and B which cannot be
reordered, then a differentiable subgraph created from N can only contain
nodes from [A, B]. The nodes from A to B represent one work block for the
subgraph slicer to work on. By creating these up front, we avoid re-traversing
the whole graph block any time scanNode returns, and we can also avoid
attempting to create differentiable subgraphs in work blocks that do not
contain a minimum number of differentiable nodes.

This improved the compilation time of densenet (the model with the slowest compilation time we're tracking) from 56s -> 28s, and for mobilenet from 8s -> 6s.

Test Plan: Imported from OSS

Reviewed By: Krovatkin, ZolotukhinM

Differential Revision: D22600607

Pulled By: eellison

fbshipit-source-id: e5ab6ed87bf6820b4e22c86eabafd9d17bf7cedc
2020-07-23 14:49:57 -07:00
da3ff5e473 [JIT] dont count constants in subgraph size (#41436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41436

Constants are not executed as instructions, so we should ignore them when counting subgraph size, just as we ignore them when counting block size for loop unrolling.

Test Plan: Imported from OSS

Reviewed By: Krovatkin, ZolotukhinM

Differential Revision: D22600608

Pulled By: eellison

fbshipit-source-id: 9770b21c936144a3d6a1df89cf3be5911095187e
2020-07-23 14:48:25 -07:00
dfe7d27d0e implement lite parameter serializer (#41403)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41403

Test Plan: Imported from OSS

Reviewed By: kwanmacher

Differential Revision: D22611633

Pulled By: ann-ss

fbshipit-source-id: b391e8c96234b2e69f350119a11f688e920c7817
2020-07-23 14:25:44 -07:00
b85df3709a Add __main__ entrypoint to test_futures.py (#41826)
Summary:
Per comment in run_test.py, every test module must have a __main__ entrypoint:
60e2baf5e0/test/run_test.py (L237-L238)
Also disable test_wait_all on Windows, as it fails with an uncaught exception:
```
  test_wait_all (__main__.TestFuture) ... Traceback (most recent call last):
  File "run_test.py", line 744, in <module>
    main()
  File "run_test.py", line 733, in main
    raise RuntimeError(err)
RuntimeError: test_futures failed!
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41826

Reviewed By: seemethere, izdeby

Differential Revision: D22654899

Pulled By: malfet

fbshipit-source-id: ab7fdd7adce3f32c53034762ae37cf35ce08cafc
2020-07-23 12:56:03 -07:00
3626473105 NCCL Backend support for torch.bool (#41318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41318

Closes https://github.com/pytorch/pytorch/issues/24137.

This PR adds support for the `torch.bool` tensor type to ProcessGroupNCCL. For most types we use the existing mapping, but since `bool` is not supported as a native `ncclDataType_t`, we add the following logic:
1) Map `at::kBool` to `ncclUint8`
2) During reduction (allreduce, for example), if the operation is SUM, we instead override it to a MAX to avoid overflow issues. The rest of the operations work with no changes. In the boolean case, changing sum to max makes no correctness difference, since both function as a bitwise OR.

The reduction logic (for example for reduce/allreduce) is as follows:
sum, max = bitwise or
product, min = bitwise and

Tests are added to ensure that the reductions work as expected.
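
A quick way to see why the SUM-to-MAX override is safe for booleans (a standalone illustration, not the PR's test code): over values in {0, 1}, elementwise max coincides with bitwise OR, and elementwise min with bitwise AND.

```python
import torch

a = torch.tensor([0, 1, 0, 1], dtype=torch.uint8)  # bool is mapped to ncclUint8
b = torch.tensor([0, 0, 1, 1], dtype=torch.uint8)

assert torch.equal(torch.max(a, b), a | b)  # MAX == bitwise OR (saturating SUM)
assert torch.equal(torch.min(a, b), a & b)  # MIN == bitwise AND
assert torch.equal(a * b, a & b)            # PRODUCT == bitwise AND as well
```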
ghstack-source-id: 108315417

Test Plan: Added unittests

Reviewed By: mrshenli

Differential Revision: D22496604

fbshipit-source-id: a1a15381ec41dc59923591885d40d966886ff556
2020-07-23 12:33:39 -07:00
01c406cc22 [pytorch] bump up variable version regardless of differentiability (#41269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41269

The ultimate goal is to move things that are not gated with `if (compute_requires_grad(...))`
or `if (grad_fn)` out from VariableType so that VariableType kernels can be enabled/disabled
based upon `GradMode`. Then we can merge `AutoNonVariableTypeMode` and `NoGradGuard`.

We've moved profiling / tracing logic out from VariableType. One remaining thing that's
not gated with the if-statement is the `increment_version` call.

However, the `gen_variable_type.py` does use bits from `derivatives.yaml` to determine whether
to emit the `increment_version` call. If an output is never going to be differentiable (based not upon a runtime property of the variable but upon a static property, e.g. it is of integral type), then it would never emit the increment_version call.

Hypothetically, increment_version for a tensor can be orthogonal to its differentiability.

This PR is to make the change and test its impact. Making this logical simplification would
allow us to move this out from VariableType to aten codegen.
ghstack-source-id: 108318746

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D22471643

fbshipit-source-id: 3e3a442c7fd851641eb4a9c4f024d1f5438acdb8
2020-07-23 12:07:32 -07:00
1978188639 Remove two "return"s that return "void" (#41811)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41811

Reviewed By: izdeby

Differential Revision: D22673690

Pulled By: ezyang

fbshipit-source-id: 10d4aff90e2e051116e682fa51fb9494af8482c1
2020-07-23 10:17:29 -07:00
77db93228b Temporary fix for determinant bug on CPU (#35136)
Summary:
Changelog:
- Make diagonal contiguous

Temporarily Fixes https://github.com/pytorch/pytorch/issues/34061

Pull Request resolved: https://github.com/pytorch/pytorch/pull/35136

Reviewed By: izdeby

Differential Revision: D22673153

Pulled By: ezyang

fbshipit-source-id: 850f537483f929fcb43bcdef9d4ec264a7c3d354
2020-07-23 10:12:06 -07:00
17f76f9a78 Verbose param for schedulers that don't have it #38726 (#41580)
Summary:
Verbose param for schedulers that don't have it https://github.com/pytorch/pytorch/issues/38726

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41580

Reviewed By: izdeby

Differential Revision: D22671163

Pulled By: vincentqb

fbshipit-source-id: 53a6c9e929141d411b6846bc25f3fe7f46fdf3be
2020-07-23 09:57:33 -07:00
37e7f0caf6 Fix docstring in Unflatten (#41835)
Summary:
I'd like to amend the docstring introduced in https://github.com/pytorch/pytorch/issues/41564. It's not rendering correctly on the web, and this should fix it.

cc albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41835

Reviewed By: izdeby

Differential Revision: D22672368

Pulled By: albanD

fbshipit-source-id: f0b03c2b2a4c79b790d54f7c8f2ae28ef9d76a75
2020-07-23 09:55:11 -07:00
fab1795577 move benchmark utils into torch namespace (#41506)
Summary:
Move the timing utils to `torch.utils._benchmark`. I couldn't figure out how to get setuptools to pick it up and put it under `torch` unless it is in the `torch` directory. (And I think it has to be for `setup.py develop` anyway.)

I also modified the record function benchmark since `Timer` and `Compare` should always be available now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41506

Reviewed By: ngimel

Differential Revision: D22601460

Pulled By: robieta

fbshipit-source-id: 9cea7ff1dcb0bb6922c15b99dd64833d9631c37b
2020-07-23 09:48:39 -07:00
266657182a Add torch.movedim (#41480)
Summary:
https://github.com/pytorch/pytorch/issues/38349 #36048

TODO:
* [x] Tests
* [x] Docs
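
For newcomers, a tiny standalone usage sketch of the new operator (it mirrors numpy.moveaxis):

```python
import torch

x = torch.zeros(2, 3, 4)
y = torch.movedim(x, 0, -1)  # move dim 0 (size 2) to the last position
assert y.shape == (3, 4, 2)
```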

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41480

Reviewed By: zhangguanheng66

Differential Revision: D22649917

Pulled By: zou3519

fbshipit-source-id: a7f3920a24bae16ecf2ad731698ca65ca3e8c1ce
2020-07-23 09:41:01 -07:00
c0e3839845 fix #36801 (#41607)
Summary:
unittest actually writes the test name to stdout (e.g. `test_accumulate_grad (__main__.TestAutograd) ...`) before the test starts running. Exporting PYTHONUNBUFFERED=1 or running with `python -u` makes this message get recorded. ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41607

Reviewed By: izdeby

Differential Revision: D22673930

Pulled By: ezyang

fbshipit-source-id: 18512b6f5f80485c2b0d812f2ebdecc1fdc4b4ec
2020-07-23 09:32:46 -07:00
272fb3635f Add regression test for ONNX exports of modules that embed an Embedding layer inside a Sequential (#32598)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19227

This PR adds a regression test for ONNX exports where a module has a sequential that references an Embedding layer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/32598

Reviewed By: izdeby

Differential Revision: D22672790

Pulled By: ezyang

fbshipit-source-id: c88beb29a36b07378c28b0e4546efe887fcbc3be
2020-07-23 09:32:44 -07:00
e831299bae Fix typing error of torch/optim/lr_scheduler.pyi (#41775)
Summary:
* add `_LRScheduler.get_last_lr` type stub.
* remove `CosineAnnealingWarmRestarts.step` because its signature is the same as `_LRScheduler`'s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41775

Reviewed By: izdeby

Differential Revision: D22649350

Pulled By: vincentqb

fbshipit-source-id: 5355dd062a5af437f4fc153244dda793a2382e7e
2020-07-23 09:30:32 -07:00
4b4273a04e Update Adam documentation (#41679)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/41477

Adam implementation is doing L2 regularization and not decoupled weight decay. However, the change mentioned in https://github.com/pytorch/pytorch/issues/41477 was motivated by Line 12 of algorithm 2 in [Decoupled Weight Decay Regularization](https://arxiv.org/pdf/1711.05101.pdf) paper.
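
The distinction, condensed into a runnable toy update (my own illustration, not code from the PR): L2 regularization adds the decay to the gradient before Adam's adaptive scaling, whereas decoupled weight decay subtracts it from the weights directly.

```python
import torch

p, g = torch.ones(1), torch.zeros(1)
lr, wd = 0.1, 0.01

# L2 regularization (what torch.optim.Adam implements): the decay enters
# through the gradient and is later rescaled by Adam's adaptive denominator.
g_l2 = g + wd * p

# Decoupled weight decay (AdamW, algorithm 2 line 12): the decay is applied
# to the weights directly and bypasses the adaptive machinery entirely.
p_aw = p - lr * wd * p
```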

Please let me know if you have other suggestions about how to deliver this info in the docs.
cc ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41679

Reviewed By: izdeby

Differential Revision: D22671329

Pulled By: vincentqb

fbshipit-source-id: 2caf60e4f62fe31f29aa35a9532d1c6895a24224
2020-07-23 09:25:41 -07:00
30ce7b3740 Fix bug when compiling with caffe2 (#41868)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41868

Fix bug when compiling with caffe2

Reviewed By: jianyuh

Differential Revision: D22670707

fbshipit-source-id: aa654d7b9004257e0288c8ae8819ca5752eea443
2020-07-23 09:11:05 -07:00
0ec7ba4088 [iOS] Bump up the cocoapods version (#41895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41895

### Summary

The iOS binary for 1.6.0 has been uploaded to AWS. This PR bumps up the version for cocoapods.

### Test Plan

- Check CI

Test Plan: Imported from OSS

Reviewed By: husthyc

Differential Revision: D22683787

Pulled By: xta0

fbshipit-source-id: bb95b670a7945d823d55e9c65b357765753f295a
2020-07-22 22:03:40 -07:00
2a3ab71f28 [quant][graphmode][fix] Remove useQuantizable check for dynamic quant (#41892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41892

Currently the input of batch_norm is considered dynamically quantizable, but it shouldn't be; this PR fixes that.

Test Plan:
internal models

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22681423

fbshipit-source-id: 7f428751de0c4af0a811b9c952e1d01afda42d85
2020-07-22 21:06:48 -07:00
ca3ba1095e Do not chown files inside docker for pytorch-job-tests (#41884)
Summary:
They are already owned by `jenkins` user after the build

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41884

Reviewed By: orionr

Differential Revision: D22682441

Pulled By: malfet

fbshipit-source-id: daf99532d300d30a5de591ad03af4597e145fdfc
2020-07-22 19:53:59 -07:00
586b7f991c Enable skipped tests from test_torch on ROCm (#41611)
Summary:
This pull request enables the following tests from test_torch, previously skipped on ROCm:
test_pow_-2_cuda_float32/float64
test_sum_noncontig_cuda_float64
test_conv_transposed_large

The first two tests experienced precision issues on earlier ROCm versions, whereas the conv_transposed test was hitting a bug in MIOpen which is fixed in the version shipping with ROCm 3.5.

ezyang jeffdaily

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41611

Reviewed By: xw285cornell

Differential Revision: D22672690

Pulled By: ezyang

fbshipit-source-id: 5585387c048f301a483c4c0566eb9665555ef874
2020-07-22 19:49:17 -07:00
7fefa46820 scatter/gather - check that inputs are of the same dimensionality (#41672)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41672

Reviewed By: malfet, ngimel

Differential Revision: D22678302

Pulled By: gchanan

fbshipit-source-id: 95a1bde81e660b8963e5914d5348fd4fbff1338e
2020-07-22 18:51:51 -07:00
b40ef422d3 .circleci: Separate out docs build from push (#41871)
Summary:
Separates out the docs build from the push and limits when the push
actually happens.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41871

Reviewed By: yns88

Differential Revision: D22673716

Pulled By: seemethere

fbshipit-source-id: fff8b35ba8465dc15832214c4c9ef03ce12faa48
2020-07-22 17:01:24 -07:00
4e16be9073 [MemLeak] Fix memory leak from releasing unique ptr (#41883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41883

Fix memory leak from releasing unique ptr

Test Plan:
Tested serialization with and without the change.

Heap profile without change:
```
Welcome to jeprof!  For help, type 'help'.
(jeprof) top
Total: 7298.4 MB
  4025.2  55.2%  55.2%   4025.2  55.2% c10::alloc_cpu (inline)
  3195.3  43.8%  98.9%   3195.3  43.8% caffe2::SerializeUsingBytesOrInt32
    63.6   0.9%  99.8%     63.6   0.9% __gnu_cxx::new_allocator::allocate (inline)
     5.0   0.1%  99.9%      5.0   0.1% google::protobuf::RepeatedField::Reserve
     2.5   0.0%  99.9%      2.5   0.0% folly::aligned_malloc (inline)
     1.2   0.0%  99.9%      1.2   0.0% caffe2::detail::CopyFromProtoWithCast (inline)
     1.0   0.0%  99.9%      1.0   0.0% __new_exitfn
     1.0   0.0% 100.0%      1.0   0.0% std::_Function_base::_Base_manager::_M_init_functor (inline)
     0.5   0.0% 100.0%      0.5   0.0% folly::HHWheelTimerBase::newTimer (inline)
     0.5   0.0% 100.0%      0.5   0.0% std::__detail::_Hashtable_alloc::_M_allocate_node
```

Heap profile with change:
```
Welcome to jeprof!  For help, type 'help'.
(jeprof) top
Total: 6689.2 MB
  4025.2  60.2%  60.2%   4025.2  60.2% c10::alloc_cpu (inline)
  2560.0  38.3%  98.4%   2560.0  38.3% caffe2::::HugePagesArena::alloc_huge (inline)
    90.9   1.4%  99.8%     90.9   1.4% __gnu_cxx::new_allocator::allocate (inline)
     5.0   0.1%  99.9%      5.0   0.1% google::protobuf::RepeatedField::Reserve
     2.0   0.0%  99.9%      2.0   0.0% prof_backtrace_impl (inline)
     1.0   0.0%  99.9%     20.3   0.3% std::__cxx11::basic_string::_M_construct (inline)
     1.0   0.0%  99.9%      1.0   0.0% std::_Function_base::_Base_manager::_M_init_functor (inline)
     0.5   0.0%  99.9%      0.5   0.0% folly::UnboundedQueue::allocNextSegment (inline)
     0.5   0.0% 100.0%      0.5   0.0% folly::aligned_malloc (inline)
     0.5   0.0% 100.0%      0.5   0.0% __new_exitfn
```

Reviewed By: yinghai

Differential Revision: D22662093

fbshipit-source-id: d0b8ff1ed26c72b14bb02fb1146c51ef11a7e519
2020-07-22 16:54:19 -07:00
dbc6a2904b [quant][graphmode][fix] Remove assert for uses == 1 in remove dequantize pass (#41859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41859

A value can be used multiple times in the same node, so we don't really need to assert that a dequantize has exactly one use.

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D22673525

fbshipit-source-id: 2c4a770e0ddee722ca54e68d310c395e7f418b3b
2020-07-22 15:58:11 -07:00
dfa914a90c Modify lazy_dyndep loading to trigger inside workspace. (#41687)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41687

Specifically, this makes a new library (lazy), which can be used from both core
and workspace.

This allows workspace.CreateNet to trigger lazy loading of dyndep dependencies.

Test Plan: Added a unit test specifically for workspace.CreateNet

Reviewed By: dzhulgakov

Differential Revision: D22441877

fbshipit-source-id: 3a9d1af9962585d08ea2566c9c85bec7377d39f2
2020-07-22 15:36:43 -07:00
af5d0bff00 [ONNX] Add pass that fuses Conv and BatchNormalization (#40547)
Summary:
Add pass that fuses Conv and Batchnormalization nodes into one node Conv.
This pass is only applied in inference mode (training is None or TrainingMode.Eval).
Since this pass needs access to param_dict, it is written outside the peephole file where these kinds of passes (fusing multiple nodes into one) are usually placed.

This PR also adds a wrapper, skipIfNoEmbed, to skip the debug_embed_params test:
the pass that fuses Conv and Batchnorm changes the params of the resnet model, so the parameters of the onnx and pytorch models won't match. Since the parameters don't match, the debug_embed_params test for test_resnet will fail, and that is expected; therefore the debug_embed_params test for test_resnet should be skipped.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40547

Reviewed By: gchanan

Differential Revision: D22631687

Pulled By: bzinodev

fbshipit-source-id: fe45812400398a32541e797f727fd8697eb6d8c0
2020-07-22 14:59:27 -07:00
ad7133d3c1 Patch for #40026 RandomSampler generates samples one at a time when replacement=True (#41682)
Summary:
Fix https://github.com/pytorch/pytorch/issues/32530
Fix/Patch https://github.com/pytorch/pytorch/pull/40026

Resubmit this patch and fix the type error.

Force the input type to `manual_seed()` in `sampler.py` to be `int`.
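
A minimal sketch of the kind of coercion described (variable names are illustrative): a seed derived from a tensor must be converted to a Python int before being handed to `manual_seed()`.

```python
import torch

seed_tensor = torch.empty((), dtype=torch.int64).random_()
generator = torch.Generator()
generator.manual_seed(int(seed_tensor.item()))  # force the seed to be an int
```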

ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41682

Reviewed By: izdeby

Differential Revision: D22665477

Pulled By: ezyang

fbshipit-source-id: 1725c8aa742c31e74321f20448f4b6a392afb38d
2020-07-22 13:45:09 -07:00
2d15b39745 [Onnxifi] Support running with quantized int8 inputs (#41820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41820

Pull Request resolved: https://github.com/pytorch/glow/pull/4721

In order to support int8 quantized tensor as an input to OnnxifiOp, we need to
- Add support to recognize and extract shape meta from an int8 tensor at the input of OnnxifiOp
- Make a copy of the input data and shift it by 128 in Glow if the input is a uint8 quantized tensor, to get the correct result, because Glow uses int8 to represent quantized data regardless.
- Propagate correct quantization parameters through shape info in C2.

This diff implements the above.
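
The shift is value-preserving because the data and the zero point move together (a worked check, not code from the diff):

```python
scale, zp_u8, q_u8 = 0.1, 128, 200     # uint8 quantized value and zero point
q_i8, zp_i8 = q_u8 - 128, zp_u8 - 128  # Glow's int8 view after the shift
assert scale * (q_u8 - zp_u8) == scale * (q_i8 - zp_i8)  # same real value
```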

Test Plan:
```
buck test caffe2/caffe2/contrib/fakelowp/test:test_int8_quantnnpi
```

Reviewed By: jackm321

Differential Revision: D22650584

fbshipit-source-id: 5e867f7ec7ce98bb066ec4128ceb7cad321b3392
2020-07-22 13:42:34 -07:00
47c57e8804 rename TestFuser to TestTEFuser (#41542)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41542

Reviewed By: jamesr66a

Differential Revision: D22579606

Pulled By: Krovatkin

fbshipit-source-id: f65b2cae996b42d55ef864bc0b424d9d43d8a2e2
2020-07-22 13:37:27 -07:00
6ceb65f98c Document default dim for cross being None (#41850)
Summary:
The function torch.cross is a bit confusing, in particular the defaulting of the dim argument.

The default `dim` has been documented as -1, but it is actually `None`. This adds to the confusion in two possible ways, depending on how carefully you read the rest. I also add a warning to the final sentence.

This partially addresses https://github.com/pytorch/pytorch/issues/39310.
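
A short standalone demonstration of the documented behavior (not from the PR):

```python
import torch

a, b = torch.randn(3, 4), torch.randn(3, 4)
# dim=None (the real default) searches for the first dimension of size 3:
torch.cross(a, b)            # OK: uses dim 0
# torch.cross(a, b, dim=-1)  # would raise: the last dimension is not of size 3
```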

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41850

Reviewed By: izdeby

Differential Revision: D22664625

Pulled By: albanD

fbshipit-source-id: b8669e026fd01de9e4ec16da1414b9edfaa76bdd
2020-07-22 13:31:47 -07:00
b80ffd44b0 Revert D20781624: Add NCCL Alltoall to PT NCCL process group
Test Plan: revert-hammer

Differential Revision:
D20781624 (b87f0e5085)

Original commit changeset: 109436583ff6

fbshipit-source-id: 03f6ee4d56baea93a1cf795d26dd92b7d6d1df28
2020-07-22 13:22:17 -07:00
ec683299eb Reland Add non-deterministic alert to CUDA operations that use atomicAdd() (#41538)
Summary:
Reland PR https://github.com/pytorch/pytorch/issues/40056

A new overload of upsample_linear1d_backward_cuda was added in a recent commit, so I had to add the nondeterministic alert to it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41538

Reviewed By: zou3519

Differential Revision: D22608376

Pulled By: ezyang

fbshipit-source-id: 54a2aa127e069197471f1feede6ad8f8dc6a2f82
2020-07-22 13:12:29 -07:00
aa91a65b59 [TensorExpr] Fix propagation of loop options when splitting loops (#40035)
Summary:
Fix a bug in SplitWithTail and SplitWithMask where loop_options such as Cuda block/thread bindings are overwritten by the split. This PR fixes this bug by propagating the loop options to the outer loop, which for axis bindings should be equivalent.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40035

Reviewed By: ZolotukhinM

Differential Revision: D22080263

Pulled By: nickgg

fbshipit-source-id: b8a9583fd90f69319fc4bb4db644e91f6ffa8e67
2020-07-22 11:49:07 -07:00
9c7ca89ae6 Conda build (#38796)
Summary:
closes gh-37584. ~I think I need to do more to generate an image, but the `.circleci/README.md` is vague in the details. The first commit reflows and updates that document a bit, I will continue to update it as the PR progresses :)~ Dropped updating `.circleci/README.md`, will do that in a separate PR once this is merged.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/38796

Reviewed By: gchanan

Differential Revision: D22627522

Pulled By: ezyang

fbshipit-source-id: 99d5c19e942f15b9fc10f0de425790474a4242ab
2020-07-22 11:42:39 -07:00
61511aa1d6 Remove zmath_std.h (#39835)
Summary:
std::complex is gone

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39835

Reviewed By: gchanan

Differential Revision: D22639834

Pulled By: anjali411

fbshipit-source-id: 57da43d4e6c82261b1f9e5b876f1bbbdf9ae56ca
2020-07-22 11:08:17 -07:00
ca68dc7fa2 replace std::clamp with shim (#41855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41855

replace std::clamp with shim

Test Plan: test_op_nnpi_fp16.py covers the testing.

Reviewed By: hyuen

Differential Revision: D22667645

fbshipit-source-id: 5e7c94b499f381bde73f1984a6f0d01fb962a671
2020-07-22 11:06:36 -07:00
b87f0e5085 Add NCCL Alltoall to PT NCCL process group (#39984)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39984

Add Alltoall and Alltoallv to PT NCCL process group using NCCL Send/Recv.

Reviewed By: jiayisuse

Differential Revision: D20781624

fbshipit-source-id: 109436583ff69a3fea089703d32cfc5a75f973e0
2020-07-22 10:55:51 -07:00
2da8c8df08 [quant] Rename from quantized... to ...quantized_cpu in the native_functions.yaml (#41071)
Summary:
Issue https://github.com/pytorch/pytorch/issues/40315

Rename from `quantized...` to `...quantized_cpu` in the native_functions.yaml

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41071

Reviewed By: z-a-f

Differential Revision: D22487087

Pulled By: jerryzh168

fbshipit-source-id: f0d12907967739794839c1ffea44e78957f50b9b
2020-07-22 10:45:41 -07:00
f03156f9df replace blacklist in caffe2/python/onnx/frontend.py (#41777)
Summary:
Close https://github.com/pytorch/pytorch/issues/41712

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41777

Reviewed By: izdeby

Differential Revision: D22648532

Pulled By: yinghai

fbshipit-source-id: 7f4c9f313e2887e70bb4eb1ab037aea6b549cec7
2020-07-22 10:02:16 -07:00
5152633258 [ROCm] update hip library name (#41813)
Summary:
With transition to hipclang, the HIP runtime library name was changed.  A symlink was added to ease the transition, but is going to be removed.  Conditionally set library name based on HIP compiler used.  Patch gloo submodule as part of build_amd.py script until its associated fix is available.

CC ezyang xw285cornell sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41813

Reviewed By: zhangguanheng66

Differential Revision: D22660077

Pulled By: xw285cornell

fbshipit-source-id: c538129268d9947535b34523201f655b13c9e0a3
2020-07-22 09:42:45 -07:00
9fbcfe848b Automated submodule update: FBGEMM (#41814)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 139c6f2292

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41814

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D22648844

fbshipit-source-id: 4cfa8d83585407f870ea2bdee74e1c1f371082eb
2020-07-22 09:38:15 -07:00
71aad6ea66 Revert "port masked_select from TH to ATen and optimize perf on CPU (#33269)" (#41828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41828

This reverts commit fe66bdb498efe912d8b9c437a14efa4295c04fdd.

This also requires a change to THTensorEvenMoreMath because sumall was removed; see THTensor_wrap.

Test Plan: Imported from OSS

Reviewed By: orionr

Differential Revision: D22657473

Pulled By: malfet

fbshipit-source-id: 95a806cedf1a3f4df91e6a21de1678252b117489
2020-07-22 09:28:04 -07:00
fd62847eb2 cross_layer_equalization (#41685)
Summary:
The goal is to implement cross-layer equalization as described in section 4.1 of this paper: https://arxiv.org/pdf/1906.04721.pdf
Given two adjacent submodules A, B in a trained model, quantization might hurt one of the submodules more than the other. The paper poses the idea that a loss in accuracy from quantizing can be due to a difference in the channel ranges between the two submodules (the output channel range of A can be small, while the input channel range of B can be large). To minimize this source of error, we want to scale the tensors of A and B such that their channel ranges are equal, which eliminates this difference and minimizes this source of error.
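
A hedged sketch of the section 4.1 rescaling for two weight matrices (my own condensation of the paper's formula, not code from this PR): with a positively homogeneous activation between A and B, dividing A's output channels by s and multiplying B's matching input channels by s leaves the composed function unchanged, and s = sqrt(rA/rB) makes both per-channel ranges equal to sqrt(rA*rB).

```python
import torch

def equalize(WA, bA, WB):
    # WA: (out, in) weights of submodule A; WB: (out, in) weights of submodule B
    rA = WA.abs().max(dim=1).values  # per-output-channel range of A
    rB = WB.abs().max(dim=0).values  # per-input-channel range of B
    s = torch.sqrt(rA / rB)          # equalizing scale per shared channel
    return WA / s[:, None], bA / s, WB * s[None, :]
```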

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41685

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D22630219

Pulled By: edmundw314

fbshipit-source-id: ccc91ba12c10b652d7275222da8b85455b8a7cd5
2020-07-22 08:39:23 -07:00
fced54aa67 [RPC tests] Fix test_init_(rpc|pg)_then_(rpc|pg) not shutting down RPC (#41558)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41558

The problem was due to non-deterministic destruction order of two global static variables: the mutexes used by glog and the RPC agent (which was still set because we didn't call `rpc.shutdown()`). When the TensorPipe RPC agent shuts down some callbacks may fire with an error and thus attempt to log something. If the mutexes have already been destroyed this causes a SIGABRT.

Fixes https://github.com/pytorch/pytorch/issues/41474
ghstack-source-id: 108231453

Test Plan: Verified in https://github.com/pytorch/pytorch/issues/41474.

Reviewed By: fmassa

Differential Revision: D22582779

fbshipit-source-id: 63e34d8a020c4af996ef079cfb7041b2474e27c9
2020-07-22 06:33:19 -07:00
e17e55831d [pytorch] disable per-op profiling for internal mobile build (#41825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41825

Add flag to gate D21374246 (e7a09b4d17) to mitigate mobile size regression.
ghstack-source-id: 108212047

Test Plan: CI

Reviewed By: linbinyu

Differential Revision: D22650708

fbshipit-source-id: ac9318af824ac31f519b7d5b4fe72df892d8d3f9
2020-07-22 03:02:21 -07:00
825a387ea2 Fix bug on the backpropagation of LayerNorm when create_graph=True (#41595)
Summary:
Solves issue https://github.com/pytorch/pytorch/issues/41332.

I found that the bug in https://github.com/pytorch/pytorch/issues/41332 is caused by LayerNorm.

Current implementations of LayerNorm have a disparity between
1. [`create_graph=False` CUDA implementation](dde3d5f4a8/aten/src/ATen/native/cuda/layer_norm_kernel.cu (L145))
2. [`create_graph=True` implementation](dde3d5f4a8/tools/autograd/templates/Functions.cpp (L2536))

With this bug-fix, https://github.com/pytorch/pytorch/issues/41332 is solved.
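
A standalone repro sketch of the code path involved (shapes are arbitrary; this is not the PR's test): `create_graph=True` routes the second-order pass through the analytic backward in Functions.cpp rather than the fused kernel, which is where the two implementations diverged.

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8, requires_grad=True)
y = F.layer_norm(x, (8,))
g, = torch.autograd.grad(y.sum(), x, create_graph=True)  # differentiable grad
g.sum().backward()  # second-order pass exercises the create_graph formula
```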

Ailing BIT-silence

Signed-off-by: Vinnam Kim <vinnamkim@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41595

Reviewed By: houseroad

Differential Revision: D22598415

Pulled By: BIT-silence

fbshipit-source-id: 63e390724bd935dc8e028b4dfb75d34a80558c3a
2020-07-22 00:19:12 -07:00
5c9918e757 Fix row-wise sparse SparseLengthSum and sparse adagrad fused operator (#41818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41818

Fix row-wise sparse SparseLengthSum and sparse adagrad fused operator

Reviewed By: jianyuh

Differential Revision: D22345013

fbshipit-source-id: 7c2d6c506b404f15a7aa8f1d0ccadb82e515a4c3
2020-07-21 19:32:16 -07:00
a0f2a5625f [quant][graphmode][fix] Make it work with CallMethod on non-Module objects (#41576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41576

Previously we were assuming CallMethod only happens on module instances, but it turns out this is not true; this PR fixes the issue.

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D22592789

fbshipit-source-id: 48217626d9ea8e82536f00a296b8f9a471582ebe
2020-07-21 19:03:40 -07:00
ce8c7185de Add unittests to Comparison Operator Kernels in BinaryOpsKernel.cpp (#41809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41809

Add new unittests to Operator Kernels.
Explicitly annotate the function type in tests because it can't be inferred.

Test Plan: CI

Reviewed By: malfet

Differential Revision: D22647221

fbshipit-source-id: ef2f0e8c847841e90aa26d028753f23c8c53d6b0
2020-07-21 18:26:53 -07:00
302e566205 add max_and_min function and cpu kernel to speed up observers (#41570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41570

For min/max based quantization observers, calculating min and max of a tensor
takes most of the runtime. Since the calculation of min and max is done
on the same tensor, we can speed this up by only reading the tensor
once, and reducing with two outputs.
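
The read-once idea in miniature (a pure-Python illustration; the PR itself adds a fused C++ CPU kernel, and the helper name below is mine):

```python
import torch

def min_and_max_single_pass(t):
    values = t.flatten().tolist()
    lo = hi = values[0]
    for v in values[1:]:  # one traversal updates both reductions
        if v < lo:
            lo = v
        elif v > hi:
            hi = v
    return lo, hi

x = torch.randn(1000)
assert min_and_max_single_pass(x) == (x.min().item(), x.max().item())
```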

One question I had is whether we should put this into the quantization
namespace, since the use case is pretty specific.

This PR implements the easier CPU path to get an initial validation.
There is some needed additional work in future PRs, which durumu will
take a look at:
* CUDA kernel and tests
* making this work per channel
* benchmarking on observer
* benchmarking impact on QAT overhead

Test Plan:
```
python test/test_torch.py TestTorch.test_min_and_max
```

quick bench (not representative of real world use case):
https://gist.github.com/vkuzo/7fce61c3456dbc488d432430cafd6eca
```
(pytorch) [vasiliy@devgpu108.ash6 ~/local/pytorch] OMP_NUM_THREADS=1 python ~/nfs/pytorch_scripts/observer_bench.py
tensor(5.0390) tensor(-5.4485) tensor([-5.4485,  5.0390])
min and max separate 11.90243935585022
min and max combined 6.353186368942261
% decrease 0.466228209277153
(pytorch) [vasiliy@devgpu108.ash6 ~/local/pytorch] OMP_NUM_THREADS=4 python ~/nfs/pytorch_scripts/observer_bench.py
tensor(5.5586) tensor(-5.3983) tensor([-5.3983,  5.5586])
min and max separate 3.468616485595703
min and max combined 1.8227086067199707
% decrease 0.4745142294372342
(pytorch) [vasiliy@devgpu108.ash6 ~/local/pytorch] OMP_NUM_THREADS=8 python ~/nfs/pytorch_scripts/observer_bench.py
tensor(5.2146) tensor(-5.2858) tensor([-5.2858,  5.2146])
min and max separate 1.5707778930664062
min and max combined 0.8645427227020264
% decrease 0.4496085496757899
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D22589349

fbshipit-source-id: c2e3f1b8b5c75a23372eb6e4c885f842904528ed
2020-07-21 18:16:22 -07:00
9e0c746b15 Augmenting Concrete Observer Constructors to Support Dynamic Quantization Range; Modifying Utility Functions in _LearnableFakeQuantize Module for Better Logging and Baseline Construction. (#41815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41815

**All are minor changes to enable better simulations.**

The constructors of MinMaxObserver, MovingAverageMinMaxObserver, PerChannelMinMaxObserver, and MovingAveragePerChannelMinMaxObserver are augmented so they can utilize the dynamic quantization range support in the _ObserverBase class.

In addition, minor adjustments are made to the enable_static_observation function, which allows the observer to update parameters but not fake-quantize the output (for constructing baselines).

Test Plan:
To ensure this modification is still backward compatible with past usages, numerics are verified by running the quantization unit test suite, which contains various observer tests. The following command executes the test suite, which also verifies the observer numerics:
```
buck test //caffe2/test:quantization -- observer
```

Reviewed By: z-a-f

Differential Revision: D22649128

fbshipit-source-id: 32393b706f9b69579dc2f644fb4859924d1f3773
2020-07-21 17:59:40 -07:00
60e2baf5e0 [doc] Add LSTM non-deterministic workaround (#40893)
Summary:
Related: https://github.com/pytorch/pytorch/issues/35661

Preview
![image](https://user-images.githubusercontent.com/24860335/86861581-4b4c7100-c07c-11ea-950a-3145bfae9af9.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40893

Reviewed By: vincentqb

Differential Revision: D22535418

Pulled By: ngimel

fbshipit-source-id: f194ddaff8ec6d03a3616c87466e2cbbe7e429a9
2020-07-21 16:20:02 -07:00
941069ca09 [tensorexpr][trivial] Remove debug printing from test (#41806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41806

Generally a good practice not to have tests spew output.

Test Plan:
`build/bin/test_tensorexpr`

Imported from OSS

Reviewed By: zheng-xq

Differential Revision: D22646833

fbshipit-source-id: 444e883307d058fe77e7550d436fa61b7d91a701
2020-07-21 15:54:31 -07:00
7ffdd765c8 [TensorExpr] more convenient outer Rfactor output (#40050)
Summary:
Automatically fuse the output loops of outer Rfactors, so the result is in a more convenient format for binding GPU axes.

An example:
```
  Tensor* c = Reduce("sum", {}, Sum(), b, {{m, "m"}, {n, "n"}, {k, "k"}});
  LoopNest loop({c});
  std::vector<For*> loops = loop.getLoopStmtsFor(c);
  auto v = loops.at(0)->var();
  loop.rfactor(c->body(), v);
```
Before:
```
{
  Allocate(tmp_buf, float, {m});
  sum[0] = 0.f;
  for (int m_1 = 0; m_1 < m; m_1++) {
    tmp_buf[m_1] = 0.f;
  }
  for (int m_1 = 0; m_1 < m; m_1++) {
    for (int n = 0; n < n_1; n++) {
      for (int k = 0; k < k_1; k++) {
        tmp_buf[m_1] = (tmp_buf[m_1]) + (b[((n_1 * m_1) * k_1 + k) + k_1 * n]);
      }
    }
  }
  for (int m_1 = 0; m_1 < m; m_1++) {
    sum[0] = (sum[0]) + (tmp_buf[m_1]);
  }
  Free(tmp_buf);
}
```

After:
```
{
  sum[0] = 0.f;
  for (int m = 0; m < m_1; m++) {
    Allocate(tmp_buf, float, {m_1});
    tmp_buf[m] = 0.f;
    for (int n = 0; n < n_1; n++) {
      for (int k = 0; k < k_1; k++) {
        tmp_buf[m] = (tmp_buf[m]) + (b[((n_1 * m) * k_1 + k) + k_1 * n]);
      }
    }
    sum[0] = (sum[0]) + (tmp_buf[m]);
    Free(tmp_buf);
  }
}
```

The existing Rfactor tests cover this case, although I did rename a few for clarity. This change broke the LLVMRFactorVectorizedReduction test because it now does what it's intending to (vectorize a loop with a reduction in it) rather than nothing, and since that doesn't work it correctly fails. I've disabled it for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40050

Reviewed By: ZolotukhinM

Differential Revision: D22605639

Pulled By: nickgg

fbshipit-source-id: e359be53ea62d9106901cfbbc42d55d0e300e8e0
2020-07-21 14:44:26 -07:00
dac393fa24 [PT] enforce duplicate op name check on mobile
Summary: Enforce duplicate op name check on mobile

Test Plan: run full/lite predictor

Reviewed By: iseeyuan

Differential Revision: D22639758

fbshipit-source-id: 2993c4bc1b14c833b273183f4f343ffad62121b3
2020-07-21 13:14:17 -07:00
62f4f87914 Removed whitelist reference from tools/clang_format_ci.sh (#41636)
Summary:
Removed whitelist and blacklist references
Fixes https://github.com/pytorch/pytorch/issues/41753

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41636

Reviewed By: SplitInfinity

Differential Revision: D22648632

Pulled By: suo

fbshipit-source-id: d22130a7cef96274f3fc73d00b50327dfcae332c
2020-07-21 12:32:14 -07:00
1ad7160a59 fix backward compat (#41810)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41810

Reviewed By: malfet

Differential Revision: D22647763

Pulled By: albanD

fbshipit-source-id: 8ce70ecb706bb98ed24b0b3e7e9ebf3d4c270964
2020-07-21 12:14:55 -07:00
03186a86d9 Add test dependencies to CONTRIBUTING.md (#41799)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41799

Reviewed By: zhangguanheng66

Differential Revision: D22645323

Pulled By: zou3519

fbshipit-source-id: 0a695bffb57b29024461472dd1c8518a9a0d1d3b
2020-07-21 11:29:38 -07:00
341c4045df replaced blacklist with blocklist in test/test_type_hints.py (#41644)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41719.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41644

Reviewed By: zhangguanheng66

Differential Revision: D22645479

Pulled By: zou3519

fbshipit-source-id: 82710acae96ab508b8e9198dadb7d7911cb97235
2020-07-21 11:23:19 -07:00
46808b49a8 Change whitelist to allow in file test_quantized_op.py (#41771)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41751

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41771

Reviewed By: zhangguanheng66

Differential Revision: D22641463

Pulled By: SplitInfinity

fbshipit-source-id: 1a60af8d43ccdf1f35dc84dbf4a7bc64965eb44a
2020-07-21 11:08:07 -07:00
72a1146339 Skip warning 4522 with MSVC (#41648)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41648

Reviewed By: zhangguanheng66

Differential Revision: D22644623

Pulled By: malfet

fbshipit-source-id: 7fb86f05b3d8cd6a4c7c0e3fdfd651b70a5094c9
2020-07-21 09:47:30 -07:00
2da2b5c081 update CONTRIBUTING.md for ccache (#41619)
Summary:
ccache now use cmake for building, update installation script.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41619

Reviewed By: zhangguanheng66

Differential Revision: D22644594

Pulled By: malfet

fbshipit-source-id: f894dd408822231f8aab36efbce188f06f004057
2020-07-21 09:43:30 -07:00
523f80e894 .circleci: Remove docker_hub_index_job, wasn't used (#41800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41800

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: soumith

Differential Revision: D22645363

Pulled By: seemethere

fbshipit-source-id: 35ed43ed5fb4053f71dc9525c4ed62f1c60eacc1
2020-07-21 09:16:02 -07:00
1f11e930d0 [ROCm] skip test_streams on rocm. (#41697)
Summary:
Skipping the test test_streams as it is flaky on rocm.
cc: jeffdaily  sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41697

Reviewed By: zhangguanheng66

Differential Revision: D22644600

Pulled By: malfet

fbshipit-source-id: b1b16d496e58a91c44c40d640851fd62a5d7393d
2020-07-21 08:55:07 -07:00
48569cc330 Reland split (#41567)
Summary:
Take 3

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41567

Reviewed By: zou3519

Differential Revision: D22586331

Pulled By: albanD

fbshipit-source-id: ca08199da716d64a335455610edbce752fee224b
2020-07-21 08:06:27 -07:00
c89c294ef9 Add Unflatten Module (#41564)
Summary:
This PR implements a feature extension discussed in https://github.com/pytorch/pytorch/issues/41516.

I followed this other PR https://github.com/pytorch/pytorch/issues/22245 to add this module. While I was at it, I also added the `extra_repr()` method to `Flatten`, which was missing.

I see there are no unit tests for these modules. Should I add those too? If so, where is the best place to put them?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41564

Reviewed By: gchanan

Differential Revision: D22636766

Pulled By: albanD

fbshipit-source-id: f9efdefd3ffe7d9af9482087625344af8f990943
2020-07-21 07:43:02 -07:00
fe415589a9 disable mkl for expm1 (#41654)
Summary:
On some systems/mkl versions it produces expm1(nan)=-1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41654

Reviewed By: mruberry

Differential Revision: D22621333

Pulled By: ngimel

fbshipit-source-id: 84544679fe96aed7de6873dce6f31f488e5e35dd
2020-07-20 23:40:17 -07:00
65bd38127a GLOO process group GPU alltoall (#41690)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41690

Gloo alltoall for GPU

Test Plan: buck test mode/dev-nosan caffe2/torch/lib/c10d:ProcessGroupGlooTest

Reviewed By: osalpekar

Differential Revision: D22631554

fbshipit-source-id: 4b126d9d991a118f3925c005427f399fc60f92f7
2020-07-20 19:01:12 -07:00
5c50cb567c Generalized Learnable Fake Quantizer Module (#41535)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41535

A generalized fake quantization module is built to support lower-bit fake quantization with back propagation on the scale and zero point. The module supports both per tensor and per channel fake quantization.

Test Plan:
Please see diff D22337313 for a related experiment performed on the fake quantizer module.

The `_LearnableFakeQuantize` module supports the following use cases:
- Per Tensor Fake Quantization or Per Channel Fake Quantization
- Static Estimation from Observers or Quantization Parameter Learning through Back Propagation

By default, the module assumes per tensor affine fake quantization. To switch to per channel, during initialization, declare `channel_size` with the appropriate length. To toggle between utilizing static estimation and parameter learning with back propagation, you can invoke the call `enable_param_learning` or `enable_static_estimate`. For more information on the flags that support these operations, please see the doc string of the `_LearnableFakeQuantize` module.

The `_LearnableFakeQuantizer` module relies on 2 operators for its forward and backward paths: `_LearnableFakeQuantizePerTensorOp` and `_LearnableFakeQuantizePerChannelOp`. The backpropagation routine is developed based on the following literature:
- Learned Step Size Quantization: https://openreview.net/pdf?id=rkgO66VKDS
- Trained Quantization Thresholds: https://arxiv.org/pdf/1903.08066.pdf

Reviewed By: z-a-f

Differential Revision: D22573645

fbshipit-source-id: cfd9ece8a959ae31c00d9beb1acf9dfed71a7ea1
2020-07-20 18:24:21 -07:00
3a9a64a4da Add non zero offset test cases for Quantize and Dequantize Ops. (#41693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41693

Add non zero offset test cases for Quantize and Dequantize Ops.

Test Plan: Added new test case test_int8_non_zero_offset_quantize part of the test_int8_ops_nnpi.py test file.

Reviewed By: hyuen

Differential Revision: D22633796

fbshipit-source-id: be17ee7a0caa6e9bc7b175af539be2e6625ad47a
2020-07-20 16:03:32 -07:00
1039bbf4eb add named parameters to mobile module (#41376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41376

torch::jit::mobile::Module does not currently support accessing parameters via their attribute names, but torch::jit::Module does. This diff adds equivalent functionality to mobile::Module.

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D22609142

Pulled By: ann-ss

fbshipit-source-id: 1a5272ff336f99a3c0bb6194c6a6384754f47846
2020-07-20 15:57:49 -07:00
30551ea7b2 Update NCCL from 2.4.8 to 2.7.3 (#41608)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41608

Reviewed By: mrshenli, ngimel

Differential Revision: D22604953

Pulled By: malfet

fbshipit-source-id: 28151e2d5b6ea360b79896cb79c761756687d121
2020-07-20 13:21:47 -07:00
f07816003a [2/n][Compute Meta] support analysis for null flag features
Summary:
## TLDR
Support using a NaN default value for missing dense features in RawInputProcessor for DPER2, in preparation for subsequent support for null-flag features in compute meta. For train_eval this is already supported in DPER3, and we do not plan to support it in DPER2 train_eval.

Differential Revision: D22439142

fbshipit-source-id: 99ae9755bd41a5d5f43bf5a9a2819d64f3883005
2020-07-20 13:13:45 -07:00
897cabc081 Add operators for smart keyboard to lite interpreter (#41539)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41539

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D22574746

Pulled By: ann-ss

fbshipit-source-id: 3e2b78385149d7bde2598c975e60845a766ef86a
2020-07-20 12:08:58 -07:00
de400fa5ac [JIT] handle specially mapped ops (#41503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41503

Fix for https://github.com/pytorch/pytorch/issues/41192

We can map fill_ and zero_ to their functional equivalents full_like and zeros_like
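
The mapping in tensor terms (a standalone illustration):

```python
import torch

x = torch.randn(3)
assert torch.equal(x.clone().zero_(), torch.zeros_like(x))          # zero_ -> zeros_like
assert torch.equal(x.clone().fill_(7.0), torch.full_like(x, 7.0))   # fill_ -> full_like
```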

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D22629269

Pulled By: eellison

fbshipit-source-id: f1c62684dc55682c0b3845022e0461ec77d07179
2020-07-20 12:03:31 -07:00
6161730174 [JIT] move remove mutation to its own test file (#41502)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41502

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D22629270

Pulled By: eellison

fbshipit-source-id: fcec6ae4ff8f108164539d67427ef3d72fa07494
2020-07-20 12:03:28 -07:00
cfcee816f1 .circleci: Prefix docker jobs with docker- (#41689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41689

It's annoying not to know which jobs are actually related to docker
builds so let's just add the prefix.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D22631578

Pulled By: seemethere

fbshipit-source-id: ac0cdd983ccc3bebcc360ba479b378d8f0eaa9c0
2020-07-20 12:00:53 -07:00
cc3c18edbc More LayerNorm Vectorization in calcMeanStd function. (#41618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41618

More LayerNorm Vectorization in calcMeanStd function.

Test Plan: test covered in test_layernorm_nnpi_fp16.py

Reviewed By: hyuen

Differential Revision: D22606585

fbshipit-source-id: be773e62f0fc479dbc2d6735f60c2e98441916e9
2020-07-20 11:55:54 -07:00
26bbbeaea4 [DOCS] Fix the docs for the inputs arg of trace_module func (#41586)
Summary:
Fix the docs for the `inputs` arg of `trace_module` func.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41586

Reviewed By: ezyang

Differential Revision: D22598453

Pulled By: zou3519

fbshipit-source-id: c2d182238b5a51f6d0a7d0683372d72a239146c5
2020-07-20 10:57:56 -07:00
ce443def01 Grammar patch 1 (.md) (#41599)
Summary:
A minor spell check!
I have gone through a dozen .md files to fix typos.
zou3519 take a look!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41599

Reviewed By: ezyang

Differential Revision: D22601629

Pulled By: zou3519

fbshipit-source-id: 68d8f77ad18edc1e77874f778b7dadee04b393ef
2020-07-20 10:19:08 -07:00
6769b850b2 Remove needless test duplication (#41583)
Summary:
The test loops over `upper` but does not use it, effectively running the same test twice, which increases test time for no gain.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41583

Reviewed By: soumith, seemethere, izdeby

Differential Revision: D22598475

Pulled By: zou3519

fbshipit-source-id: d100f20143293a116ff3ba08b0f4eaf0cc5a8099
2020-07-20 10:14:11 -07:00
16dde6e3a0 Augmenting Observers to Support Dynamic Quantization Range (#41113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41113

In this diff, the `ObserverBase` class is augmented with 2 additional optional arguments qmin and qmax. Correspondingly the calculation of qmin and qmax and the related quantization parameters are modified to accommodate this additional flexibility should the number of bits for quantization be lower than 8 (the default value).

Additional logic in the base class's `_calculate_qparams` function has also been modified to support a dynamic quantization range.
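
A hedged sketch of the augmented interface; the `qmin`/`qmax` argument names are taken from this diff's description and should be treated as assumptions about the exact spelling:

```python
import torch
from torch.quantization import MinMaxObserver

obs = MinMaxObserver(qmin=0, qmax=15)  # 4-bit range instead of the 8-bit default
obs(torch.randn(16))                   # observe a batch
scale, zero_point = obs.calculate_qparams()
```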

Test Plan:
To ensure this modification is still backward compatible with past usages, numerics are verified by running the quantization unit test suite, which contains various observer tests. The following command executes the test suite, which also verifies the observer numerics:

`buck test //caffe2/test:quantization -- observer`

This modified observer script can be tested within the experiments for lower bit fake quantization. Please see the following diffs for reference.
- Single Fake Quantizer: D22337447
- Single Conv Layer: D22338532

Reviewed By: z-a-f

Differential Revision: D22427134

fbshipit-source-id: f405e633289322078b0f4a417f54b684adff2549
2020-07-20 08:51:31 -07:00
9600ed9af3 typo fixes (#41632)
Summary:
typo fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41632

Reviewed By: ezyang

Differential Revision: D22617827

Pulled By: mrshenli

fbshipit-source-id: c2bfcb7cc36913a8dd32f13fc9adc3aa0a9b682f
2020-07-20 07:23:00 -07:00
bd42e1a082 Doc language fixes (#41643)
Summary:
Updates doc for abs, acos, and isinf for clarity and consistency.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41643

Reviewed By: ngimel

Differential Revision: D22622957

Pulled By: mruberry

fbshipit-source-id: 040f01b4e101153098577bf10dcd569b679aae2c
2020-07-19 21:31:51 -07:00
a69a262810 workaround segfault in deviceGuard construction (#41621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41621

Per title. In some situations, the deviceGuard constructor in mul_kernel_cuda segfaults, so construct the deviceGuard conditionally, only when the first argument is a scalar.
This does not root-cause why the deviceGuard constructor segfaults, so the issue might come back.

Test Plan: pytorch oss CI

Reviewed By: jianyuh

Differential Revision: D22616460

fbshipit-source-id: b91bbe55c6eb0bbe80b8d6a61c41f09288752658
2020-07-18 23:41:43 -07:00
4a3aad354a [1/N] Implement Enum JIT support (#41390)
Summary:
* Add EnumType and AnyEnumType as first-class jit type
* Add Enum-typed IValue
* Enhanced aten::eq to support Enum

Supported:
Enum-typed function arguments
Using Enum types and comparing them (see the sketch after the TODO list below)

TODO:
Add Python sugared value for Enum
Support getting name/value attrs of enums
Support Enum-typed return values
Support enum values of different types in same Enum class
Support serialization and deserialization
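
A sketch of the supported pattern referenced above (hedged: later PRs in this N-part series complete the feature, so this exact snippet may only fully work once the whole stack lands):

```python
import torch
from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2

@torch.jit.script
def is_red(c: Color) -> bool:
    return c == Color.RED  # Enum-typed argument plus aten::eq on Enums
```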

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41390

Reviewed By: eellison

Differential Revision: D22524388

Pulled By: gmagogsfm

fbshipit-source-id: 1627154a64e752d8457cd53270f3d14aea4b1150
2020-07-18 22:15:06 -07:00
46eb8d997c Revert D22533824: [PT] add check for duplicated op names in JIT
Test Plan: revert-hammer

Differential Revision:
D22533824 (d72c9f4200)

Original commit changeset: b36884531d41

fbshipit-source-id: 8bf840a09b4001cc68858a5dc3540505a0e1abdc
2020-07-18 17:26:42 -07:00
c7bcb285f3 Makes elementwise comparison docs more consistent (#41626)
Summary:
- Removes outdated language like "BoolTensor"
- Consistently labels keyword arguments, like out
- Uses a more natural string to describe their return type
- A few bonus fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41626

Reviewed By: ngimel

Differential Revision: D22617322

Pulled By: mruberry

fbshipit-source-id: 03cc3562b78a07ed30bd1dc7936d7a4f4e31f01d
2020-07-18 16:30:59 -07:00
e7a09b4d17 RecordFunction in Dispatcher (#37587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37587

Lifting RecordFunction up into the dispatcher code

Test Plan: Imported from OSS

Differential Revision: D21374246

fbshipit-source-id: 19f9c1719e6fd3990e451c5bbd771121e91128f7
2020-07-17 22:20:05 -07:00
c6d0fdd215 torch.isreal (#41298)
Summary:
https://github.com/pytorch/pytorch/issues/38349

mruberry
Not entirely sure if all the changes are necessary in how functions are added to PyTorch.

Should it throw an error when called with a non-complex tensor? NumPy allows non-complex arrays in its imag() function, which is used in its isreal() function, but PyTorch's imag() throws an error for non-complex tensors.

Where does assertONNX() get its expected output to compare to?
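
A minimal usage sketch, assuming the NumPy-like behavior discussed above (non-complex inputs are trivially all-real):

```python
import torch

z = torch.tensor([1 + 0j, 2 + 1j])
print(torch.isreal(z))                    # tensor([ True, False])
print(torch.isreal(torch.tensor([1.0])))  # tensor([True])
```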

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41298

Reviewed By: ngimel

Differential Revision: D22610500

Pulled By: mruberry

fbshipit-source-id: 817d61f8b1c3670788b81690636bd41335788439
2020-07-17 22:07:24 -07:00
581e9526bb [GradualGating] support better k value change (#41557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41557

 - add new learning rate functor "slope"
 - use "slope" learning rate in gated_sparse_feature module

Test Plan:
buck test dper3/dper3/modules/tests:core_modules_test -- test_gated_sparse_features_shape_num_warmup_tensor_k
buck test caffe2/caffe2/python/operator_test:learning_rate_op_test -- test_slope_learning_rate_op

Reviewed By: huayuli00

Differential Revision: D22544628

fbshipit-source-id: f2fcae564e79e1d8bcd3a2305d0c11ca7c0d3b3c
2020-07-17 20:44:28 -07:00
d72c9f4200 [PT] add check for duplicated op names in JIT (#41549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41549

D22467871 (a548c6b18f) was reverted due to double-linking torch_mobile_train.
Re-doing this change now that D22531358 (7a33d8b001) has landed.

Test Plan:
buck install fb4a
Train mnist in Internal Settings.

Reviewed By: iseeyuan

Differential Revision: D22533824

fbshipit-source-id: b36884531d41cea2e76b7fb1a567f21106c612b6
2020-07-17 20:26:48 -07:00
96ac12fdf4 [PT] add overload name for int prim ops (#41578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41578

A new op aten::gcd(Tensor...) was added while the duplicated-op-name check was disabled. It's not a prim op, but it has the same name as the prim op aten::gcd(int, int).

It will be safer to enforce that all prim ops have an overload name, even though there is no duplicated name right now. People may add tensor ops without overload names in the future.

This diff added the overload name for all ops defined using "DEFINE_INT_OP".

```
aten::__and__
aten::__or__
aten::__xor__
aten::__lshift__
aten::__rshift__
aten::__round_to_zero_floordiv
aten::gcd
```
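
For context, an operator's overload name is the token after the dot in its schema string. A hypothetical sketch of what the int variant of gcd looks like once it carries an overload name (the exact schema strings live in the diff):

```python
# Hypothetical illustration only; see the diff for the real schemas.
int_gcd_schema = "aten::gcd.int(int a, int b) -> int"
```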

Test Plan: run full JIT predictor

Reviewed By: iseeyuan

Differential Revision: D22593689

fbshipit-source-id: b3335d356a774d33450a09d0a43ff947197f9b8a
2020-07-17 18:18:38 -07:00
445e7eb01b Add quantized CELU operator by adding additional parameters to quantized ELU (#39199)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39199

Test Plan: Imported from OSS

Differential Revision: D21771202

Pulled By: durumu

fbshipit-source-id: 910de6202fa3d5780497c5bf85208568a09297dd
2020-07-17 17:56:33 -07:00
1734f24276 Revert D22525217: [pytorch][PR] Initial implementation of quantile operator
Test Plan: revert-hammer

Differential Revision:
D22525217 (c7798ddf7b)

Original commit changeset: 27a8bb23feee

fbshipit-source-id: 3beb3d4f8a4d558e993fbdfe977af12c7153afc8
2020-07-17 17:22:48 -07:00
b774ce54f8 remediation of S205607
fbshipit-source-id: 798decc90db4f13770e97cdce3c0df7d5421b2a3
2020-07-17 17:19:47 -07:00
8fdea489af remediation of S205607
fbshipit-source-id: 5113fe0c527595e4227ff827253b7414abbdf7ac
2020-07-17 17:17:03 -07:00
39b4701d31 [caffe2][redo] Reimplement RemoveOpsByType with SSA (#41606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41606

The previous diff (D22220798 (59294fbbb9) and D22220797) was recently reverted (D22492356 (28291d3cf8), D22492355) because of a bug associated with the op AsyncIf. The AsyncIf op has net_defs as args and the SSA rewriting didn't take that into account. It has a special path for the op If, but not for AsyncIf. Several changes I made to fix the bug:
1) Add op AsyncIf to the special path for If op in SSA rewriting
2) clear inputs/outputs of the netdefs that are args in If/AsyncIf ops because they're no longer valid
3) revert renamed inputs/outputs in the arg netdefs that are in the external_outputs in the parent netdef

2) and 3) are existing bugs in the `SsaRewrite` function that were just never exposed before.

The algorithm for `RemoveOpsByType` is the same as in my previous diff D22220798 (59294fbbb9). The only new changes in this diff are in `onnx::SsaRewrite` and a few newly added unit tests.

(Note: this ignores all push blocking failures!)

Reviewed By: yinghai

Differential Revision: D22588652

fbshipit-source-id: ebb68ecd1662ea2bae14d4be8f61a75cd8b7e3e6
2020-07-17 16:06:43 -07:00
349c40507c Revert "[CircleCI] Delete docker image after testing" (#41601)
Summary:
Per AMD request, this reverts commit 1e64bf4c40ef82d6bc3dcc42b3874353f7632be0.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41601

Reviewed By: ezyang

Differential Revision: D22603147

Pulled By: malfet

fbshipit-source-id: f423d406601383f26ea83a51f1de37e60b53810e
2020-07-17 14:42:27 -07:00
92b95e5243 Fix NCCL version check when nccl.h in non-standard location. (#40982)
Summary:
The NCCL discovery process fails to compile detect_nccl_version.cc when nccl.h resides in a non-standard location.
Pass __NCCL_INCLUDE_DIRS__ to _try_run(... detect_nccl_version.cc)_ to fix this.

This can be reproduced with the following Dockerfile:
```Dockerfile
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04 as build
WORKDIR /stage

# install conda
ARG CONDA_VERSION=4.7.10
ARG CONDA_URL=https://repo.anaconda.com/miniconda/Miniconda3-${CONDA_VERSION}-Linux-x86_64.sh
RUN cd /stage && curl -fSsL --insecure ${CONDA_URL} -o install-conda.sh &&\
    /bin/bash ./install-conda.sh -b -p /opt/conda &&\
    /opt/conda/bin/conda clean -ya
ENV PATH=/opt/conda/bin:${PATH}

# install prerequisites
RUN conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi

# attempt compile
ENV CUDA_HOME="/usr/local/cuda" \
    CUDNN_LIBRARY="/usr/lib/x86_64-linux-gnu" \
    NCCL_INCLUDE_DIR="/usr/local/cuda/include" \
    NCCL_LIB_DIR="/usr/local/cuda/lib64" \
    USE_SYSTEM_NCCL=1
RUN apt-get -y update &&\
    apt-get -y install git &&\
    cd /stage && git clone https://github.com/pytorch/pytorch.git &&\
    cd pytorch &&\
    git submodule update --init --recursive &&\
    python setup.py bdist_wheel
```

This generates the following error:
```
-- Found NCCL: /usr/local/cuda/include
-- Determining NCCL version from /usr/local/cuda/include/nccl.h...
-- Looking for NCCL_VERSION_CODE
-- Looking for NCCL_VERSION_CODE - found
CMake Error at cmake/Modules/FindNCCL.cmake:78 (message):
  Found NCCL header version and library version do not match! (include:
  /usr/local/cuda/include, library: /usr/local/cuda/lib64/libnccl.so) Please
  set NCCL_INCLUDE_DIR and NCCL_LIB_DIR manually.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40982

Reviewed By: zou3519

Differential Revision: D22603911

Pulled By: malfet

fbshipit-source-id: 084870375a270fb9c7daf3c2e731992a03614ad6
2020-07-17 13:54:17 -07:00
cf811d2fb3 retain undefined tensors in backward pass (#41490)
Summary:
Leave undefined tensors / None returned from custom backward functions as undefined/None instead of creating a tensor full of zeros. This change improves performance in some cases.

**This is BC-Breaking:** Custom backward functions that return None will now see it potentially being propagated all the way up to AccumulateGrad nodes. The potential impact is that the .grad field of leaf tensors, as well as the result of autograd.grad, may be undefined/None where it used to be a tensor full of zeros. Also, autograd.grad may raise an error; if so, consider using allow_unused=True ([see doc](https://pytorch.org/docs/stable/autograd.html?highlight=autograd%20grad#torch.autograd.grad)) if it applies to your case.
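
A quick sketch of the allow_unused escape hatch mentioned above:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = torch.randn(3, requires_grad=True)
out = (x * 2).sum()  # y never participates in the graph

# With allow_unused=True, the gradient for the unused input comes back as
# None rather than raising (or being materialized as zeros).
gx, gy = torch.autograd.grad(out, (x, y), allow_unused=True)
print(gy)  # None
```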

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41490

Reviewed By: albanD

Differential Revision: D22578241

Pulled By: heitorschueroff

fbshipit-source-id: f4966f4cb520069294f8c5c1691eeea799cc0abe
2020-07-17 12:42:50 -07:00
a874c1e584 Adds missing abs to lcm (#41552)
Summary:
lcm was missing an abs. This adds it and extends the tests for NumPy compliance. Also includes a few doc fixes.
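
A quick check of the behavior this fixes (a sketch; the real tests are in the PR):

```python
import torch

# lcm is defined via absolute values, so negative inputs yield a
# non-negative result, matching NumPy.
print(torch.lcm(torch.tensor([-4]), torch.tensor([6])))  # tensor([12])
```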

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41552

Reviewed By: ngimel

Differential Revision: D22580997

Pulled By: mruberry

fbshipit-source-id: 5ce1db56f88df4355427e1b682fcf8877458ff4e
2020-07-17 12:29:50 -07:00
0f78e596ba ROCm: Fix linking of custom ops in load_inline (#41257)
Summary:
Previously we did not link against amdhip64 (roughly equivalent to cudart). Apparently, the recent RTLD_GLOBAL fixes prevent the extensions from finding the symbols needed for launching kernels.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41257

Reviewed By: zou3519

Differential Revision: D22573288

Pulled By: ezyang

fbshipit-source-id: 89f9329b2097df26785e2f67e236d60984d40fdd
2020-07-17 12:14:50 -07:00
3c862c80cf Move list size constants for profiler::Event and profiler::ProfilerConfig into (#40474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40474

These constants are unnecessary since there is an enum, and we can add
the size at the end of the enum and it will be equal to the list size. I
believe that this is the typical pattern used to represent enum sizes.
ghstack-source-id: 107969012

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D22147754

fbshipit-source-id: 7064a897a07f9104da5953c2f87b58179df8ea84
2020-07-17 12:00:18 -07:00
fbd960801a [JIT] Replace use of "whitelist" in lower_tuples pass (#41460)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41460

**Test Plan**
Continuous integration.

**Fixes**
This commit partially addresses #41443.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D22544272

Pulled By: SplitInfinity

fbshipit-source-id: b46940d1e24f81756daaace260bad7a1feda1e8f
2020-07-17 11:33:14 -07:00
c2c2c1c106 [JIT] Remove use of "whitelist" in quantization/helper.cpp (#41459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41459

**Test Plan**
Continuous integration.

**Fixes**
This commit partially addresses #41443.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D22544269

Pulled By: SplitInfinity

fbshipit-source-id: d4bb7c0c9c71e953677a34f0530b66e5119447d0
2020-07-17 11:33:12 -07:00
4f4e3a0f15 [JIT] Replace uses of "whitelist" in jit/_script.py (#41458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41458

**Test Plan**
Continuous integration.

**Fixes**
This commit partially fixes #41443.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D22544273

Pulled By: SplitInfinity

fbshipit-source-id: 8148e5338f90a5ef19177cf68bf36b56926d5a6c
2020-07-17 11:33:10 -07:00
bf0d0900a7 [JIT] Replace uses of "blacklist" in jit/_recursive.py (#41457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41457

**Test Plan**
Continuous integration.

**Fixes**
This commit partially addresses #41443.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D22544274

Pulled By: SplitInfinity

fbshipit-source-id: ee74860c48d85d819d46c8b8848960e77bb5013e
2020-07-17 11:33:07 -07:00
758edcd7df [JIT] Replace use of "blacklist" in python/init.cpp (#41456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41456

**Test Plan**
Continuous integration.

**Fixes**
This commit partially addresses #41443.

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D22544270

Pulled By: SplitInfinity

fbshipit-source-id: 649b30e1fcc6516a4def6b148a1da07bc3ce941d
2020-07-17 11:33:05 -07:00
c9bdf474d7 [JIT] Replace use of "blacklist" in xnnpack_rewrite (#41455)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41455

**Test Plan**
Continuous integration.

**Fixes**
This commit partially addresses #41443.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D22544275

Pulled By: SplitInfinity

fbshipit-source-id: 5037b16e6ebc9e3b40dd03d2ce5a0671d7867892
2020-07-17 11:33:03 -07:00
3b7c05b11b [JIT] Replace uses of "blacklist" in gen_unboxing_wrappers.py (#41454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41454

**Test Plan**
Continuous integration (if this file is still used).

**Fixes**
This commit partially addresses #41443.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D22544271

Pulled By: SplitInfinity

fbshipit-source-id: 84a4d552745fe5163b2e3200103c3b1f2a9ffb2a
2020-07-17 11:33:01 -07:00
f85a27e100 [JIT] Replace "blacklist" in test_jit.py (#41453)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41453

**Test Plan**
`python test/test_jit.py`

**Fixes**
This commit partially addresses #41443.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D22544268

Pulled By: SplitInfinity

fbshipit-source-id: 8b6b94211a626209c3960fda6c860593148dcbf2
2020-07-17 11:30:27 -07:00
43b1923d98 Enable SLS FP32 accumulation SparseLengthsWeightedSumFused8BitRowwiseFakeFP32NNPI Op. (#41577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41577

* Remove skipping test
* Use fma_avx_emulation
* Increase test examples to 100

(Note: this ignores all push blocking failures!)

Test Plan: Tests are covered in test_sls_8bit_nnpi.py

Reviewed By: hyuen

Differential Revision: D22585742

fbshipit-source-id: e1f62f47eb10b402b11893ffca7a6786e31daa79
2020-07-17 11:19:47 -07:00
319b20b7db [ONNX] Update ORT version (#41372)
Summary:
Update ORT version [1.4 candidate].

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41372

Reviewed By: houseroad

Differential Revision: D22580050

Pulled By: bzinodev

fbshipit-source-id: c66e3bab865b3221d52eea30db48e0870ae5b681
2020-07-17 11:17:17 -07:00
346c69a626 [ONNX] Export embedding_bag (#41234)
Summary:
Enable export of embedding_bag op to ONNX

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41234

Reviewed By: houseroad

Differential Revision: D22567470

Pulled By: bzinodev

fbshipit-source-id: 2fcf74e54f3a9dee4588d7877a4ac9eb6c2a3629
2020-07-17 11:11:43 -07:00
7eb71b4beb Profiler: Do not record zero duration kernel events (#41540)
Summary:
Changes in the ROCm runtime have improved hipEventRecord.  The events no longer take ~4 usec to execute on the gpu stream, instead they appear instantaneous.  If you record two events, with no other activity in between, then they will have the same timestamp and the elapsed duration will be 0.

The profiler uses hip/cuda event pairs to infer gpu execution times.  It wraps functions whether they send work to the gpu or not.  Functions that send no gpu work will show as having zero duration.  Also they will show as running at the same time as neighboring functions.  On a trace, all those functions combine into a 'call stack' that can be tens of functions tall (when indeed they should be sequential).

This patch suppresses recording the zero-duration 'kernel' events, leaving only the CPU execution part. This means functions that do not use the GPU do not get an entry for how long they were using the GPU, which seems reasonable. This fixes the 'stacking' on traces. It also improves the signal-to-noise ratio of the GPU trace beyond what was available previously.

This patch will not affect CUDA or legacy ROCm, as those are not able to 'execute' eventRecord markers instantaneously.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41540

Reviewed By: zou3519

Differential Revision: D22597207

Pulled By: albanD

fbshipit-source-id: 5e89de2b6d53888db4f9dbcb91a94478cde2f525
2020-07-17 11:03:43 -07:00
324c18fcad fix division by low precision scalar (#41446)
Summary:
Before, the reciprocal used for division by a scalar was computed in the precision of the non-scalar operand, which can lead to underflow:
```
>>> x = torch.tensor([3388.]).half().to(0)
>>> scale = 524288.0
>>> x.div(scale)
tensor([0.], device='cuda:0', dtype=torch.float16)
>>> x.mul(1. / scale)
tensor([0.0065], device='cuda:0', dtype=torch.float16)
```
This PR makes results of multiplication by inverse and division the same.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41446

Reviewed By: ezyang

Differential Revision: D22542872

Pulled By: ngimel

fbshipit-source-id: b60e3244809573299c2c3030a006487a117606e9
2020-07-17 10:41:28 -07:00
5d7046522b [JIT] Teach IRPrinter and IRParser to handle 'requires_grad' and 'device' as a part of type info. (#41507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41507

These fields have always been a part of tensor types, this change just
makes them serializable through IR dumps.

Test Plan: Imported from OSS

Reviewed By: Krovatkin, ngimel

Differential Revision: D22563661

Pulled By: ZolotukhinM

fbshipit-source-id: f01aaa130b7e0005bf1ff21f65827fc24755b360
2020-07-17 10:27:04 -07:00
241bc648c9 Adding missing setting state_.ptr() and hook_.ptr() to nullptr. (#41537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41537

Explicitly setting PyObject* state_ and hook_ to nullptr to prevent py::object's dtor from decref'ing the PyObject again.
Reference PR [#40848](https://github.com/pytorch/pytorch/pull/40848).
ghstack-source-id: 107959254

Test Plan: `python test/distributed/test_c10d.py`

Reviewed By: zou3519

Differential Revision: D22573858

fbshipit-source-id: 84cc5949a370ffdb4ac3ca7a16a6f0f136563c1c
2020-07-17 10:21:03 -07:00
c7798ddf7b Initial implementation of quantile operator (#39417)
Summary:
Implementing the quantile operator, similar to [numpy.quantile](https://numpy.org/devdocs/reference/generated/numpy.quantile.html).

For this implementation I'm reducing it to existing torch operators to get a CUDA implementation for free. It would be more efficient to implement a multiple-quickselect algorithm instead of sorting, but this can be addressed in a future PR.
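
A minimal usage sketch, assuming numpy.quantile's default linear interpolation as described above:

```python
import torch

a = torch.tensor([0., 1., 2., 3.])
print(torch.quantile(a, 0.5))                         # tensor(1.5000)
print(torch.quantile(a, torch.tensor([0.25, 0.75])))  # tensor([0.7500, 2.2500])
```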

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39417

Reviewed By: mruberry

Differential Revision: D22525217

Pulled By: heitorschueroff

fbshipit-source-id: 27a8bb23feee24fab7f8c228119d19edbb6cea33
2020-07-17 10:15:57 -07:00
71fdf748e5 Add torch.atleast_{1d/2d/3d} (#41317)
Summary:
https://github.com/pytorch/pytorch/issues/38349

TODO:
 * [x] Docs
 * [x] Tests
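
A minimal sketch of the NumPy-style semantics these functions follow (scalars are promoted to the requested rank):

```python
import torch

t = torch.tensor(5.)
print(torch.atleast_1d(t).shape)  # torch.Size([1])
print(torch.atleast_2d(t).shape)  # torch.Size([1, 1])
print(torch.atleast_3d(t).shape)  # torch.Size([1, 1, 1])
```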

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41317

Reviewed By: ngimel

Differential Revision: D22575456

Pulled By: mruberry

fbshipit-source-id: cc79f4cd2ca4164108ed731c33cf140a4d1c9dd8
2020-07-17 10:10:41 -07:00
840ad94ef5 Add reference documentation for torch/library.h (#41470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41470

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D22577426

Pulled By: ezyang

fbshipit-source-id: 4bfe5806061e74181a74d161c868acb7c1ecd1e4
2020-07-17 10:05:16 -07:00
1e230a5c52 rewrite C++ __torch_function__ handling to work with TensorList operands (#41575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41575

Fixes https://github.com/pytorch/pytorch/issues/34294

This updates the C++ argument parser to correctly handle `TensorList` operands. I've also included a number of updates to the testing infrastructure; this is because we're now doing a much more careful job of testing the signatures of aten kernels, using the type information about the arguments as read in from `Declarations.yaml`. The changes to the tests are required because we're now only checking for `__torch_function__` attributes on `Tensor`, `Optional[Tensor]`, and elements of `TensorList` operands, whereas before we were checking for `__torch_function__` on all operands. The relatively simplistic approach the tests were using before -- assuming all positional arguments might be tensors -- therefore doesn't work anymore. I now think that checking for `__torch_function__` on all operands was a mistake in the original design.

The updates to the signatures of the `lambda` functions are to handle this new, more stringent checking of signatures.

I also added override support for `torch.nn.functional.threshold` and `torch.nn.functional.layer_norm`, which did not yet have python-level support.

Benchmarks are still WIP.
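
A minimal sketch of what this enables, written against the present-day `__torch_function__` protocol shape (a subclass inside a `TensorList` operand such as `torch.cat`'s input now participates in dispatch):

```python
import torch

class Logged(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        print("dispatching", func.__name__)
        return super().__torch_function__(func, types, args, kwargs or {})

t = torch.randn(2).as_subclass(Logged)
# The subclass sits inside a TensorList operand and is still dispatched on.
out = torch.cat([t, torch.randn(2)])
```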

Pull Request resolved: https://github.com/pytorch/pytorch/pull/34725

Reviewed By: mruberry

Differential Revision: D22357738

Pulled By: ezyang

fbshipit-source-id: 0e7f4a58517867b2e3f193a0a8390e2ed294e1f3
2020-07-17 08:54:29 -07:00
cb9029df9d Assert valid inner type for OptionalType creation (#41509)
Summary:
Assert in OptionalType::create that the inner TypePtr is valid, to catch all uses, and also assert in the Python resolver to propagate a slightly more helpful error message.

Closes https://github.com/pytorch/pytorch/issues/40713.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41509

Reviewed By: suo

Differential Revision: D22563710

Pulled By: wconstab

fbshipit-source-id: ee6314b1694a55c1ba7c8251260ea120be148b17
2020-07-17 07:22:41 -07:00
e3e58e20cd enable jit profiling tests on macos (#41550)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41550

Reviewed By: SplitInfinity

Differential Revision: D22579593

Pulled By: Krovatkin

fbshipit-source-id: 3e67bcf418ef266d5416b7fac413e94b1ac1ec7e
2020-07-16 22:55:24 -07:00
eb3bf96f95 During inbatch broadcast, move Tile op after Fused8BitRowwiseQuantizedToFloat if applicable (#41464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41464

If the input is int8 rowwise quantized, we currently cannot lower it to Glow, and previously we hit an error when running with inbatch broadcast. The main issue is that the Tile op doesn't support the uint8_t type, which is very easily added here. However, this results in the non-ideal situation that we leave Tile -> Fused8BitRowwiseQuantizedToFloat on the host side, which probably hurts memory bandwidth a lot. Even if we later add Fused8BitRowwiseQuantizedToFloat support to Glow, it's still not ideal because we would be doing redundant compute on identical columns. So the solution here is to swap the order of Fused8BitRowwiseQuantizedToFloat and Tile to make it Tile -> Fused8BitRowwiseQuantizedToFloat. This resolves the error we saw immediately. For the short term, we can still run Tile on the card, and for the longer term, things run faster on the card.

The optimization is a heuristic: if the net doesn't contain such a pattern, inbatch broadcast works as it did before.

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck test caffe2/caffe2/opt/custom:in_batch_broadcast_test
```

Reviewed By: benjibc

Differential Revision: D22544162

fbshipit-source-id: b6dd36a5925a9c8103b80f034e7730a7a085a6ff
2020-07-16 21:25:18 -07:00
5376785a70 Run NO_AVX jobs on CPU (#41565)
Summary:
Delete "nogpu" job since both "AVX" and "AVX2" jobs already act like one
Fix naming problem when NO_AVX_NO_AVX2 job and NO_AVX2 jobs were semantically identical, due to the following logic in test.sh:
```
if [[ "${BUILD_ENVIRONMENT}" == *-NO_AVX-* ]]; then
  export ATEN_CPU_CAPABILITY=default
elif [[ "${BUILD_ENVIRONMENT}" == *-NO_AVX2-* ]]; then
  export ATEN_CPU_CAPABILITY=avx
fi
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41565

Reviewed By: seemethere

Differential Revision: D22584743

Pulled By: malfet

fbshipit-source-id: 783cce60f35947b5d1e8b93901db36371ef78243
2020-07-16 21:21:48 -07:00
728fd37d92 [JIT] make fastrnns runnable on cpu (#41483)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41483

Reviewed By: gmagogsfm

Differential Revision: D22580275

Pulled By: eellison

fbshipit-source-id: f2805bc7fa8037cfde7862b005d2940add3ac864
2020-07-16 15:53:39 -07:00
b1d4e33c8b Revert D22552377: [pytorch][PR] Reland split unsafe version
Test Plan: revert-hammer

Differential Revision:
D22552377 (5bba973afd)

Original commit changeset: 1d1b713d2429

fbshipit-source-id: 8194458f99bfd5f077b7daa46ca3e81b549adc1b
2020-07-16 15:24:19 -07:00
415ff0bceb Create lazy_dyndeps to avoid caffe2 import costs. (#41343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41343

Currently caffe2.InitOpLibrary does the dll import unilaterally. Instead, if we make a lazy version and use it, then many pieces of code which do not need the caffe2 operators get a lot faster.

On a real test, the import time went from 140s to 68.8s.

This also cleans up the algorithm slightly (although it makes a very minimal
difference), by parsing the list of operators once, rather than every time a
new operator is added, since we defer the RefreshCall until after we've
imported all the operators.

The key way we maintain safety is that as soon as someone does an operation
which requires an operator (or could), we force importing of all available
operators.

Future work could include trying to identify which code is needed for which
operator and only import the needed ones. There may also be wins available by
playing with dlmopen (which opens within a namespace), or seeing if the dl
flags have an impact (I tried this and didn't see an impact, but dlmopen may
make it better).

Note that this was previously landed and reverted. The issue was that if a import failed and raised an exception, the specific library would not be removed from the lazy imports. This caused our tests which had libraries that failed to poison all other tests that ran after it. This has been fixed and a unit test has been added for this case (to help make it obvious what failed).
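
A generic sketch of the scheme described, with hypothetical names (`ctypes.CDLL` stands in for the real dyndep loader); note the entry is removed from the pending list before loading, so a failing library cannot poison later calls:

```python
import ctypes

_pending_libs = []

def lazy_init_op_library(path):
    # Record the library instead of importing it eagerly (hypothetical name).
    _pending_libs.append(path)

def ensure_ops_loaded():
    # Invoked the first time any operation actually needs an operator.
    while _pending_libs:
        path = _pending_libs.pop(0)  # pop before loading, so a failed import
        ctypes.CDLL(path)            # is not retried and cannot poison later
                                     # callers -- the bug behind the revert
```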

Test Plan:
I added a new test a lazy_dyndep_test.py (copied from all_compare_test.py).
I'm a little concerned that I don't see any explicit tests for dyndep, but this
should provide decent coverage.

I've added a specific test to handle the poisoning issues mentioned above, which caused the previous version to get reverted.

Differential Revision: D22506369

fbshipit-source-id: 7395df4778e8eb0220630c570360b99a7d60eb83
2020-07-16 15:17:41 -07:00
9ed825746a Use c10::cuda:: primitives rather than make CUDA runtime calls directly (#41405)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41405

Test Plan:
**Imported from GitHub: all checks have passed**

{F244195355}

**The Intern Builds & Tests have 127 success, 5 no signals, and 1 failure. Double check the failed test log file, the failure is result differences:**
- AssertionError: 0.435608434677124 != 0.4356083869934082
- AssertionError: 0.4393022060394287 != 0.4393021583557129
- AssertionError: 0.44707541465759276 != 0.44707536697387695

These are all very small numerical errors (within 0.0000001).

Reviewed By: malfet

Differential Revision: D22531486

Pulled By: threekindoms

fbshipit-source-id: 21543ec76bb9b502885b5146c8ba5ede719be9ff
2020-07-16 15:11:57 -07:00
a0e58996fb Makes the use of the term "module" consistent through the serialization note (#41563)
Summary:
module -> torch.nn.Module or ScriptModule, as appropriate. + bonus grammar fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41563

Reviewed By: gchanan

Differential Revision: D22584173

Pulled By: mruberry

fbshipit-source-id: 8c90f1f9a194bfdb277c97cf02c9b8c1c6ddc601
2020-07-16 14:59:49 -07:00
454cd3ea2e Fix RocM resource class allocation (#41553)
Summary:
Add a Conf.is_test_stage() method to avoid duplicating `stage in ['test', 'test1', 'test2']` checks throughout the code

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41553

Test Plan: Make sure that in modified config.yml ROCM tests jobs are assigned `pytorch/amd-gpu` resource class

Reviewed By: yns88

Differential Revision: D22580471

Pulled By: malfet

fbshipit-source-id: 514555f0c0ac94c807bf837ba209560055335587
2020-07-16 14:13:25 -07:00
e324ea85ea Add tests to logical operation in BinaryOpsKernel.cpp (#41515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41515

Add tests in atest.cpp to cover logical_and_kernel, logical_or_kernel, and logical_xor_kernel in ATen/native/cpu/BinaryOpsKernel.cpp.

https://pxl.cl/1drmV

Test Plan: CI

Reviewed By: malfet

Differential Revision: D22565235

fbshipit-source-id: 7ad9fd8420d7fdd23fd9a703c75da212f72bde2c
2020-07-16 13:21:57 -07:00
f49d97a848 Notes for lcm and gcd, formatting doc fixes (#41526)
Summary:
A small PR fixing some formatting in lcm, gcd, and the serialization note. Adds a note to lcm and gcd explaining behavior that is not always defined.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41526

Reviewed By: ngimel

Differential Revision: D22569341

Pulled By: mruberry

fbshipit-source-id: 5f5ff98c0831f65e82b991ef444a5cee8e3c8b5a
2020-07-16 13:15:29 -07:00
86590f226e Revert D22519869: [pytorch][PR] RandomSampler generates samples one at a time when replacement=True
Test Plan: revert-hammer

Differential Revision:
D22519869 (09647e1287)

Original commit changeset: be6585002586

fbshipit-source-id: 31ca5ceb24dd0b291f46f427a6f30f1037252a5d
2020-07-16 12:59:10 -07:00
ba6b235461 [RocM] Switch to rocm-3.5.1 image (#41273)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41273

Reviewed By: seemethere

Differential Revision: D22575277

Pulled By: malfet

fbshipit-source-id: 6f43654c8c8c33adbc1de928dd43911931244978
2020-07-16 12:52:17 -07:00
09647e1287 RandomSampler generates samples one at a time when replacement=True (#40026)
Summary:
Fix https://github.com/pytorch/pytorch/issues/32530

I used the next() function to generate samples one at a time. To compensate for replacement=False, I added a variable called "sample_list" to RandomSampler that holds a random permutation.
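
An illustrative sketch of the replacement=True path described above (not the exact sampler code):

```python
import torch

def sample_with_replacement(n: int, num_samples: int):
    # Yield one index at a time instead of materializing all of them up front.
    for _ in range(num_samples):
        yield int(torch.randint(high=n, size=(1,)))
```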

cc SsnL

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40026

Reviewed By: zhangguanheng66

Differential Revision: D22519869

Pulled By: ezyang

fbshipit-source-id: be65850025864d659a713b3bc461b25d6d0048a2
2020-07-16 11:42:32 -07:00
6f5f455c54 [Gloo] alltoall to ProcessGroupGloo (#41424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41424

Adding alltoall to Gloo process group

Test Plan:
buck test caffe2/torch/lib/c10d:ProcessGroupGlooTest

Verified on TSC as well D22141532

Reviewed By: osalpekar

Differential Revision: D22451929

fbshipit-source-id: 695c4655c894c85229b16097fa63352ed04523ef
2020-07-16 11:27:26 -07:00
1ac4692489 Remove unnecessary test in rpc_test.py (#41218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41218

This test doesn't assert anything and was accidentally committed as
part of a larger diff a few months ago.
ghstack-source-id: 107882848

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D22469852

fbshipit-source-id: 0baa23da56b08200e16cf66df514566223dd9b15
2020-07-16 11:23:52 -07:00
b5e32528d0 Fix flaky test_udf_remote_message_delay_timeout_to_self (#41217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41217

Fixes this flaky test. Due to the possibility of callback
finishCreatingOwnerRRef running after request_callback has processed and
created the owner RRef, we could actually end up with 0 owners on the node,
since the callback removes from the owners_ map. In this case, shutdown is fine
since there are no owners. On the other hand, if the callback runs first, there
will be 1 owner which we will delete in shutdown when we detect it has no
forks. So either way, shutdown works fine and we don't need to enforce there to
be 1 owner.
ghstack-source-id: 107883497

Test Plan: Ran the test 500 times with TSAN.

Reviewed By: ezyang

Differential Revision: D22469806

fbshipit-source-id: 02290d6d5922f91a9e2d5ede21d1cf1c4598cb46
2020-07-16 11:20:56 -07:00
94e4248d80 Split ASAN and ROCM tests into test1 and test2 (#41520)
Summary:
This should reduce end-to-end test runtime for the 2 slowest configs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41520

Reviewed By: seemethere

Differential Revision: D22575028

Pulled By: malfet

fbshipit-source-id: a65bfa5932fcda3cf0f4fdd97bcc7ebb3f54c281
2020-07-16 11:15:03 -07:00
81e964904e [Gloo] Tests for Gloo Async Work Wait-level Timeouts (#41265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41265

This PR adds tests for the Async Work wait-level timeouts that were added in the previous PR
ghstack-source-id: 107835732

Test Plan: New tests are in this diff - Running on local machine and Sandcastle

Reviewed By: jiayisuse

Differential Revision: D22470084

fbshipit-source-id: 5552e384d384962e359c5f665e6572df03b6aa63
2020-07-16 10:59:01 -07:00
b979129cba [Gloo] Support work-level timeouts in ProcessGroupGloo (#40948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40948

Add work-level timeouts to ProcessGroupGloo. This uses the timeout support in `waitSend` and `waitRecv` functions from Gloo's `unbound_buffer` construct.
ghstack-source-id: 107835738

Test Plan: Tests are in the last PR in this stack

Reviewed By: jiayisuse

Differential Revision: D22173763

fbshipit-source-id: e0493231a23033464708ee2bc0e295d2b087a1c9
2020-07-16 10:58:59 -07:00
01dcef2e15 [NCCL] Tests for WorkNCCL::wait with Timeouts (#40947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40947

This PR adds tests for work-level timeouts in WorkNCCL objects. We kick off an allgather operation that waits for 1000ms before actually starting computation. We wait on completion of this allgather op with a timeout of 250ms, expecting the operation to time out and throw a runtime error.
ghstack-source-id: 107835734

Test Plan: This diff added tests - checking CI/Sandcastle for correctness. These are NCCL tests so they require at least 2 GPUs to run.

Reviewed By: jiayisuse

Differential Revision: D22173101

fbshipit-source-id: 8595e4b67662cef781b20ced0befdcc53d157c39
2020-07-16 10:58:56 -07:00
edf3dc73f2 [NCCL] Support Wait Timeout in ProcessGroupNCCL (#40946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40946

Adds timeout to ProcessGroupNCCL::wait. Currently, WorkNCCL objects already have a timeout set during ProcessGroupNCCL construction. The new wait function will override the existing timeout with the user-defined timeout if one is provided. Timed out operations result in NCCL communicators being aborted and an exception being thrown.
ghstack-source-id: 107835739

Test Plan: Test added to `ProcessGroupNCCLTest` in the next PR in this stack.

Reviewed By: jiayisuse

Differential Revision: D22127898

fbshipit-source-id: 543964855ac5b41e464b2df4bb6c211ef053e73b
2020-07-16 10:58:54 -07:00
9d92fa2679 [NCCL] Add timeout to ProcessGroup Work Wait (#40944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40944

This stack adds Work-level timeout for blocking wait.

This PR just changes the API to accept a timeout arg with a default value for the wait function in each ProcessGroup backend. The ProcessGroup superclass correctly waits for the given timeout by changing the CV wait to wait_for.

Closes: https://github.com/pytorch/pytorch/issues/37571
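
From the Python side, the per-wait timeout looks roughly like this (a sketch assuming an initialized process group; binding names per the c10d API):

```python
from datetime import timedelta

import torch
import torch.distributed as dist

dist.init_process_group("gloo")  # assumes env:// rendezvous variables are set
t = torch.ones(4)
work = dist.all_reduce(t, async_op=True)
# Overrides the process-group default; raises if not completed within 500 ms.
work.wait(timeout=timedelta(milliseconds=500))
```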
ghstack-source-id: 107835735

Test Plan: Tests in 4th PR in this stack

Reviewed By: jiayisuse

Differential Revision: D22107135

fbshipit-source-id: b38c07cb5e79e6c86c205e580336e7918ed96501
2020-07-16 10:56:58 -07:00
fef30220fd Runs CUDA test_istft_of_sine on CUDA (#41523)
Summary:
The test was always running on the CPU. This actually caused it to throw an error on non-MKL builds, since the CUDA test (which ran on the CPU) tried to execute but the test requires MKL (a requirement only checked for the CPU variant of the test).

Fixes https://github.com/pytorch/pytorch/issues/41402.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41523

Reviewed By: ngimel

Differential Revision: D22569344

Pulled By: mruberry

fbshipit-source-id: e9908c0ed4b5e7b18cc7608879c6213fbf787da2
2020-07-16 10:43:51 -07:00
b2b8af9645 Removes assertAlmostEqual (#41514)
Summary:
This test function is confusing since our `assertEqual` behavior allows for tolerance to be specified, and this is a redundant mechanism.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41514

Reviewed By: ngimel

Differential Revision: D22569348

Pulled By: mruberry

fbshipit-source-id: 2b2ff8aaa9625a51207941dfee8e07786181fe9f
2020-07-16 10:35:12 -07:00
58244a9586 Automated submodule update: FBGEMM (#40332)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 73ea1f5828

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40332

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: gchanan, yns88

Differential Revision: D22150737

fbshipit-source-id: fe7e6787adef9e2fedee5d1a0a1e57bc4760b88c
2020-07-16 10:32:39 -07:00
2b14f2d368 [reland][DNNL]:enable max_pool3d and avg_pool3d (#40996)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40996

Test Plan: Imported from OSS

Differential Revision: D22440766

Pulled By: VitalyFedyunin

fbshipit-source-id: 242711612920081eb4a7e5a7e80bc8b2d4c9f978
2020-07-16 10:26:45 -07:00
45c5bac870 [WIP] Fix cpp grad accessor API (#40887)
Summary:
Update the API for accessing grad in C++ to avoid unexpected thread-safety issues.
In particular, with the current API, a check like `t.grad().defined()` is not thread safe.

- This introduces `t.mutable_grad()` that should be used when getting a mutable version of the saved gradient. This function is **not** thread safe.
- The `Tensor& grad()` API is now removed. We could not do a deprecation cycle as most of our call side use non-const Tensors that use the non-const overload. This would lead to most calls hitting the warning. This would be too verbose for all the users.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40887

Reviewed By: ezyang

Differential Revision: D22343932

Pulled By: albanD

fbshipit-source-id: d5eb909bb743bc20caaf2098196e18ca4110c5d2
2020-07-16 09:11:12 -07:00
5bba973afd Reland split unsafe version (#41484)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/39299

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41484

Reviewed By: glaringlee

Differential Revision: D22552377

Pulled By: albanD

fbshipit-source-id: 1d1b713d2429ae162e04bda845ef0838c52df789
2020-07-16 09:01:45 -07:00
b9442bb03e Doc note for complex (#41252)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41252

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D22553266

Pulled By: anjali411

fbshipit-source-id: f6dc409da048496d72b29b0976dfd3dd6645bc4d
2020-07-16 08:53:27 -07:00
d80e0c62be fix dequantization to match nnpi (#41505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41505

fix the dequantization to match the fixes from quantization

Test Plan:
The test is not conclusive, since it only compares emulation against a reference collected from Amy's run.

An evaluation workflow is running at the moment.

Reviewed By: venkatacrc

Differential Revision: D22558092

fbshipit-source-id: 3ff00ea15eac76007e194659c3b4949f07ff02a4
2020-07-16 00:40:57 -07:00
26790fb26d fix quantization mechanism to match nnpi (#41494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41494

revert back to the changes from amylittleyang to make quantization work

Test Plan:
Ran against a dump from ctr_instagram, and verified that:
- nnpi and fakelowp match bitwise
- nnpi differs from fbgemm by at most 1, most likely due to the type of
rounding

Reviewed By: venkatacrc

Differential Revision: D22555276

fbshipit-source-id: 7074521d181f15ef6270985bb71c4b44d25d1c30
2020-07-16 00:40:55 -07:00
e6859ec78f resurrect single quantization op test (#41476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41476

deleted this test by default, re-adding it in its own file to make it
more explicit

Test Plan: ran the test

Reviewed By: yinghai

Differential Revision: D22550217

fbshipit-source-id: 758e279b2bab3b23452a3d0ce75fb366f7afb7be
2020-07-16 00:37:46 -07:00
04c0f2e3cc enable TE on windows (#41501)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41501

Reviewed By: ZolotukhinM

Differential Revision: D22563872

Pulled By: Krovatkin

fbshipit-source-id: 2b5730017b34af27800cc03f3ba62f1cc8b4f240
2020-07-15 23:00:05 -07:00
b2e52186b9 Rename capacity to nbytes in ShareExternalPointer to avoid confusion in future (#41461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41461

capacity is misleading, and we have many wrong uses internally. Let's rename it to nbytes to avoid confusion in the future. Ultimately, we could remove this parameter if possible;
so far I haven't seen any case where this capacity is necessary.

Test Plan: oss ci

Differential Revision: D22544189

fbshipit-source-id: f310627f2ab8f4ebb294e0dd5eabc380926991eb
2020-07-15 22:04:18 -07:00
702140758f Move GLOG_ constants into c10 namespace (#41504)
Summary:
Declaring GLOG_ constants in the google namespace causes a conflict in C++ projects that use GLOG and link against LibPyTorch compiled without GLOG.
For example, see https://github.com/facebookresearch/ReAgent/issues/288

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41504

Reviewed By: kaiwenw

Differential Revision: D22564308

Pulled By: malfet

fbshipit-source-id: 2167bd2c6124bd14a67cc0a1360521d3c375e3c2
2020-07-15 21:56:00 -07:00
f27e395a4a [Gloo] update gloo submodule for PyTorch (#41462)
Summary:
To include alltoall

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41462

Test Plan: CI

Reviewed By: osalpekar

Differential Revision: D22544255

Pulled By: jiayisuse

fbshipit-source-id: ad55a50a31e5e5affaf3e14e2401d38f99657dc9
2020-07-15 21:50:08 -07:00
1fb2a7e5a2 onnx export of fake quantize functions (#39738)
Summary:
As discussed in https://github.com/pytorch/pytorch/issues/39502.

This PR adds support for exporting  `fake_quantize_per_tensor_affine` to a pair of `QuantizeLinear` and `DequantizeLinear`.

Exporting `fake_quantize_per_channel_affine` to ONNX depends on https://github.com/onnx/onnx/pull/2772. will file another PR once ONNX merged the change.

It will generate ONNX graph like this:
![image](https://user-images.githubusercontent.com/1697840/84180123-ddd90080-aa3b-11ea-81d5-eaf6f5f26715.png)
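
A minimal sketch of the export path (hypothetical module; assumes an opset that includes QuantizeLinear/DequantizeLinear, i.e. 10+):

```python
import torch

class FQ(torch.nn.Module):
    def forward(self, x):
        # exported as QuantizeLinear -> DequantizeLinear
        return torch.fake_quantize_per_tensor_affine(x, 0.1, 0, 0, 255)

torch.onnx.export(FQ(), torch.randn(4), "fq.onnx", opset_version=10)
```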

jamesr66a

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39738

Reviewed By: hl475

Differential Revision: D22517911

Pulled By: houseroad

fbshipit-source-id: e998b4012e11b0f181b193860ff6960069a91d70
2020-07-15 21:20:23 -07:00
7a33d8b001 [PyTorch Mobile] Modularize the autograd source files shared by mobile and full-jit (#41430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41430

To avoid duplication at compile time, modularize the common autograd files used by both mobile and full-jit.
ghstack-source-id: 107742889

Test Plan: CI

Reviewed By: kwanmacher

Differential Revision: D22531358

fbshipit-source-id: 554f10be89b7ed59c9bde13387a0e1b08000c116
2020-07-15 21:14:47 -07:00
23174ca71b [reland] Enable TF32 support for cuBLAS (#41498)
Summary:
fix rocm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41498

Reviewed By: mruberry

Differential Revision: D22560572

Pulled By: ngimel

fbshipit-source-id: 5ee79e96cb29e70d9180830d058efb53d1c6c041
2020-07-15 21:00:55 -07:00
200c343184 Implement gcd, lcm (#40651)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/40018.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40651

Reviewed By: ezyang

Differential Revision: D22511828

Pulled By: mruberry

fbshipit-source-id: 3ef251e45da4688b1b64c79f530fb6642feb63ab
2020-07-15 20:56:23 -07:00
e44f460079 [jit] Fix jit not round to even if const is folded (#40897)
Summary:
Fixed https://github.com/pytorch/pytorch/issues/40771

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40897

Reviewed By: Krovatkin

Differential Revision: D22543261

Pulled By: gmagogsfm

fbshipit-source-id: 0bd4b1d910a42d5aa87e120c81acfdfb7ca895fa
2020-07-15 20:13:12 -07:00
1770937c9c Restore the contiguity preprocessing of linspace (#41286)
Summary:
The contiguity preprocessing was mistakenly removed in
cd48fb503088af2c00884f1619db571fffbcdafa. Its absence causes erroneous output
when the output tensor is not contiguous. Here we restore the
preprocessing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41286

Reviewed By: zou3519

Differential Revision: D22550822

Pulled By: ezyang

fbshipit-source-id: ebad4e2ba83d2d808e3f958d4adc9a5513a95bec
2020-07-15 20:02:16 -07:00
d90fb72b5a remove use of the term "blacklist" from docs/cpp/source/Doxyfile (#41450)
Summary:
As requested in https://github.com/pytorch/pytorch/issues/41443

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41450

Reviewed By: ezyang

Differential Revision: D22561782

Pulled By: SplitInfinity

fbshipit-source-id: b38ab5e2725735d1f0c70a4d0012678636e992c3
2020-07-15 19:45:53 -07:00
404799d43f Disable failed caffe2 tests for BoundShapeInference on Windows (#41472)
Summary:
Related:
https://github.com/pytorch/pytorch/issues/40861
https://github.com/pytorch/pytorch/issues/41471

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41472

Reviewed By: yns88

Differential Revision: D22562385

Pulled By: malfet

fbshipit-source-id: aebc600915342b984f4fc47cef0a1e79d8965c10
2020-07-15 19:39:45 -07:00
60f2fa6a84 Updates serialization note to explain versioned symbols and dynamic versioning (#41395)
Summary:
Doc update intended to clarify and expand our current serialization behavior, including explaining the difference between torch.save/torch.load, torch.nn.Module.state_dict/torch.nn.Module.load_state_dict, and torch.jit.save/torch.jit.load. Also explains, for the first time, when historic serialized TorchScript behavior is preserved, and our recommendation for preserving behavior (using the same PyTorch version to consume a model as the one that produced it).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41395

Reviewed By: ngimel

Differential Revision: D22560538

Pulled By: mruberry

fbshipit-source-id: dbc2f1bb92ab61ff2eca4888febc21f7dda76ba1
2020-07-15 19:05:19 -07:00
488ee3790e Support @torch.jit.unused on a @torch.no_grad decorated function (#41496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41496

use the wrapped function (instead of the wrapper) to obtain argument names
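
A sketch mirroring the repro in the Test Plan below: with the fix, argument names come from the wrapped function, so scripting succeeds:

```python
import torch

class MyMod(torch.nn.Module):
    @torch.jit.unused
    @torch.no_grad()
    def fn(self, x):
        return x + 1

    def forward(self, x):
        return self.fn(x)

# Previously failed with "Non-static method does not have a self argument".
torch.jit.script(MyMod())
```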

Test Plan:
```
buck test mode/dev-nosan //caffe2/test:jit -- 'test_unused_decorator \(test_jit\.TestScript\)'
```

Before:
```
> Traceback (most recent call last):
>   File "/data/users/yuxinwu/fbsource2/fbcode/buck-out/dev/gen/caffe2/test/jit#binary,link-tree/test_jit.py", line 3014, in test_unused_decorator
>     torch.jit.script(MyMod())
>   File "/data/users/yuxinwu/fbsource2/fbcode/buck-out/dev/gen/caffe2/test/jit#binary,link-tree/torch/jit/_script.py", line 888, in script
>     obj, torch.jit._recursive.infer_methods_to_compile
>   File "/data/users/yuxinwu/fbsource2/fbcode/buck-out/dev/gen/caffe2/test/jit#binary,link-tree/torch/jit/_recursive.py", line 317, in create_script_module
>     return create_script_module_impl(nn_module, concrete_type, stubs_fn)
>   File "/data/users/yuxinwu/fbsource2/fbcode/buck-out/dev/gen/caffe2/test/jit#binary,link-tree/torch/jit/_recursive.py", line 376, in create_script_module_impl
>     create_methods_from_stubs(concrete_type, stubs)
>   File "/data/users/yuxinwu/fbsource2/fbcode/buck-out/dev/gen/caffe2/test/jit#binary,link-tree/torch/jit/_recursive.py", line 292, in create_methods_from_stubs
>     concrete_type._create_methods(defs, rcbs, defaults)
> RuntimeError:
> Non-static method does not have a self argument:
>   File "/data/users/yuxinwu/fbsource2/fbcode/buck-out/dev/gen/caffe2/test/jit#binary,link-tree/test_jit.py", line 3012
>             def forward(self, x):
>                 return self.fn(x)
>                        ~~~~~~~ <--- HERE
>
```

Reviewed By: eellison

Differential Revision: D22554479

fbshipit-source-id: 03e432ea92ed973cc57ff044da80ae7a36f6af4c
2020-07-15 16:54:43 -07:00
71c3b397a6 Reduce Image Size (2) (#41301)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41301

Reviewed By: malfet

Differential Revision: D22559626

Pulled By: ssylvain

fbshipit-source-id: 32da88b7efe2e8d134f74b6ff2dff0bffede012c
2020-07-15 16:47:15 -07:00
5bd71259ed remove blacklist reference (#41447)
Summary:
Reference: issue https://github.com/pytorch/pytorch/issues/41443
Removed a blacklist reference.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41447

Reviewed By: ezyang

Differential Revision: D22542428

Pulled By: SplitInfinity

fbshipit-source-id: 09728c7718bb99ff56b16fda6971ebd887a99c97
2020-07-15 16:25:12 -07:00
b7147fe6d7 Learnable Fake Quantizer Benchmark Test (#41429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41429

This diff contains the benchmark test to evaluate the speed of executing the learnable fake quantization operator, in both the forward and backward paths, for both per-tensor and per-channel usages.

Test Plan:
Inside the path `torch/benchmarks/operator_benchmark` (The root directory will be `caffe2` inside `fbcode` if working on a devvm):
- On a devvm, run the command `buck run pt:fake_quantize_learnable_test`
- On a personal laptop, run the command `python3 -m pt.fake_quantize_learnable_test`

Benchmark Results (Locally on CPU):
Each sample has dimensions **3x256x256**; each batch has 16 samples (`N=16`).
- Per Tensor Forward: 0.023688 sec/sample
- Per Tensor Backward: 0.165926 sec/sample
- Per Channel Forward: 0.040432 sec/sample
- Per Channel Backward: 0.173528 sec/sample

Reviewed By: vkuzo

Differential Revision: D22535252

fbshipit-source-id: e8e953ff2de2107c6f2dde4c8d5627bdea67ef7f
2020-07-15 14:00:20 -07:00
2b8db35c7e [reland][DNNL]:enable batchnorm3d (#40995)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40995

Test Plan: Imported from OSS

Differential Revision: D22440765

Pulled By: VitalyFedyunin

fbshipit-source-id: b4bf427bbb7010ee234a54e81ade371627f9e82c
2020-07-15 13:56:47 -07:00
b48ee175e6 [reland][DNNL]:enable conv3d (#40691)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40691

Test Plan: Imported from OSS

Differential Revision: D22296548

Pulled By: VitalyFedyunin

fbshipit-source-id: 8e2a7cf14e8bdfa2f29b735a89e8c83f6119e68d
2020-07-15 13:54:41 -07:00
ff6e560301 Add C++ end to end test for RPC and distributed autograd. (#36893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36893

Adding an end to end test for running a simple training loop in C++
for the distributed RPC framework.

The goal of this change is to enable LeakSanitizer and potentially catch memory
leaks in the Future. Enabling LSAN with python multiprocessing is tricky and we
haven't found a solution for this. As a result, adding a C++ test that triggers
most of the critical codepaths would be good for now.

As an example, this unit test would've caught the memory leak fixed by:
https://github.com/pytorch/pytorch/pull/31030
ghstack-source-id: 107781167

Test Plan:
1) Verify the test catches memory leaks.
2) waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D21112208

fbshipit-source-id: 4eb2a6b409253108f6b6e14352e593d250c7a64d
2020-07-15 12:59:19 -07:00
8940a4e684 Pull upstream select_compute_arch from cmake for Ampere (#41133)
Summary:
This pulls the following merge requests from CMake upstream:
- https://gitlab.kitware.com/cmake/cmake/-/merge_requests/4979
- https://gitlab.kitware.com/cmake/cmake/-/merge_requests/4991

The above two merge requests improve the Ampere build:
- If `TORCH_CUDA_ARCH_LIST` is not set, it can now automatically pick up 8.0 as part of its default value
- If `TORCH_CUDA_ARCH_LIST=Ampere`, it no longer fails with `Unknown CUDA Architecture Name Ampere in CUDA_SELECT_NVCC_ARCH_FLAGS`

Code related to architectures < 3.5 is manually removed because PyTorch no longer supports them.

cc: ngimel ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41133

Reviewed By: malfet

Differential Revision: D22540547

Pulled By: ezyang

fbshipit-source-id: 6e040f4054ef04f18ebb7513497905886a375632
2020-07-15 12:53:32 -07:00
c62550e3f4 Cuda Support for Learnable Fake Quantize Per Channel (GPU) (#41262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41262

In this diff, an implementation is provided to support the GPU kernel running the learnable fake quantize per channel kernels.

Test Plan: On a devvm, run `buck test //caffe2/test:quantization -- learnable` to test both the forward and backward for the learnable per channel fake quantize kernels. The test will test the `cuda` version if a GPU is available.

Reviewed By: vkuzo

Differential Revision: D22478832

fbshipit-source-id: 2731bd8b57bc83416790f6d65ef42d450183873c
2020-07-15 12:23:43 -07:00
4367a73399 Cuda Support for Learnable Fake Quantize Per Tensor (GPU) (#41127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41127

In this diff, an implementation is provided to support the GPU kernel running the learnable fake quantize per tensor kernels.

Test Plan: On a devvm, run `buck test //caffe2/test:quantization -- learnable` to test both the forward and backward for the learnable per tensor fake quantize kernels. The test will test the `cuda` version if a gpu is available.

Reviewed By: z-a-f

Differential Revision: D22435037

fbshipit-source-id: 515afde13dd224d21fd47fb7cb027ee8d704cbdd
2020-07-15 12:21:48 -07:00
225289abc6 Adding epsilon input argument to the Logit Op
Summary: Adding epsilon input argument to the Logit Op

Test Plan: Added test_logit test case.

Reviewed By: hyuen

Differential Revision: D22537133

fbshipit-source-id: d6f89afd1589fda99f09550a9d1b850cfc0b9ee1
2020-07-15 12:16:19 -07:00
954c260061 Revert D22480638: [pytorch][PR] Add non-deterministic alert to CUDA operations that use atomicAdd()
Test Plan: revert-hammer

Differential Revision:
D22480638 (6ff306b8b5)

Original commit changeset: 4cc913cb3ca6

fbshipit-source-id: e47fa14b5085bb2b74a479bd0830efc2d7604eea
2020-07-15 12:10:05 -07:00
008ab27b22 [quant][pyper] Add embedding_bag weight quantize and dequantize ops (#41293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41293

Add new operators that do quantization and packing for the 8-bit and 4-bit embedding bag operators.
This is an initial change to help unblock testing. It will be followed by graph-mode passes to enable quantization of the embedding_bag module.

Note to reviewers: Future PRs will replace this op with a separate quantize and pack operator and add support for floating point scale and zero point.

Test Plan:
python test/test_quantization.py TestQuantizedEmbeddingBag

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22506700

fbshipit-source-id: 090cc85a8f56da417e4b7e45818ea987ae97ca8a
2020-07-15 11:34:53 -07:00
d5ae4a07ef DDP Communication Hook Main Structure (#40848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40848

Sub-tasks 1 and 2 of [39272](https://github.com/pytorch/pytorch/issues/39272)
ghstack-source-id: 107787878

Test Plan:
1\. Perf tests to validate that the new code (`if` conditions before `allreduce`) doesn't slow down today's DDP. Execute the following command with the diff patched/unpatched (with V25):

* **Unpatched Runs:**
```
hg checkout D22514243
flow-cli canary pytorch.benchmark.main.workflow --parameters-json '{"model_arch": "resnet50", "batch_size": 32, "world_size": 1, "use_fp16": false, "print_percentile": true, "backend": "gloo"}' --entitlement pytorch_ftw_gpu --name test_torchelastic_gloo_masterD22514243 --run-as-secure-group pytorch_distributed
```
* **Run 1 (unpatched):** `elastic_gang:benchmark_single.elastic_operator` Ran for 2 mins 59 s
f204539235
```
sum:
8 GPUs: p25:  0.156   205/s  p50:  0.160   200/s  p75:  0.164   194/s  p90:  0.169   189/s  p95:  0.173   185/s
fwds:
8 GPUs: p25:  0.032  1011/s  p50:  0.032  1006/s  p75:  0.032  1000/s  p90:  0.032   992/s  p95:  0.033   984/s
bwds:
8 GPUs: p25:  0.121   265/s  p50:  0.125   256/s  p75:  0.129   248/s  p90:  0.134   239/s  p95:  0.137   232/s
opts:
8 GPUs: p25:  0.003  11840/s  p50:  0.003  11550/s  p75:  0.004  8037/s  p90:  0.006  5633/s  p95:  0.007  4631/s
```
* **Run 2 (unpatched):** `elastic_gang:benchmark_single.elastic_operator` Ran for 3 mins 1 s
f204683840
```
sum:
8 GPUs: p25:  0.145   220/s  p50:  0.147   217/s  p75:  0.150   213/s  p90:  0.154   207/s  p95:  0.157   204/s
fwds:
8 GPUs: p25:  0.032  1015/s  p50:  0.032  1009/s  p75:  0.032  1002/s  p90:  0.032   994/s  p95:  0.032   990/s
bwds:
8 GPUs: p25:  0.107   297/s  p50:  0.111   288/s  p75:  0.115   278/s  p90:  0.119   268/s  p95:  0.122   262/s
opts:
8 GPUs: p25:  0.003  11719/s  p50:  0.004  9026/s  p75:  0.006  5160/s  p90:  0.009  3700/s  p95:  0.010  3184/s
```

* **Patched Runs:**
```
hg checkout D22328310
flow-cli canary pytorch.benchmark.main.workflow --parameters-json '{"model_arch": "resnet50", "batch_size": 32, "world_size": 1, "use_fp16": false, "print_percentile": true, "backend": "gloo"}' --entitlement pytorch_ftw_gpu --name test_torchelastic_gloo_localD22328310 --run-as-secure-group pytorch_distributed
```
* **Run 1 (patched):** `elastic_gang:benchmark_single.elastic_operator` Ran for 3 mins 30 s
f204544541
```
sum:
8 GPUs: p25:  0.148   216/s  p50:  0.152   210/s  p75:  0.156   205/s  p90:  0.160   200/s  p95:  0.163   196/s
fwds:
8 GPUs: p25:  0.032  1011/s  p50:  0.032  1005/s  p75:  0.032   999/s  p90:  0.032   991/s  p95:  0.033   984/s
bwds:
8 GPUs: p25:  0.112   286/s  p50:  0.116   275/s  p75:  0.120   265/s  p90:  0.125   256/s  p95:  0.128   250/s
opts:
8 GPUs: p25:  0.003  11823/s  p50:  0.003  10948/s  p75:  0.004  7225/s  p90:  0.007  4905/s  p95:  0.008  3873/s
```
* **Run 2 (patched):** `elastic_gang:benchmark_single.elastic_operator`
Ran for 3 mins 14 s
f204684520
```
sum:
8 GPUs: p25:  0.146   219/s  p50:  0.147   217/s  p75:  0.150   214/s  p90:  0.152   210/s  p95:  0.153   208/s
fwds:
8 GPUs: p25:  0.032  1013/s  p50:  0.032  1008/s  p75:  0.032  1002/s  p90:  0.032   996/s  p95:  0.032   990/s
bwds:
8 GPUs: p25:  0.107   299/s  p50:  0.110   290/s  p75:  0.114   280/s  p90:  0.117   274/s  p95:  0.119   269/s
opts:
8 GPUs: p25:  0.003  11057/s  p50:  0.005  6490/s  p75:  0.008  4110/s  p90:  0.010  3309/s  p95:  0.010  3103/s
```
* **Run 3 (patched):** `elastic_gang:benchmark_single.elastic_operator` Ran for 2 mins 54 s
f204692872
```
sum:
8 GPUs: p25:  0.145   220/s  p50:  0.147   217/s  p75:  0.150   213/s  p90:  0.154   207/s  p95:  0.156   204/s
fwds:
8 GPUs: p25:  0.032  1001/s  p50:  0.032   995/s  p75:  0.032   988/s  p90:  0.033   980/s  p95:  0.033   973/s
bwds:
8 GPUs: p25:  0.108   295/s  p50:  0.111   287/s  p75:  0.114   280/s  p90:  0.119   269/s  p95:  0.121   264/s
opts:
8 GPUs: p25:  0.003  11706/s  p50:  0.003  9257/s  p75:  0.005  6333/s  p90:  0.008  4242/s  p95:  0.009  3554/s
```

* **Memory:**
   * Unpatched:
```
CUDA Memory Summary After                     first iteration: |===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |  428091 KB |    2892 MB |    9825 MB |    9407 MB |
|       from large pool |  374913 KB |    2874 MB |    9752 MB |    9386 MB |
|       from small pool |   53178 KB |      52 MB |      73 MB |      21 MB |
|---------------------------------------------------------------------------|
| Active memory         |  428091 KB |    2892 MB |    9825 MB |    9407 MB |
|       from large pool |  374913 KB |    2874 MB |    9752 MB |    9386 MB |
|       from small pool |   53178 KB |      52 MB |      73 MB |      21 MB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |    3490 MB |    3490 MB |    3490 MB |       0 B  |
|       from large pool |    3434 MB |    3434 MB |    3434 MB |       0 B  |
|       from small pool |      56 MB |      56 MB |      56 MB |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |  315332 KB |  343472 KB |    2295 MB |    1987 MB |
|       from large pool |  311166 KB |  340158 KB |    2239 MB |    1935 MB |
|       from small pool |    4166 KB |    4334 KB |      56 MB |      52 MB |
|---------------------------------------------------------------------------|
| Allocations           |     704    |     705    |    1390    |     686    |
|       from large pool |      60    |     131    |     395    |     335    |
|       from small pool |     644    |     645    |     995    |     351    |
|---------------------------------------------------------------------------|
| Active allocs         |     704    |     705    |    1390    |     686    |
|       from large pool |      60    |     131    |     395    |     335    |
|       from small pool |     644    |     645    |     995    |     351    |
|---------------------------------------------------------------------------|
| GPU reserved segments |     102    |     102    |     102    |       0    |
|       from large pool |      74    |      74    |      74    |       0    |
|       from small pool |      28    |      28    |      28    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |      34    |      54    |     430    |     396    |
|       from large pool |      15    |      48    |     208    |     193    |
|       from small pool |      19    |      19    |     222    |     203    |
|===========================================================================|

```
   * Patched:
```
CUDA Memory Summary After                     first iteration: |===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |  428091 KB |    2892 MB |    9825 MB |    9407 MB |
|       from large pool |  374913 KB |    2874 MB |    9752 MB |    9386 MB |
|       from small pool |   53178 KB |      52 MB |      73 MB |      21 MB |
|---------------------------------------------------------------------------|
| Active memory         |  428091 KB |    2892 MB |    9825 MB |    9407 MB |
|       from large pool |  374913 KB |    2874 MB |    9752 MB |    9386 MB |
|       from small pool |   53178 KB |      52 MB |      73 MB |      21 MB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |    3490 MB |    3490 MB |    3490 MB |       0 B  |
|       from large pool |    3434 MB |    3434 MB |    3434 MB |       0 B  |
|       from small pool |      56 MB |      56 MB |      56 MB |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |  315332 KB |  343472 KB |    2295 MB |    1987 MB |
|       from large pool |  311166 KB |  340158 KB |    2239 MB |    1935 MB |
|       from small pool |    4166 KB |    4334 KB |      56 MB |      52 MB |
|---------------------------------------------------------------------------|
| Allocations           |     704    |     705    |    1390    |     686    |
|       from large pool |      60    |     131    |     395    |     335    |
|       from small pool |     644    |     645    |     995    |     351    |
|---------------------------------------------------------------------------|
| Active allocs         |     704    |     705    |    1390    |     686    |
|       from large pool |      60    |     131    |     395    |     335    |
|       from small pool |     644    |     645    |     995    |     351    |
|---------------------------------------------------------------------------|
| GPU reserved segments |     102    |     102    |     102    |       0    |
|       from large pool |      74    |      74    |      74    |       0    |
|       from small pool |      28    |      28    |      28    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |      34    |      54    |     431    |     397    |
|       from large pool |      15    |      48    |     208    |     193    |
|       from small pool |      19    |      19    |     223    |     204    |
|===========================================================================|

```

2\. As of v18: `python test/distributed/test_c10d.py`
```
....................s.....s.....................................................s................................
----------------------------------------------------------------------
Ran 114 tests in 215.983s

OK (skipped=3)

```

3\. Additional tests in `python test/distributed/test_c10d.py`:
* `test_ddp_comm_hook_future_passing_cpu`: This unit test verifies whether the Future object is passed properly. The callback function creates a Future object and sets a value to it.
* `_test_ddp_comm_hook_future_passing_gpu`: This unit test verifies whether the Future object is passed properly. The callback function creates a Future object and sets a value to it.
* `test_ddp_comm_hook_future_passing_gpu_gloo`: This unit test executes _test_ddp_comm_hook_future_passing_gpu using gloo backend.
* `test_ddp_comm_hook_future_passing_gpu_nccl`: This unit test executes _test_ddp_comm_hook_future_passing_gpu using nccl backend.
* `test_ddp_invalid_comm_hook_init`: This unit test makes sure that register_comm_hook properly checks the format of the hook defined by the user. The Python hook must be callable. This test also checks whether the bucket annotation is checked properly if defined.
* `test_ddp_invalid_comm_hook_return_type`: This test checks whether the return annotation is checked properly if defined. It also checks whether an internal error is thrown if the return type is incorrect and the user hasn't specified a return type annotation.
* `test_ddp_comm_hook_register_just_once`: DDP communication hook can only be registered once. This test validates whether the error is thrown properly when register_comm_hook is called more than once.
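
For reference, a minimal no-op hook in the shape these tests exercise (a sketch: `_register_comm_hook` and `bucket.get_tensors()` are inferred from the test names above, and `ddp_model` is a placeholder for an existing DistributedDataParallel instance):

```
import torch

def noop_hook(state, bucket):
    # hand the bucket's gradients back unchanged via a completed future
    fut = torch.futures.Future()
    fut.set_result(bucket.get_tensors())
    return fut

# ddp_model is assumed to be an existing DistributedDataParallel instance
ddp_model._register_comm_hook(state=None, hook=noop_hook)
```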

Reviewed By: ezyang

Differential Revision: D22328310

fbshipit-source-id: 77a6a71808e7b6e947795cb3fcc68c8c8f024549
2020-07-15 11:25:29 -07:00
c86699d425 [cmake] Use PROJECT_SOURCE_DIR instead of CMAKE_* (#41387)
Summary:
Add support for including pytorch via an add_subdirectory().
This requires using PROJECT_* instead of CMAKE_*, since the CMAKE_*
variables refer to the top-most project that includes pytorch.

TEST=add_subdirectory() into a pytorch checkout and build.
There are still some hardcoded references to TORCH_SRC_DIR; I will
fix them in a follow-on commit. For now you can create a symlink to
<pytorch>/torch/ in your project.

Change-Id: Ic2a8aec3b08f64e2c23d9e79db83f14a0a896abc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41387

Reviewed By: zhangguanheng66

Differential Revision: D22539944

Pulled By: ezyang

fbshipit-source-id: b7e9631021938255f0a6ea897a7abb061759093d
2020-07-15 11:09:05 -07:00
563b60b890 Fix flaky test_stream_event_nogil due to missing event sync (#41398)
Summary:
The test asserts that the stream is "ready" but doesn't wait for the
event to be "executed", which makes it fail on some platforms where the
`query` call occurs "soon enough".
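
A minimal sketch of the fixed pattern (assuming the test's stream/event structure):

```
import torch

s = torch.cuda.Stream()
e = torch.cuda.Event()
with torch.cuda.stream(s):
    torch.randn(1024, 1024, device="cuda") @ torch.randn(1024, 1024, device="cuda")
    e.record(s)
e.synchronize()   # the missing step: block until the event has executed
assert s.query()  # only now is the stream guaranteed to be "ready"
```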

Fixes https://github.com/pytorch/pytorch/issues/38807

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41398

Reviewed By: zhangguanheng66

Differential Revision: D22540012

Pulled By: ezyang

fbshipit-source-id: 6f56d951e48133ce4f6a9a54534298b7d2877c80
2020-07-15 11:03:35 -07:00
6ff306b8b5 Add non-deterministic alert to CUDA operations that use atomicAdd() (#40056)
Summary:
Issue https://github.com/pytorch/pytorch/issues/15359

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40056

Differential Revision: D22480638

Pulled By: ezyang

fbshipit-source-id: 4cc913cb3ca6d4206de80f4665bbc9031aa3ca01
2020-07-15 10:57:32 -07:00
dddac948a3 Add CUDA to pooling benchmark configs (#41438)
Summary:
Related to https://github.com/pytorch/pytorch/issues/41368

These benchmarks already support CUDA, so there is no reason for CUDA not to be in the benchmark config.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41438

Reviewed By: zhangguanheng66

Differential Revision: D22540756

Pulled By: ezyang

fbshipit-source-id: 621eceff37377c1ab06ff7483b39fc00dc34bd46
2020-07-15 10:51:43 -07:00
3971777ebb Krovatkin/reenable test tensorexpr (#41445)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41445

Reviewed By: ZolotukhinM

Differential Revision: D22543075

Pulled By: Krovatkin

fbshipit-source-id: fd8c0a94f5b3aff34d2b444dbf551425fdc1df04
2020-07-15 10:42:40 -07:00
04320a47d7 Add optimizer_for_mobile doc into python api root doc (#41211)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41211

Test Plan: Imported from OSS

Reviewed By: xta0

Differential Revision: D22543608

fbshipit-source-id: bf522a6c94313bf2696eca3c5bb5812ea98998d0
2020-07-15 09:57:40 -07:00
3a63a939d4 Revert D22517785: [pytorch][PR] Enable TF32 support for cuBLAS
Test Plan: revert-hammer

Differential Revision:
D22517785 (288ece89e1)

Original commit changeset: 87334c893561

fbshipit-source-id: 0a0674f49c1bcfc98f7f88af5a8c7de93b76e458
2020-07-15 08:15:48 -07:00
8548a21c00 Revert D22543215: Adjust bound_shape_inferencer to take 4 inputs for FCs
Test Plan: revert-hammer

Differential Revision:
D22543215 (86a2bdc35e)

Original commit changeset: 0977fca06630

fbshipit-source-id: b440f9b1eaeb35ec8b08e899890691e7a77a9f6d
2020-07-15 08:10:39 -07:00
f153b35b9b Shape inference for SparseToDense in ExpertCombiner
Summary: Adding shape inference for SparseToDense. The proposed implementation only works when data_to_infer_dim is given; otherwise the SparseToDense output dimension depends on the max value of the input tensor.
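
A small illustration of why the output dimension is data-dependent (a sketch; the scatter-add semantics are an assumption):

```
import numpy as np

indices = np.array([0, 5, 2])
values = np.ones((3, 4), dtype=np.float32)
rows = int(indices.max()) + 1       # data-dependent: 6 here, not derivable from shapes
dense = np.zeros((rows, 4), dtype=np.float32)
np.add.at(dense, indices, values)   # scatter-add, the assumed SparseToDense semantics
```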

Test Plan:
buck test //caffe2/caffe2/python:sparse_to_dense_test
buck test //caffe2/caffe2/python:hypothesis_test -- test_sparse_to_dense

Dper3 Changes:
f204594813
buck test dper3/dper3_models/ads_ranking/model_impl/sparse_nn/tests:sparse_nn_lib_test

Reviewed By: zhongyx12, ChunliF

Differential Revision: D22479511

fbshipit-source-id: 8983a9baea8853deec53ad6f795c874c3fb93de0
2020-07-15 08:04:48 -07:00
86a2bdc35e Adjust bound_shape_inferencer to take 4 inputs for FCs (#41452)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41452

The model exported from online training workflow with int8 quantization contains FCs with 4 inputs. The extra input is the quant_param blob. This diff is to adjust the bound_shape_inferencer to get shape info for the quant_param input.

Test Plan:
```
buck test caffe2/caffe2/opt:bound_shape_inference_test
```

Reviewed By: anurag16

Differential Revision: D22543215

fbshipit-source-id: 0977fca06630e279d47292e6b44f3d8180a767a5
2020-07-15 01:43:39 -07:00
14f19ab833 Port index_select to ATen (CUDA) (#39946)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24578

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39946

Reviewed By: ngimel

Differential Revision: D22520160

Pulled By: mruberry

fbshipit-source-id: 7eb3029e3917e793f3c020359acb0989d5deb61e
2020-07-15 01:11:32 -07:00
9552ec787c Revert D22516606: [pytorch][PR] Temporary fix for determinant bug on CPU
Test Plan: revert-hammer

Differential Revision:
D22516606 (fcd6d91045)

Original commit changeset: 7ea8299b9d2c

fbshipit-source-id: 41e19d5e1ba843cd70dce677869892f2e33fac09
2020-07-14 23:44:32 -07:00
921d2a164f SparseAdagrad/RowWiseSparseAdagrad mean fusion on CPU & GPU and dedup version for RowWiseSparse mean fusion on GPU
Summary:
1. Support SparseAdagradFusedWithSparseLengthsMeanGradient and RowWiseSparseAdagradFusedWithSparseLengthsMeanGradient on CPU and GPU
2. Add the dedup implementation of fused RowWiseAdagrad op on GPUs for mean pooling

Reviewed By: xianjiec

Differential Revision: D22165603

fbshipit-source-id: 743fa55ed5893c34bc6406ddfbbbb347b88091d1
2020-07-14 22:36:16 -07:00
44b9306d0a Export replaceAllUsesAfterNodeWith for PythonAPI (#41414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41414

This diff exports replaceAllUsesAfterNodeWith to PythonAPI.

Test Plan: Tested locally. Please let me know if there is a set of unit tests to be passed outside of the default ones triggered by Sandcastle.

Reviewed By: soumith

Differential Revision: D22523211

fbshipit-source-id: 3f075bafa6208ada462abc57d495c15179a6e53d
2020-07-14 22:20:19 -07:00
20f3051f7d [adaptive_]max_pool{1,2,3}d: handle edge case when input is filled with -inf (#40665)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40131

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40665

Differential Revision: D22463538

Pulled By: ezyang

fbshipit-source-id: 7e08fd0205926911d45aa150012154637e64a8d4
2020-07-14 21:51:40 -07:00
fcd6d91045 Temporary fix for determinant bug on CPU (#35136)
Summary:
Changelog:
- Make diagonal contiguous

Temporarily Fixes https://github.com/pytorch/pytorch/issues/34061

Pull Request resolved: https://github.com/pytorch/pytorch/pull/35136

Reviewed By: vincentqb

Differential Revision: D22516606

Pulled By: ezyang

fbshipit-source-id: 7ea8299b9d2c1c244995955b333a1dffb0cdff73
2020-07-14 21:20:50 -07:00
f074994a31 vectorize rounding ops (#41439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41439

use RoundToFloat16 on arrays

Test Plan: layernorm unittest

Reviewed By: venkatacrc

Differential Revision: D22540118

fbshipit-source-id: dc84fd22b5dc6a3bd15ad4ec1eecb9db13d64e97
2020-07-14 20:59:39 -07:00
96f124e623 remove template arguments of layernorm
Summary:
Remove the layernorm templates and make everything float, since that's the only variant in use.
Minor fixes in logging and testing.

Test Plan: ran the test

Reviewed By: venkatacrc

Differential Revision: D22527359

fbshipit-source-id: d6eec362a6e88e1c12fddf820ae629ede13fb2b8
2020-07-14 20:56:23 -07:00
0b73ea0ea2 Change BCELoss size mismatch warning into an error (#41426)
Summary:
BCELoss currently uses different broadcasting semantics than numpy. Since previous versions of PyTorch have thrown a warning in these cases telling the user that input sizes should match, and since the CUDA and CPU results differ when sizes do not match, it makes sense to upgrade the size mismatch warning to an error.

We can consider supporting numpy broadcasting semantics in BCELoss in the future if needed.
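
A short example of the new behavior:

```
import torch

loss = torch.nn.BCELoss()
inp = torch.rand(4, 1, requires_grad=True)
target = torch.rand(4)                  # shape (4,) vs. input shape (4, 1)
# loss(inp, target)                     # previously a warning; now a size-mismatch error
out = loss(inp, target.unsqueeze(1))    # make the shapes match explicitly
```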

Closes https://github.com/pytorch/pytorch/issues/40023

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41426

Reviewed By: zou3519

Differential Revision: D22540841

Pulled By: ezyang

fbshipit-source-id: 6c6d94c78fa0ae30ebe385d05a9e3501a42b3652
2020-07-14 20:34:06 -07:00
fd0329029f Fix flaky profiler and test_callback_simple RPC tests (#41287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41287

Profiler tests that exercise profiling with builtin functions, as well as the `test_callback_simple` test, have been broken for a while. This diff fixes that by preferring c10 ops over non-c10 ops in our operator matching logic.

The result is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this means we choose `aten::add.Tensor` over `aten::add.Int`, which fixes the type issue.

Test Plan:
Ensured that the tests are no longer flaky by running them a bunch
of times.

Reviewed By: vincentqb

Differential Revision: D22489197

fbshipit-source-id: 8452b93e4d45703453f77d968350c0d32f3f63fe
2020-07-14 19:26:44 -07:00
0d4a110c28 [JIT] Fix dead stores in JIT (#41202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41202

This commit fixes dead stores in JIT surfaced by the Quality Analyzer.

Test Plan: Continuous integration.

Reviewed By: jerryzh168

Differential Revision: D22461492

fbshipit-source-id: c587328f952054fb9449848e90b7d28a20aed4af
2020-07-14 17:59:50 -07:00
4ddf27ba48 [op-bench] check device attribute in user inputs
Summary: The device attribute in the op benchmark can only be 'cpu' or 'cuda', so this diff adds a check for it.

Test Plan: buck run caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --warmup_iterations 1 --iterations 1

Reviewed By: ngimel

Differential Revision: D22538252

fbshipit-source-id: 3e5af72221fc056b8d867321ad22e35a2557b8c3
2020-07-14 17:17:59 -07:00
a0f110190c clamp Categorical logits from -inf to finfo.min when calculating entropy (#41002)
Summary:
Fixes gh-40553 by clamping logit values when calculating Categorical.entropy
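
The essence of the fix, as a sketch (the actual change lives inside Categorical.entropy):

```
import torch

logits = torch.tensor([0.0, float("-inf")])
safe = logits.clamp(min=torch.finfo(logits.dtype).min)  # -inf -> finfo.min
probs = torch.softmax(safe, dim=-1)
entropy = -(probs * safe).sum(-1)   # no NaN from 0 * -inf anymore
```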

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41002

Reviewed By: mruberry

Differential Revision: D22436432

Pulled By: ngimel

fbshipit-source-id: 08b7c7b0c15ab4e5a56b3a8ec0d0237ad360202e
2020-07-14 16:21:12 -07:00
359cdc20e2 Revert D22432885: [pytorch][PR] unsafe_split, unsafe_split_with_sizes, unsafe_chunk operations
Test Plan: revert-hammer

Differential Revision:
D22432885 (c17670ac50)

Original commit changeset: 324aef091b32

fbshipit-source-id: 6b7c52bde46932e1cf77f61e7035d8a641b0beb6
2020-07-14 16:06:42 -07:00
144f04e7ef Fix qobserver test
Summary: Change the device config in qobserver test to a string to honor --device flag.

Test Plan: buck run caffe2/benchmarks/operator_benchmark/pt:qobserver_test  -- --iterations 1 --device cpu

Reviewed By: ngimel

Differential Revision: D22536379

fbshipit-source-id: 8926b2393be1f52f9183f8205959a3ff18e3ed2a
2020-07-14 15:47:03 -07:00
c68c5ea0e6 Upgrade cpp docs Sphinx/breathe/exhale to latest version (#41312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41312

I was hoping that exhale had gotten incremental recompilation
in its latest version, but experimentally this does not seem
to have been the case.  Still, I had gotten the whole shebang
to be working on the latest version of these packages, so might
as well land the upgrade.  There was one bug in Optional.h that
I had to fix; see the cited bug report.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D22526349

Pulled By: ezyang

fbshipit-source-id: d4169c2f48ebd8dfd8a593cc8cd232224d008ae9
2020-07-14 15:35:43 -07:00
05207b7371 .circleci: Re-split postnightly into its own thing (#41354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41354

The nightly pipeline has the potential to be flaky and thus the html
pages have the potential not to be updated.

This should actually be done as an automatic lambda job that runs
whenever the S3 bucket updates, but this is an intermediate step in
order to get there.

Closes https://github.com/pytorch/pytorch/issues/40998

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D22530283

Pulled By: seemethere

fbshipit-source-id: 0d80b7751ede83e6dd466690cc0a0ded68f59c5d
2020-07-14 14:49:01 -07:00
c17670ac50 unsafe_split, unsafe_split_with_sizes, unsafe_chunk operations (#39299)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/36403

Copy-paste of the issue description:

* Escape hatch: Introduce unsafe_* versions of the three functions above that keep the current behavior (outputs not tracked as views). The documentation will explain in detail why they are unsafe and when it is safe to use them (basically, only the outputs OR the input can be modified in place, but not both; otherwise, you will get wrong gradients).
* Deprecation: Use the CreationMeta on views to track views created by these three ops and throw a warning when any of the views is modified in place, saying that this is deprecated and will raise an error soon. Users that really need to modify these views in place should look at the doc of the unsafe_* version to make sure their use case is valid:
  * If it is not, then pytorch is computing wrong gradients for their use case and they should not do inplace anymore.
  * If it is, then they can use the unsafe_* version to keep the current behavior.
* Removal: Use the CreationMeta on views to prevent any inplace on these views (like we do for all other views coming from multi-output Nodes). Users will still be able to use the unsafe_* versions if they really need to do this.
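
A usage sketch, assuming the unsafe_split entry point this PR introduces:

```
import torch

x = torch.arange(6.0)
a, b = torch.unsafe_split(x, 3)  # outputs are NOT tracked as views of x
a.add_(1.0)                      # fine on its own, but modifying both the
                                 # outputs AND x in place gives wrong gradients
```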

Note about BC-breaking:
- This PR changes the behavior of the regular function by making them return proper views now. This is a modification that the user will be able to see.
- We skip all the view logic for these views and so the code should behave the same as before (except the change in the `._is_view()` value).
- Even though the view logic is not performed, we do raise deprecation warnings for the cases where doing these ops would throw an error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39299

Differential Revision: D22432885

Pulled By: albanD

fbshipit-source-id: 324aef091b32ce69dd067fe9b13a3f17d85d0f12
2020-07-14 14:15:41 -07:00
e2c4c2f102 addmm: Reduce constant time overhead (#41374)
Summary:
Fixes the overhead reported by ngimel in https://github.com/pytorch/pytorch/pull/40927#issuecomment-657709646

As it turns out, `Tensor.size(n)` has more overhead than `Tensor.sizes()[n]`. Since addmm does a lot of introspection of the input matrix sizes and strides, this added up to a noticeable (~1 us) constant time overhead.

With this change, a 1x1 matmul takes 2.85 us on my machine compared to 2.90 us on pytorch 1.5.
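
A quick way to reproduce the constant-overhead measurement (a sketch; absolute numbers are machine-dependent):

```
import torch
from timeit import timeit

a = torch.randn(1, 1)
b = torch.randn(1, 1)
n = 100000
us = timeit(lambda: torch.mm(a, b), number=n) / n * 1e6
print(f"1x1 matmul: {us:.2f} us per call")
```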

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41374

Reviewed By: ailzhang

Differential Revision: D22519924

Pulled By: ngimel

fbshipit-source-id: b29504bee7de79ce42e5e50f91523dde42b073b7
2020-07-14 13:47:16 -07:00
288ece89e1 Enable TF32 support for cuBLAS (#40800)
Summary:
Benchmark on a fully connected network and torchvision models (time in seconds) on GA100:

| model              | batch size | forward(TF32) | forward(FP32) | backward(TF32) | backward(FP32) |
|--------------------|------------|---------------|---------------|----------------|----------------|
| FC 512-128-32-8    | 512        | 0.000211      | 0.000321      | 0.000499       | 0.000532       |
| alexnet            | 512        | 0.0184        | 0.0255        | 0.0486         | 0.0709         |
| densenet161        | 128        | 0.0665        | 0.204         | 0.108          | 0.437          |
| googlenet          | 256        | 0.0925        | 0.110         | 0.269          | 0.326          |
| inception_v3       | 256        | 0.155         | 0.214         | 0.391          | 0.510          |
| mnasnet1_0         | 512        | 0.108         | 0.137         | 0.298          | 0.312          |
| mobilenet_v2       | 512        | 0.114         | 0.294         | 0.133          | 0.303          |
| resnet18           | 512        | 0.0722        | 0.100         | 0.182          | 0.228          |
| resnext50_32x4d    | 256        | 0.170         | 0.237         | 0.373          | 0.479          |
| shufflenet_v2_x1_0 | 512        | 0.0463        | 0.0473        | 0.125          | 0.123          |
| squeezenet1_0      | 512        | 0.0870        | 0.0948        | 0.205          | 0.214          |
| vgg16              | 256        | 0.167         | 0.234         | 0.401          | 0.502          |
| wide_resnet50_2    | 512        | 0.186         | 0.310         | 0.415          | 0.638          |
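
A sketch of toggling the behavior, assuming the `allow_tf32` switch this work exposes:

```
import torch

torch.backends.cuda.matmul.allow_tf32 = True   # TF32 matmuls on Ampere: faster, slightly less precise
# ... run matmul-heavy workloads ...
torch.backends.cuda.matmul.allow_tf32 = False  # back to full FP32 accuracy
```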

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40800

Reviewed By: mruberry

Differential Revision: D22517785

Pulled By: ngimel

fbshipit-source-id: 87334c8935616f72a6af5abbd3ae69f76923dc3e
2020-07-14 13:21:10 -07:00
c528faac7d [ROCm] Skip problematic mgpu tests on ROCm3.5 (#41409)
Summary:
nccl tests and parallelize_bmuf_distributed test are failing on rocm3.5.1. Skipping these tests to upgrade the CI to rocm3.5.1

jeffdaily sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41409

Reviewed By: orionr

Differential Revision: D22528928

Pulled By: seemethere

fbshipit-source-id: 928196b7a62a441d391e69f54b278313ecc75d77
2020-07-14 11:55:43 -07:00
5f146a4125 fix include file path in unary ops
Summary: fix include file path in unary ops

Test Plan: compile

Reviewed By: amylittleyang

Differential Revision: D22527312

fbshipit-source-id: 589efd2231ff8bd3133cb7844738429927ecee68
2020-07-14 11:08:51 -07:00
4972cf06a2 [JIT] Add out-of-source-tree to_backend tests (#41145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41145

**Summary**
This commit adds out-of-source-tree tests for `to_backend`. These tests check
that a Module can be lowered to a backend, exported, loaded (in both
Python and C++) and executed.

**Fixes**
This commit fixes #40067.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D22510076

Pulled By: SplitInfinity

fbshipit-source-id: f65964ef3092a095740f06636ed5b1eb0884492d
2020-07-14 10:57:04 -07:00
0e7b9d4ff8 Fix logit doc (#41384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41384

Fix logit doc

Test Plan: unittest

Reviewed By: houseroad

Differential Revision: D22521730

fbshipit-source-id: 270462008c6ac73cd90aecd77c5de112fc93ea8d
2020-07-14 10:40:52 -07:00
87bf04fe12 AvgPool: Ensure all cells are valid in ceil mode (#41368)
Summary:
Closes https://github.com/pytorch/pytorch/issues/36977

This avoids the division by zero that was causing NaNs to appear in the output. `AvgPool2d` and `AvgPool3d` both had this issue on CPU and CUDA.
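
A repro in the spirit of issue #36977 (a sketch; the exact failing configuration is an assumption):

```
import torch

x = torch.randn(1, 1, 3, 3)
pool = torch.nn.AvgPool2d(kernel_size=2, stride=2, padding=1,
                          ceil_mode=True, count_include_pad=False)
out = pool(x)  # the trailing ceil-mode window used to contain zero valid
               # cells, dividing by zero and emitting NaN
print(torch.isnan(out).any())
```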

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41368

Reviewed By: ailzhang

Differential Revision: D22520013

Pulled By: ezyang

fbshipit-source-id: 3ece7829f858f5bc17c2c1d905266ac510f11194
2020-07-14 09:24:30 -07:00
535e8814a4 Add operators for LiteLMLSTM to Lite Interpreter (#41270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41270

The Smart Keyboard model for Oculus requires operators previously not in the lite interpreter: aten::exp (for floats), aten::ord, aten::lower, aten::__contains__.str_list, aten::slice.str, aten::strip, aten::split.str, and aten::__getitem__.str.

Test Plan:
Verify smart keyboard model can be used:
Check out next diff in stack and follow test instructions there

Reviewed By: iseeyuan

Differential Revision: D22289812

fbshipit-source-id: df574d5af4d4fafb40f0e209b66a93fe02d83020
2020-07-14 09:18:41 -07:00
befb22790f Fix a number of deprecation warnings (#40179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40179

- Pass -Wno-psabi to suppress GCC's warning "The ABI for passing
  parameters with 64-byte alignment has changed in GCC 4.6"
- Fix use of deprecated data() accessor (and minor optimization: hoist
  accessor out of loop)
- Undeprecate NetDef.num_workers, no one is serious about fixing these
- Suppress warnings about deprecated pthreadpool types

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D22234138

Pulled By: ezyang

fbshipit-source-id: 6a1601b6d7551a7e6487a44ae65b19acdcb7b849
2020-07-14 09:11:34 -07:00
13dd53b3d2 [Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D22523334

fbshipit-source-id: e687e26f68a4f923164a51ce0b69ec1d131b9022
2020-07-14 08:42:23 -07:00
e888c3bca1 Update torch.set_default_dtype doc (#41263)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41263

Test Plan: Imported from OSS

Differential Revision: D22482989

Pulled By: anjali411

fbshipit-source-id: 2aadfbb84bbab66f3111970734a37ba74d817ffd
2020-07-14 07:29:49 -07:00
c20426f86d Fix torch.cuda.check_error type errors (#41330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41330

`torch.cuda.check_error` is annotated as taking an `int` as argument but when running `torch.cuda.check_error(34)` one would get:
```
TypeError: cudaGetErrorString(): incompatible function arguments. The following argument types are supported:
    1. (arg0: torch._C._cudart.cudaError) -> str

Invoked with: 34
```
Even if one explicitly cast the argument, running `torch.cuda.check_error(torch._C._cudart.cudaError(34))` would give:
```
AttributeError: 'str' object has no attribute 'decode'
```

This PR fixes both issues (thus allowing `check_error` to be called with an un-cast int) and adds a test.
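
A hedged usage sketch of the fixed call:

```
import torch

# after the fix, a plain Python int works directly
try:
    torch.cuda.check_error(34)   # any nonzero CUDA error code
except torch.cuda.CudaError as exc:
    print(exc)
```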
ghstack-source-id: 107628709

Test Plan: Unit tests

Reviewed By: ezyang

Differential Revision: D22500549

fbshipit-source-id: 9170c1e466dd554d471e928b26eb472a712da9e1
2020-07-14 00:47:14 -07:00
80d5b3785b Add torch.logit function (#41062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41062

Add torch.logit function
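
A short usage sketch:

```
import torch

x = torch.rand(4)
torch.logit(x)             # log(x / (1 - x))
torch.logit(x, eps=1e-6)   # clamp x into [eps, 1 - eps] before taking the log
```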

Test Plan: buck test mode/dev-nosan //caffe2/test:torch -- "logit"

Reviewed By: hl475

Differential Revision: D22406912

fbshipit-source-id: b303374f4c68850eb7477eb0645546a24b844606
2020-07-13 19:33:20 -07:00
34e11b45c9 Remove thrust casting from static_cast_with_inter_type (#39905)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39905

Reviewed By: ZolotukhinM

Differential Revision: D22510307

Pulled By: ngimel

fbshipit-source-id: 34357753fca4f2a8d5e2b1bbf8de8d642ca9bb20
2020-07-13 19:16:00 -07:00
5f6c6ed157 Fix FC issue (#41198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41198

https://github.com/pytorch/pytorch/pull/39611 unified the signatures of some ops taking TensorOptions arguments by making them optional.
That has FC implications, but only for models written with a PyTorch version after that change (see the explanation in the description of that PR).

However, it also changed the default from `pin_memory=False` to `pin_memory=None`, which actually breaks FC for preexisting models too if they're re-exported with a newer PyTorch,
because we materialize default values when exporting. This is bad.

This PR reverts that particular part of https://github.com/pytorch/pytorch/pull/39611 to fix the FC breakage.
ghstack-source-id: 107475024

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D22461661

fbshipit-source-id: ba2776267c3bba97439df66ecb50be7c1971d20d
2020-07-13 18:48:56 -07:00
ca1b8ebbcb move misc implementation out of jit/__init__.py (#41154)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41154

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D22445213

Pulled By: suo

fbshipit-source-id: 200545715c5ef13beb1437f49e01efb21498ddb7
2020-07-13 16:59:55 -07:00
6392713584 add spaces in .md annotation for python indent (#41260)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41260

Reviewed By: ezyang

Differential Revision: D22504634

Pulled By: ailzhang

fbshipit-source-id: 9d2d605dc19b07896ee4b1811fcd34d4dcb9b0c7
2020-07-13 15:11:46 -07:00
b6e1944d35 .circleci: Explicitly remove nvidia apt repos (#41367)
Summary:
The nvidia apt repositories seem to be left over on the amd nodes so
let's just go ahead and remove them explicitly if we're not testing for
CUDA

Example: https://app.circleci.com/pipelines/github/pytorch/pytorch/190222/workflows/8f75b5cd-1afd-43dc-9fa7-f7b058f07b46/jobs/6223743/steps

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41367

Reviewed By: ezyang

Differential Revision: D22513844

Pulled By: seemethere

fbshipit-source-id: 6da4dd8423de5f7ec80c7904187cf80c1b91ab14
2020-07-13 15:05:57 -07:00
d601325de4 update operators in the mapping to fp16 emulation
Summary: add logit and swish to this list

Test Plan: f203925461

Reviewed By: amylittleyang

Differential Revision: D22506814

fbshipit-source-id: b449e4ea16354cb76915adb01cf317cffb494733
2020-07-13 14:08:24 -07:00
4196605776 helper function to print out all DDP-relevant env vars (#41297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41297

GH issue: https://github.com/pytorch/pytorch/issues/40105

Add a helper function to DDP to print out all relevant env vars for debugging

Test Plan:
test through unittest, example output:
 ---
env:RANK=3
env:LOCAL_RANK=N/A
env:WORLD_SIZE=N/A
env:MASTER_PORT=N/A
env:MASTER_ADDR=N/A
env:CUDA_VISIBLE_DEVICES=N/A
env:GLOO_SOCKET_IFNAME=N/A
env:GLOO_DEVICE_TRANSPORT=N/A
env:NCCL_SOCKET_IFNAME=N/A
env:NCCL_BLOCKING_WAIT=N/A
...
 ---

Reviewed By: mrshenli

Differential Revision: D22490486

fbshipit-source-id: 5dc7d2a18111e5a5a12a1b724d90eda5d35acd1c
2020-07-13 14:03:04 -07:00
6e6931e234 fix duplicate extern sdot and missing flags (#41195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41195

`BLAS_F2C` is set in `THGeneral.h`.
`sdot` is redefined with a double return type when `BLAS_F2C` is set and `BLAS_USE_CBLAS_DOT` is not.

Test Plan: CircleCI green, ovrsource green

Reviewed By: malfet

Differential Revision: D22460253

fbshipit-source-id: 75f17b3e47da0ed33fcadc2843a57ad616f27fb5
2020-07-13 13:43:48 -07:00
0c77bd7c0b Quantization: preserving pre and post forward hooks (#37233)
Summary:
1. During convert(), preserve the module's **pre- and post-forward** hooks
2. During fusion, preserve only the module's **pre-forward** hooks (because after fusion the output is no longer the same)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37233

Differential Revision: D22425141

Pulled By: jerryzh168

fbshipit-source-id: e69b81821d507dcd110d2ff3594ba94b9593c8da
2020-07-13 12:41:24 -07:00
c451ddaeda Add shape inference functions for int8 quantization related ops (#41215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41215

To unblock int8 model productization on accelerators, we need the shape and type info for all the blobs after int8 quantization. This diff added shape inference functions for int8 quantization related ops.

Test Plan:
```
buck test caffe2/caffe2/quantization/server:int8_gen_quant_params_test
buck test caffe2/caffe2/quantization/server:fully_connected_dnnlowp_op_test
```

Reviewed By: hx89

Differential Revision: D22467487

fbshipit-source-id: 8298abb0df3457fcb15df81f423f557c1a11f530
2020-07-13 12:02:11 -07:00
7183fd20f8 Add interpolate-style overloads to aten::upsample* ops (#37176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37176

The non-deprecated user-facing interface to these ops (F.interpolate)
has a good interface: output size and scale are both specified as
a scalar or list, and exactly one must be present.  These aten ops
have an older, clunkier interface where output size is required and
scales are specified as separate optional scalars per dimension.
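
For reference, the interpolate interface being matched:

```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
up1 = F.interpolate(x, size=(16, 16))     # give an output size...
up2 = F.interpolate(x, scale_factor=2.0)  # ...or a scale, but never both
```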

This change adds new overloads to the aten ops that match the interface
of interpolate.  The plan is to eventually remove the old overloads,
resulting in roughly net-zero code added.  I also believe it is possible
to push this interface down further, eliminating multiple optional<double>
arguments, and simplifying the implementations.

The rollout plan is to land this, wait for a reasonable interval for
forwards-compatibility (maybe 1 week?), land the change that updates
interpolate to call these overloads, wait for a reasonable interval
for backwards-compatibility (maybe 6 months?), then remove the old
overloads.

This diff does not add the `.out` variants of the ops because they
are not currently accessible through any user-facing API.

ghstack-source-id: 106938113

Test Plan:
test_nn covers these ops fairly well, so that should prevent this diff
from breaking anything on its own.

test_nn on the next diff in the stack actually uses these new overloads,
so that should validate that they are actually correct.

Differential Revision: D21209989

fbshipit-source-id: 2b74d230401f071364eb05e138cdaa55279cfe91
2020-07-13 11:53:29 -07:00
fb9e44f8dd Add support for float[]? arguments in native_functions.yaml (#37175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37175

ghstack-source-id: 106938114

Test Plan: Upcoming diffs use this for upsampling.

Differential Revision: D21209994

fbshipit-source-id: 1a71c07e45e28772a2bbe450b68280dcc0fe2def
2020-07-13 11:51:10 -07:00
d04a2e4dae Back out "Revert D22329069: Self binning histogram" (#41313)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41313

This diff backs out the backout diff. The failure was due to the C++ `or`
keyword not being supported in MSVC; it is now replaced with `||`.

Original commit changeset: fc7f3f8c968d

Test Plan: Existing unit tests, check github CI.

Reviewed By: malfet

Differential Revision: D22494777

fbshipit-source-id: 3271288919dc3a6bfb82508ab9d021edc910ae45
2020-07-13 11:46:34 -07:00
86d803a9da .cirlceci: Setup nvidia runtime for cu as well (#41268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41268

We also want nvidia runtime packages to get installed when the
BUILD_ENVIRONMENT also includes "*cu*"

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Differential Revision: D22505885

Pulled By: seemethere

fbshipit-source-id: 4d8e70ed8aed9c6fd1828bc13cf7d5b0f8f50a0a
2020-07-13 10:29:25 -07:00
dea39b596e reduce logging for layernorm (#41305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41305

Added a warning message when layernorm under/overflows (which is what nnpi does), and reduced the frequency of the logging to one in every 1000 occurrences.

Test Plan: compilation

Reviewed By: yinghai

Differential Revision: D22492726

fbshipit-source-id: 9343beeae6e65bf3846c6b3d2edd2a08dac85ed6
2020-07-13 10:23:46 -07:00
67a4f375cd Pass the number of indices but not embedding size in PyTorch operator (#41315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41315

We should pass the number of indices, not the embedding size, to the fused SparseAdagrad PyTorch operator.

Reviewed By: jianyuh

Differential Revision: D22495422

fbshipit-source-id: ec5d3a5c9547fcd8f95106d912b71888217a5af0
2020-07-12 20:55:40 -07:00
98df9781a7 Impl for ParameterList (#41259)
Summary:
This is a new PR for https://github.com/pytorch/pytorch/issues/40850, https://github.com/pytorch/pytorch/issues/40987 and https://github.com/pytorch/pytorch/issues/41206 (which I unintentionally closed), as I had some rebase issues with that one. Very sorry about that. I have also fixed the tests that failed in that PR.

This diff contains the implementation of C++ API for ParameterList from https://github.com/pytorch/pytorch/issues/25883.
Refer to the Python API: bc9e8af218/torch/nn/modules/container.py (L376)
Not sure about some naming differences between the C++ API and the Python API; for example, should `append` be called `push_back`?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41259

Test Plan: Add unit tests in this diff

Differential Revision: D22495780

Pulled By: glaringlee

fbshipit-source-id: 79ea3592db640f35477d445ecdaeafbdad814bec
2020-07-12 20:50:31 -07:00
fa153184c8 Fake Quantization Per Channel Kernel Core Implementation (CPU) (#41037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41037

This diff contains the core implementation for the fake quantizer per channel kernel that supports back propagation on the scale and zero point.

Test Plan:
On a devvm, use:
- `buck test //caffe2/test:quantization -- learnable_forward_per_channel`
- `buck test //caffe2/test:quantization -- learnable_backward_per_channel`

Reviewed By: z-a-f

Differential Revision: D22395665

fbshipit-source-id: 280c2405d04adfeda9fb9cfc94d89e8d868e0d41
2020-07-12 12:14:00 -07:00
5e72ebeda3 Fake Quantization Per Tensor Kernel Core Implementation (CPU) (#41029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41029

This diff contains the core implementation for the fake quantizer per tensor kernel that supports back propagation on the scale and zero point.

Test Plan:
On a devvm, use:
- `buck test //caffe2/test:quantization -- learnable_forward_per_tensor`
- `buck test //caffe2/test:quantization -- learnable_backward_per_tensor`

Reviewed By: z-a-f

Differential Revision: D22394145

fbshipit-source-id: f6748b635b86679aa9174a8065e6be5e20a95d81
2020-07-12 12:11:38 -07:00
402be850a8 [quant] Adding zero point type check for per channel quantization (#40811)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40811

Test Plan: Imported from OSS

Differential Revision: D22319417

Pulled By: z-a-f

fbshipit-source-id: 7be3a511ddd33b5fe749a83166bbc5874d1bd539
2020-07-12 11:40:19 -07:00
4b4184fc69 [quant][graphmode] use RemoveMutation to remove append (#41161)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41161

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D22446714

fbshipit-source-id: 15da28ef773300a141603d67a1c4524f1ec32239
2020-07-11 16:49:56 -07:00
106b0b6a62 Op to create quant scheme blob (#40760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40760

Add op to create a quant scheme.

Test Plan:
buck test mode/opt caffe2/caffe2/quantization/server:int8_quant_scheme_blob_fill_test

{F241838981}

Reviewed By: csummersea

Differential Revision: D22228154

fbshipit-source-id: 1b7a02c06937c68e2fcccf77eb10a965300ed732
2020-07-11 10:53:10 -07:00
edcf2cdf86 [quant] dequantize support list and tuple of tensors (#41079)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41079

Test Plan: Imported from OSS

Differential Revision: D22420700

fbshipit-source-id: bc4bf0fb47dcf8b94b11fbdc91e8d5a75142b7be
2020-07-11 10:44:19 -07:00
c864158475 Add fp16 support to SparseLengthSum PyTorch operator (#41058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41058

The SparseLengthSum PyTorch operator previously accepted only float and double types; this diff adds fp16 support to it.

Reviewed By: jianyuh

Differential Revision: D22387253

fbshipit-source-id: 2a7d03ceaadbb7b04077cff72ab77da6457ba989
2020-07-11 07:54:32 -07:00
28291d3cf8 [caffe2] Revert D22220798 (#41302)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41302

Test Plan:
```
buck test //caffe2/caffe2/fb/predictor:black_box_predictor_test
```

Differential Revision: D22492356

fbshipit-source-id: efcbc3c67abda5cb9da47e633804a4800d92f89b
2020-07-11 03:28:29 -07:00
e544bf2924 fix the range of the random weights used in the int8fc test (#41303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41303

The error came from:
I0710 18:02:48.025024 1780875 NNPIOptions.cpp:49] [NNPI_LOG][D] [KS] convert_base_kernel_ivp.cpp(524): Output Scale 108240.101562 is out of valid range +-(Min 0.000061 Max 65504.000000)!!!

It seems the weights we are using are too small, thus generating scaling factors out of the range of fp16 (>65k). I am tentatively increasing this factor to a higher value (10x bigger) to avoid this.

Also increased max_examples to 100

Test Plan: ran this test

Reviewed By: yinghai

Differential Revision: D22492481

fbshipit-source-id: c0f9e59b0e70895ab787868ef1d87e6e80106554
2020-07-11 00:19:29 -07:00
a1ed6e1eb3 Revert D22467871: add check for duplicated op registration in JIT
Test Plan: revert-hammer

Differential Revision:
D22467871 (a548c6b18f)

Original commit changeset: 9b7a40a217e6

fbshipit-source-id: b594d4d0a079f7e24ef0efb45476ded2838cbef1
2020-07-10 23:39:23 -07:00
095886fa42 [caffe2] Fix the issues when using CUB RadixSort (#41299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41299

When using `cub::DeviceRadixSort::SortPairs` (https://nvlabs.github.io/cub/structcub_1_1_device_radix_sort.html), the `end_bit` argument, or the most-significant bit index (exclusive) needed for key comparison, should be passed as `int(log2(float(num_rows)) + 1)` instead of `int(log2(float(num_indices)) + 1)`. This is because all the values in the indices array are guaranteed to be less than num_rows (hash_size), not num_indices. Thanks ngimel for pointing this out and thanks malfet for quickly fixing the log2() compilation issues.

Note:
An optional bit subrange [begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.
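
The corrected bound, as a tiny sketch:

```
import math

def end_bit(hash_size):
    # most-significant bit index (exclusive) CUB needs for key comparison;
    # indices are all < hash_size (num_rows), so num_rows bounds the keys
    return int(math.log2(float(hash_size)) + 1)
```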

Test Plan:
```
buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_sum_gradient \(caffe2\.caffe2\.fb\.net_transforms\.tests\.fuse_sparse_ops_test\.TestFuseSparseOps\)' --print-passing-details
```

Reviewed By: malfet

Differential Revision: D22491662

fbshipit-source-id: 4fdabe86244c948af6244f9bd91712844bf1dec1
2020-07-10 22:39:43 -07:00
d1f06da9b7 Solve log2(x:int) ambiguity by using log2(float(x)) (#41295)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41295

Differential Revision: D22490995

Pulled By: malfet

fbshipit-source-id: 17037e551ce5986f3162389a61932099563c02a7
2020-07-10 20:12:36 -07:00
1c098ae339 Fix arg type annotations in jit.trace and onnx.export (#41093)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40350

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41093

Differential Revision: D22477950

Pulled By: malfet

fbshipit-source-id: f1141c129b6d9efb373d22291b441df86c529ddd
2020-07-10 20:07:05 -07:00
877a59967f Ampere has CUDA_MAX_THREADS_PER_SM == 2048 (#41138)
Summary:
See: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf
page 44, table 5
![image](https://user-images.githubusercontent.com/1032377/86958633-56051580-c111-11ea-94da-c726a61dc00a.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41138

Differential Revision: D22488904

Pulled By: malfet

fbshipit-source-id: 97bd585d91e1a368f51aa6bd52081bc57d42dbf8
2020-07-10 20:02:20 -07:00
6cbb92494d Better THGeneric.h generation rules in bazel (#41285)
Summary:
Bazel doesn't do a good job of checking BLAS library capabilities, so hardcode the undef of BLAS_F2C.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41285

Differential Revision: D22489781

Pulled By: malfet

fbshipit-source-id: 13a14f31e08d7f9ded49731e4fd23663bac75cd2
2020-07-10 17:40:04 -07:00
67f5d68fdf Revert D22465221: [pytorch][PR] Reducing size of docker Linux image
Test Plan: revert-hammer

Differential Revision:
D22465221 (7c143e5d3e)

Original commit changeset: 487542597294

fbshipit-source-id: f085763a13497bd5ceea0ed6aa7676320c8806bf
2020-07-10 17:12:26 -07:00
ac3542fa59 Define PSIMD_SOURCE_DIR when including FP16 (#41233)
Summary:
Avoids a superfluous redownload when *NNPACK is not used (e.g. on Power).

Example: https://powerci.osuosl.org/job/pytorch-master-nightly-py3-linux-ppc64le/1128/consoleFull
Search for "Downloading PSimd"

See also https://github.com/pytorch/pytorch/issues/41178

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41233

Differential Revision: D22488833

Pulled By: malfet

fbshipit-source-id: 637291419ddd3b2a8dc25e211a4ebbba955e5855
2020-07-10 16:55:10 -07:00
abea7cd561 msvc anonymous namespace bug (#41199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41199

workaround for: https://developercommunity.visualstudio.com/content/problem/900452/variable-in-anonymous-namespace-has-external-linka.html

Test Plan: CI green, ovrsource green

Reviewed By: malfet

Differential Revision: D22462050

fbshipit-source-id: 11a2fd6a4db1f29ce350699cfc3121dc89ab7ef6
2020-07-10 16:45:14 -07:00
48d6e2adce Disable the mkldnn for conv2d in some special cases (#40610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40610

We have benchmarked several models, which shows that the native implementation of conv2d is faster than the MKLDNN path in some special cases. For group conv, the native implementation does not batch all the groups.

Test Plan:
```
import torch
import torch.nn.functional as F

import numpy as np

from timeit import Timer

num = 50

S = [
#         [1, 1, 100, 40, 16, 3, 3, 1, 1, 1, 1],
#         [1, 2048, 4, 2, 512, 1, 1, 1, 1, 0, 0],
#         [1, 512, 4, 2, 512, 3, 3, 1, 1, 1, 1],
#         [1, 512, 4, 2, 2048, 1, 1, 1, 1, 0, 0],
#         [1, 2048, 4, 2, 512, 1, 1, 1, 1, 0, 0],
#         [1, 512, 4, 2, 512, 3, 3, 1, 1, 1, 1],
#         [1, 512, 4, 2, 2048, 1, 1, 1, 1, 0, 0],
#         [1, 2048, 4, 2, 512, 1, 1, 1, 1, 0, 0],
#         [1, 512, 4, 2, 512, 3, 3, 1, 1, 1, 1],
#         [1, 512, 4, 2, 2048, 1, 1, 1, 1, 0, 0],
#         [1, 2048, 4, 2, 512, 1, 1, 1, 1, 0, 0],
#         [1, 512, 4, 2, 512, 3, 3, 1, 1, 1, 1],
#         [1, 512, 4, 2, 2048, 1, 1, 1, 1, 0, 0],
#         [1, 2048, 4, 2, 512, 1, 1, 1, 1, 0, 0],
#         [1, 512, 4, 2, 512, 3, 3, 1, 1, 1, 1],
#         [1, 512, 4, 2, 2048, 1, 1, 1, 1, 0, 0],
#         [1, 2048, 4, 2, 512, 1, 1, 1, 1, 0, 0],
#         [1, 512, 4, 2, 512, 3, 3, 1, 1, 1, 1],
#         [1, 512, 4, 2, 2048, 1, 1, 1, 1, 0, 0],
[1, 3, 224, 224, 64, 7, 7, 2, 2, 3, 3, 1],
[1, 64, 56, 56, 128, 1, 1, 1, 1, 0, 0, 1],
[1, 128, 56, 56, 128, 3, 3, 1, 1, 1, 1, 32],
[1, 128, 56, 56, 256, 1, 1, 1, 1, 0, 0, 1],
[1, 64, 56, 56, 256, 1, 1, 1, 1, 0, 0, 1],
[1, 256, 56, 56, 128, 1, 1, 1, 1, 0, 0, 1],
[1, 128, 56, 56, 128, 3, 3, 1, 1, 1, 1, 32],
[1, 128, 56, 56, 256, 1, 1, 1, 1, 0, 0, 1],
[1, 256, 56, 56, 128, 1, 1, 1, 1, 0, 0, 1],
[1, 128, 56, 56, 128, 3, 3, 1, 1, 1, 1, 32],
[1, 128, 56, 56, 256, 1, 1, 1, 1, 0, 0, 1],
[1, 256, 56, 56, 256, 1, 1, 1, 1, 0, 0, 1],
[1, 256, 56, 56, 256, 3, 3, 2, 2, 1, 1, 32],
[1, 256, 28, 28, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 256, 56, 56, 512, 1, 1, 2, 2, 0, 0, 1],
[1, 512, 28, 28, 256, 1, 1, 1, 1, 0, 0, 1],
[1, 256, 28, 28, 256, 3, 3, 1, 1, 1, 1, 32],
[1, 256, 28, 28, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 28, 28, 256, 1, 1, 1, 1, 0, 0, 1],
[1, 256, 28, 28, 256, 3, 3, 1, 1, 1, 1, 32],
[1, 256, 28, 28, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 28, 28, 256, 1, 1, 1, 1, 0, 0, 1],
[1, 256, 28, 28, 256, 3, 3, 1, 1, 1, 1, 32],
[1, 256, 28, 28, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 28, 28, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 28, 28, 512, 3, 3, 2, 2, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 28, 28, 1024, 1, 1, 2, 2, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1],
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32],
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 1024, 3, 3, 2, 2, 1, 1, 32],
[1, 1024, 7, 7, 2048, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 14, 14, 2048, 1, 1, 2, 2, 0, 0, 1],
[1, 2048, 7, 7, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 7, 7, 1024, 3, 3, 1, 1, 1, 1, 32],
[1, 1024, 7, 7, 2048, 1, 1, 1, 1, 0, 0, 1],
[1, 2048, 7, 7, 1024, 1, 1, 1, 1, 0, 0, 1],
[1, 1024, 7, 7, 1024, 3, 3, 1, 1, 1, 1, 32],
[1, 1024, 7, 7, 2048, 1, 1, 1, 1, 0, 0, 1],
    ]
for x in range(105):
    P = S[x]
    print(P)
    (N, C, H, W) = P[0:4]
    M = P[4]
    (kernel_h, kernel_w) = P[5:7]
    (stride_h, stride_w) = P[7:9]
    (padding_h, padding_w) = P[9:11]

    g = P[11]
    X_np = np.random.randn(N, C, H, W).astype(np.float32)
    # nn.Conv2d initializes its own weights, so no separate weight array is needed here.
    X = torch.from_numpy(X_np)
    conv2d_pt = torch.nn.Conv2d(
        C, M, (kernel_h, kernel_w), stride=(stride_h, stride_w),
        padding=(padding_h, padding_w), groups=g, bias=True)

    class ConvNet(torch.nn.Module):
        def __init__(self):
            super(ConvNet, self).__init__()
            self.conv2d = conv2d_pt

        def forward(self, x):
            return self.conv2d(x)

    model = ConvNet()

    def pt_forward():
        with torch.no_grad():
            model(X)

    torch._C._set_mkldnn_enabled(True)
    t = Timer("pt_forward()", "from __main__ import pt_forward, X")
    print("MKLDNN pt time = {}".format(t.timeit(num) / num * 1000.0))
    torch._C._set_mkldnn_enabled(False)
    t = Timer("pt_forward()", "from __main__ import pt_forward, X")
    print("TH pt time = {}".format(t.timeit(num) / num * 1000.0))

OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 python bm.py
```

output:
```
[1, 3, 224, 224, 64, 7, 7, 2, 2, 3, 3, 1]
MKLDNN pt time = 5.891108009964228
TH pt time = 7.0624795742332935
[1, 64, 56, 56, 128, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 1.4464975893497467
TH pt time = 0.721491202712059
[1, 128, 56, 56, 128, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 1.4036639966070652
TH pt time = 3.299683593213558
[1, 128, 56, 56, 256, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.3908068016171455
TH pt time = 2.227546200156212
[1, 64, 56, 56, 256, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.226586602628231
TH pt time = 1.3865559734404087
[1, 256, 56, 56, 128, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.31307839602232
TH pt time = 2.4284918047487736
[1, 128, 56, 56, 128, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 1.5028003975749016
TH pt time = 3.824346773326397
[1, 128, 56, 56, 256, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.4405963867902756
TH pt time = 2.6227117888629436
[1, 256, 56, 56, 128, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.405764400959015
TH pt time = 2.644723802804947
[1, 128, 56, 56, 128, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 1.5220053866505623
TH pt time = 3.9365867897868156
[1, 128, 56, 56, 256, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.606868200004101
TH pt time = 2.5387956015765667
[1, 256, 56, 56, 256, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 6.0041105933487415
TH pt time = 5.305919591337442
[1, 256, 56, 56, 256, 3, 3, 2, 2, 1, 1, 32]
MKLDNN pt time = 1.4830979891121387
TH pt time = 7.532084975391626
[1, 256, 28, 28, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.025687597692013
TH pt time = 2.2185291908681393
[1, 256, 56, 56, 512, 1, 1, 2, 2, 0, 0, 1]
MKLDNN pt time = 3.5893129743635654
TH pt time = 2.696530409157276
[1, 512, 28, 28, 256, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.8203356079757214
TH pt time = 2.0819314010441303
[1, 256, 28, 28, 256, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.8583215996623039
TH pt time = 2.7761065773665905
[1, 256, 28, 28, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.9077288135886192
TH pt time = 2.045416794717312
[1, 512, 28, 28, 256, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.805021796375513
TH pt time = 2.131381593644619
[1, 256, 28, 28, 256, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.9023251943290234
TH pt time = 2.9028950072824955
[1, 256, 28, 28, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.1174601800739765
TH pt time = 2.275596000254154
[1, 512, 28, 28, 256, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.100480604916811
TH pt time = 2.399571593850851
[1, 256, 28, 28, 256, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.9321337938308716
TH pt time = 2.886691205203533
[1, 256, 28, 28, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.065785188227892
TH pt time = 2.1640316024422646
[1, 512, 28, 28, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 5.891813579946756
TH pt time = 4.2956990003585815
[1, 512, 28, 28, 512, 3, 3, 2, 2, 1, 1, 32]
MKLDNN pt time = 0.9399276040494442
TH pt time = 4.7622935846447945
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.2426914013922215
TH pt time = 2.3699573799967766
[1, 512, 28, 28, 1024, 1, 1, 2, 2, 0, 0, 1]
MKLDNN pt time = 3.0341636016964912
TH pt time = 2.6606030017137527
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.991385366767645
TH pt time = 2.6313263922929764
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.7330256141722202
TH pt time = 3.008321188390255
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.880081795156002
TH pt time = 2.289068605750799
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.9583285935223103
TH pt time = 2.6302105747163296
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.7322711870074272
TH pt time = 2.8230775892734528
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.8620235808193684
TH pt time = 2.4078205972909927
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.828651014715433
TH pt time = 2.616014201194048
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.7084695994853973
TH pt time = 2.8024527989327908
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.7884829975664616
TH pt time = 2.4237345717847347
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.89030060172081
TH pt time = 2.5852439925074577
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.724627785384655
TH pt time = 2.651805803179741
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.249914798885584
TH pt time = 2.0440668053925037
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.722136974334717
TH pt time = 2.531316000968218
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.7164162024855614
TH pt time = 2.8521843999624252
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.8891782090067863
TH pt time = 2.436912599951029
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.0049769952893257
TH pt time = 2.649025786668062
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.7299130037426949
TH pt time = 2.67714099958539
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.799382768571377
TH pt time = 2.4427592009305954
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.0201382003724575
TH pt time = 2.6285660080611706
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.6983320042490959
TH pt time = 2.9118607938289642
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.8802538104355335
TH pt time = 2.385452575981617
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.9600497893989086
TH pt time = 2.594646792858839
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.5688861943781376
TH pt time = 2.5941073894500732
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.7758505940437317
TH pt time = 2.336081601679325
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.6135251857340336
TH pt time = 2.3902921937406063
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.6303061917424202
TH pt time = 2.6228136010468006
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.8868251852691174
TH pt time = 2.5620524026453495
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.057632204145193
TH pt time = 2.691414188593626
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.7316274009644985
TH pt time = 3.14683198928833
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.2674955762922764
TH pt time = 2.602821197360754
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.0993166007101536
TH pt time = 2.609328981488943
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.7257938012480736
TH pt time = 2.9255208000540733
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.3086097799241543
TH pt time = 2.544360812753439
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.0537622049450874
TH pt time = 2.6343842037022114
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.7194169983267784
TH pt time = 2.9009717889130116
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.6461398042738438
TH pt time = 2.3600555770099163
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.6328082010149956
TH pt time = 2.415131386369467
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.6832938082516193
TH pt time = 2.6299685798585415
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.9594415985047817
TH pt time = 2.509857602417469
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.956229578703642
TH pt time = 2.691046390682459
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.7222409918904305
TH pt time = 2.938339803367853
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.9467295855283737
TH pt time = 2.4219116009771824
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.0215882137417793
TH pt time = 2.7782391756772995
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.719242412596941
TH pt time = 2.8529402054846287
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.8062099777162075
TH pt time = 2.9951974004507065
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.1621821969747543
TH pt time = 2.5330167822539806
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.690075010061264
TH pt time = 2.5531245954334736
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.832614816725254
TH pt time = 2.339891381561756
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.7835668064653873
TH pt time = 2.513139396905899
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.7026367820799351
TH pt time = 2.796882800757885
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.6479675993323326
TH pt time = 2.4971639923751354
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.9846629686653614
TH pt time = 2.4657804146409035
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.5969022028148174
TH pt time = 2.697007991373539
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.7602720074355602
TH pt time = 2.4498093873262405
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.963611613959074
TH pt time = 2.6310251839458942
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.7004458084702492
TH pt time = 2.9164502024650574
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.887732572853565
TH pt time = 2.4575488083064556
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.8350806050002575
TH pt time = 2.23197178915143
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.5626789852976799
TH pt time = 2.704860605299473
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.6168799959123135
TH pt time = 2.2481359727680683
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.5654693879187107
TH pt time = 2.2636358067393303
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.6836861930787563
TH pt time = 2.825192976742983
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.7971909940242767
TH pt time = 2.471243590116501
[1, 1024, 14, 14, 512, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.8480279818177223
TH pt time = 2.553586605936289
[1, 512, 14, 14, 512, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.7191735878586769
TH pt time = 2.6465672068297863
[1, 512, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 2.7811027877032757
TH pt time = 2.457349617034197
[1, 1024, 14, 14, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 5.434317365288734
TH pt time = 4.639615211635828
[1, 1024, 14, 14, 1024, 3, 3, 2, 2, 1, 1, 32]
MKLDNN pt time = 0.9400106035172939
TH pt time = 2.9971951991319656
[1, 1024, 7, 7, 2048, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 4.494664408266544
TH pt time = 3.478870000690222
[1, 1024, 14, 14, 2048, 1, 1, 2, 2, 0, 0, 1]
MKLDNN pt time = 4.8432330042123795
TH pt time = 3.6410867795348167
[1, 2048, 7, 7, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 4.779010973870754
TH pt time = 3.4093930013477802
[1, 1024, 7, 7, 1024, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.8385192044079304
TH pt time = 3.0921380035579205
[1, 1024, 7, 7, 2048, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 3.9088409766554832
TH pt time = 3.130124807357788
[1, 2048, 7, 7, 1024, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 4.0072557888925076
TH pt time = 2.977220807224512
[1, 1024, 7, 7, 1024, 3, 3, 1, 1, 1, 1, 32]
MKLDNN pt time = 0.8867520093917847
TH pt time = 3.1505179964005947
[1, 1024, 7, 7, 2048, 1, 1, 1, 1, 0, 0, 1]
MKLDNN pt time = 4.118196591734886
TH pt time = 3.46621660515666
```

Reviewed By: dzhulgakov

Differential Revision: D22250817

fbshipit-source-id: c9dc61b633e11a378a05810d711a696effd7f02b
2020-07-10 16:43:29 -07:00
ce3ba3b9bc [JIT] Add support for backend-lowered submodules (#41146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41146

**Summary**
This commit adds support for using `Modules` that have been lowered as
submodules in `ScriptModules`.
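
For orientation, a minimal sketch of the usage this enables; the backend name, the compile-spec dict, and the exact `torch._C._jit_to_backend` signature are assumptions based on the test suite, not code from this PR:
```python
import torch

class Wrapper(torch.nn.Module):
    def __init__(self, lowered):
        super().__init__()
        # A backend-lowered module held as a submodule, which this PR enables.
        self.lowered = lowered

    def forward(self, x):
        return self.lowered(x)

scripted = torch.jit.script(torch.nn.Linear(2, 2))
# "test_backend" and the method_compile_spec dict are hypothetical placeholders.
lowered = torch._C._jit_to_backend("test_backend", scripted, {"forward": {"": ""}})
wrapper = torch.jit.script(Wrapper(lowered))  # scripting with a lowered submodule
```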

**Test Plan**
This commit adds execution and save/load tests to test_backends.py for
backend-lowered submodules.

**Fixes**
This commit fixes #40069.

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D22459543

Pulled By: SplitInfinity

fbshipit-source-id: 02e0c0ccdce26c671ade30a34aca3e99bcdc5ba7
2020-07-10 16:35:24 -07:00
1f2e91fa4f Implicit casting resulting in internal build failure. (#41272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41272

Implicit casting from int to float is resulting in vec256_test build failure
internally. This diff fixes that.

Test Plan: Build vec256_test for android and run it on android phone.

Reviewed By: ljk53, paulshaoyuqiao

Differential Revision: D22484635

fbshipit-source-id: ebb9fc2eccb8261ab01d8266150fc3b05166f1e7
2020-07-10 16:29:54 -07:00
7bae5780a2 Revert D22329069: Self binning histogram
Test Plan: revert-hammer

Differential Revision:
D22329069 (16c8146da9)

Original commit changeset: 28406b94e284

fbshipit-source-id: fc7f3f8c968d1ec7d2a1cf7a4d05900f51055d82
2020-07-10 16:22:29 -07:00
dd0c98d82a [ONNX]Add tests for ConvTranspose 1D and 3D (#40703)
Summary:
Add tests for ConvTranspose 1D and 3D

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40703

Reviewed By: hl475

Differential Revision: D22480087

Pulled By: houseroad

fbshipit-source-id: 92846ed7181f543af20669e5ea191bfb5522ea13
2020-07-10 16:10:09 -07:00
9daba76ba1 Change to.dtype_layout to c10-full (#41169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41169

-
ghstack-source-id: 107537240

Test Plan: waitforsandcastle

Differential Revision: D22289257

fbshipit-source-id: ed3cc06327951fa886eb3b8f1c8bcc014ae2bc41
2020-07-10 16:04:34 -07:00
7c143e5d3e Reducing size of docker Linux image (#41207)
Summary:
# Description
The goal is to reduce the size of the docker image. I checked a few things:
* Docker layer overlaps
* Removing .git folder
* Removing intermediate build artifacts (*.o and *.a)

The only one that gave a satisfying result was the third approach, removing *.o and *.a files. The final image went from 10 GB to 9.7 GB.

I used Dive (https://github.com/wagoodman/dive) to inspect the Docker image manually.

# Test:
* Check the image size was reduced
* No test failures in CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41207

Test Plan:
* Check the image size was reduced
* No test failures in CI

Differential Revision: D22465221

Pulled By: ssylvain

fbshipit-source-id: 48754259729401e3c08447b0fa0630ca7217cb98
2020-07-10 15:59:18 -07:00
0651887eb4 Improve repr for torch.iinfo & torch.finfo (#40488)
Summary:
- fix https://github.com/pytorch/pytorch/issues/39991
- Directly include `min`/`max`/`eps`/`tiny` values in the repr of `torch.iinfo` & `torch.finfo` for inspection
- Use `torch.float16` / `torch.int16` instead of the non-corresponding names `Half` / `Short`
- The improved repr looks like this:
```
>>> torch.iinfo(torch.int8)
iinfo(type=torch.int8, max=127, min=-128)
>>> torch.iinfo(torch.int16)
iinfo(type=torch.int16, max=32767, min=-32768)
>>> torch.iinfo(torch.int32)
iinfo(type=torch.int32, max=2.14748e+09, min=-2.14748e+09)
>>> torch.iinfo(torch.int64)
iinfo(type=torch.int64, max=9.22337e+18, min=-9.22337e+18)
>>> torch.finfo(torch.float16)
finfo(type=torch.float16, eps=0.000976563, max=65504, min=-65504, tiny=6.10352e-05)
>>> torch.finfo(torch.float32)
finfo(type=torch.float32, eps=1.19209e-07, max=3.40282e+38, min=-3.40282e+38, tiny=1.17549e-38)
>>> torch.finfo(torch.float64)
finfo(type=torch.float64, eps=2.22045e-16, max=1.79769e+308, min=-1.79769e+308, tiny=2.22507e-308)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40488

Differential Revision: D22445301

Pulled By: mruberry

fbshipit-source-id: 552af9904c423006084b45d6c4adfb4b5689db54
2020-07-10 15:22:55 -07:00
cb6c3526c6 Migrate addmm, addbmm and THBlas_gemm to ATen (#40927)
Summary:
Resubmit #40927
Closes https://github.com/pytorch/pytorch/issues/24679, closes https://github.com/pytorch/pytorch/issues/24678

`addbmm` depends on `addmm` so needed to be ported at the same time. I also removed `THTensor_(baddbmm)` which I noticed had already been ported so was just dead code.

After having already written this code, I had to fix merge conflicts with https://github.com/pytorch/pytorch/issues/40354 which revealed there was already an established place for cpu blas routines in ATen. However, the version there doesn't make use of ATen's AVX dispatching so thought I'd wait for comment before migrating this into that style.
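
For reference, a quick sanity check of the public semantics the port must preserve (standard documented API, nothing specific to this PR):
```python
import torch

M = torch.randn(3, 5)
b1, b2 = torch.randn(4, 3, 2), torch.randn(4, 2, 5)

# addbmm: beta * M + alpha * (sum over the batch of b1[i] @ b2[i])
out = torch.addbmm(M, b1, b2, beta=0.5, alpha=2.0)
ref = 0.5 * M + 2.0 * sum(b1[i] @ b2[i] for i in range(4))
torch.testing.assert_allclose(out, ref)
```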

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40927

Reviewed By: ezyang

Differential Revision: D22468490

Pulled By: ngimel

fbshipit-source-id: f8a22be3216f67629420939455e31a88af20201d
2020-07-10 14:30:55 -07:00
16c8146da9 Self binning histogram (#40875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40875

This op uses the given num_bins and a spacing strategy to automatically bin and compute the histogram of given matrices.

Test Plan: Unit tests.

Reviewed By: neha26shah

Differential Revision: D22329069

fbshipit-source-id: 28406b94e284d52d875f73662fc82f93dbc00064
2020-07-10 13:55:42 -07:00
9b0393fcf1 [ONNX]Fix export of flatten (#40418)
Summary:
Shape is passed to _reshape_to_tensor as a Constant, so the shape of the input cannot be inferred when the model is exported with dynamic axes set. Instead of a Constant, pass the output of a Shape-Slice-Concat subgraph to compute the shape for the Reshape node in the _reshape_to_tensor function.
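
A sketch of the kind of export that exercises this fix; the model and axis names are illustrative, not taken from the PR:
```python
import torch

class Flat(torch.nn.Module):
    def forward(self, x):
        return torch.flatten(x, start_dim=1)

x = torch.randn(2, 3, 4)
# With dynamic_axes set, the reshape target cannot be baked in as a Constant.
torch.onnx.export(Flat(), x, "flat.onnx",
                  input_names=["x"], output_names=["y"],
                  dynamic_axes={"x": {0: "batch"}, "y": {0: "batch"}})
```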

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40418

Reviewed By: hl475

Differential Revision: D22480127

Pulled By: houseroad

fbshipit-source-id: 11853adb6e6914936871db1476916699141de435
2020-07-10 13:06:25 -07:00
a548c6b18f add check for duplicated op registration in JIT (#41214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41214

Same as D21032976, add check for duplicated op name in JIT

Test Plan:
run full JIT predictor
also
buck test pytorch-playground

Reviewed By: smessmer

Differential Revision: D22467871

fbshipit-source-id: 9b7a40a217e6c63cca44cad54f9f657b8b207a45
2020-07-10 12:19:04 -07:00
75b6dd3d49 Wrap Caffe2's SparseLengthsSum into a PyTorch op (#39596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39596

This diff wraps Caffe2's SparseLengthsSum on GPU as a PT op.

Reviewed By: jianyuh

Differential Revision: D21895309

fbshipit-source-id: 38bb156f9be8d28225d2b44f5b4c93d27779aff9
2020-07-10 11:19:13 -07:00
d927aee312 Small clarification of torch.cuda.amp multi-model example (#41203)
Summary:
Some people have been confused by `retain_graph` in the snippet; they thought it was an additional requirement imposed by amp.
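
The pattern in question looks roughly like the sketch below (toy models; requires a CUDA device). `retain_graph=True` is an ordinary autograd requirement here, because the two backward passes traverse a shared part of the graph; amp adds nothing extra:
```python
import torch

model0 = torch.nn.Linear(8, 8).cuda()
model1 = torch.nn.Linear(8, 8).cuda()
opt0 = torch.optim.SGD(model0.parameters(), lr=0.1)
opt1 = torch.optim.SGD(model1.parameters(), lr=0.1)
data = torch.randn(4, 8, device="cuda")
target = torch.randn(4, 8, device="cuda")
loss_fn = torch.nn.MSELoss()

scaler = torch.cuda.amp.GradScaler()
with torch.cuda.amp.autocast():
    out0, out1 = model0(data), model1(data)
    loss0 = loss_fn(0.3 * out0 + 0.7 * out1, target)
    loss1 = loss_fn(0.6 * out0 + 0.4 * out1, target)

# loss0's backward would free the graph shared with loss1 unless retained.
scaler.scale(loss0).backward(retain_graph=True)
scaler.scale(loss1).backward()
scaler.step(opt0)
scaler.step(opt1)
scaler.update()
```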

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41203

Differential Revision: D22463700

Pulled By: ngimel

fbshipit-source-id: e6fc8871be2bf0ecc1794b1c6f5ea99af922bf7e
2020-07-10 11:13:26 -07:00
4a09501fbe LogitOp LUT based fake FP16 Op. (#41258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41258

LogitOp LUT based fake FP16 Op.

(Note: this ignores all push blocking failures!)

Test Plan: test_op_nnpi_fp16.py covers the test_logit testing.

Reviewed By: hyuen

Differential Revision: D22351963

fbshipit-source-id: e2ed2bd9bfdc58c6f823d7d41557109c08628bd7
2020-07-10 10:53:42 -07:00
33f9fbf8ba Modularize parsing NCCL_BLOCKING_WAIT in ProcessGroupNCCL (#41076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41076

Modularizes parsing of the NCCL_BLOCKING_WAIT environment variable in the ProcessGroupNCCL constructor.
ghstack-source-id: 107491850
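
For reference, the variable this constructor parses is set before the process group is created; a minimal single-rank sketch (the rendezvous values are illustrative, and NCCL needs a GPU):
```python
import os
import torch.distributed as dist

# "1" enables blocking wait; must be set before ProcessGroupNCCL is constructed.
os.environ["NCCL_BLOCKING_WAIT"] = "1"
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("nccl", rank=0, world_size=1)
```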

Test Plan: Sandcastle/CI

Differential Revision: D22401225

fbshipit-source-id: 79866d3f4f1a617cdcbca70e3bea1ce9dcac3316
2020-07-10 10:47:38 -07:00
db38487ece Autograd Doc for Complex Numbers (#41012)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41012

Test Plan: Imported from OSS

Differential Revision: D22476911

Pulled By: anjali411

fbshipit-source-id: 7da20cb4312a0465272bebe053520d9911475828
2020-07-10 09:57:43 -07:00
e568b3fa2d test nan and inf in TestTorchMathOps (#41225)
Summary:
Per title. `lgamma` produces a different result for `-inf` compared to scipy, so that comparison is skipped.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41225

Differential Revision: D22473346

Pulled By: ngimel

fbshipit-source-id: e4ebda1b10e2a061bd4cef38d1d7b5bf0f581790
2020-07-10 09:46:46 -07:00
62e16934cb [caffe2] Add the dedup implementation of fused RowWiseAdagrad op on GPUs (#40282)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40282

Test Plan:
```
buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_sum_gradient \(caffe2\.caffe2\.fb\.net_transforms\.tests\.fuse_sparse_ops_test\.TestFuseSparseOps\)' --print-passing-details
```

https://our.intern.facebook.com/intern/testinfra/testrun/4785074632584150

Reviewed By: jspark1105

Differential Revision: D22102737

fbshipit-source-id: fa3fef7cecb1e2cf5c9b6019579dc0f86fd3a3b2
2020-07-10 09:05:24 -07:00
08227072e2 Benchmark RecordFunction overhead on some models (#40952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40952

Adding a benchmark to measure RecordFunction overhead,
currently on resnet50 and lstm models

Test Plan:
python benchmarks/record_function_benchmark/record_function_bench.py
Benchmarking RecordFunction overhead for lstm_jit
Running warmup... finished
Running 100 iterations with RecordFunction... finished
N = 100, avg. time: 251.970 ms, stddev: 39.348 ms
Running 100 iterations without RecordFunction... finished
N = 100, avg. time: 232.828 ms, stddev: 24.556 ms

Reviewed By: dzhulgakov

Differential Revision: D22368357

Pulled By: ilia-cher

fbshipit-source-id: bff4f4e0e06fb80fdfcf85966c2468e48ed7bc98
2020-07-10 08:46:19 -07:00
8a79eec98a Add add_relu fusion pass to optimize_for_mobile. (#40252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40252

As title says.
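
A minimal sketch of where the new pass now runs; the toy model is illustrative:
```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()).eval()
scripted = torch.jit.script(model)
# optimize_for_mobile now also applies the add_relu fusion to the graph.
opt_model = optimize_for_mobile(scripted)
```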

Test Plan:
python test/test_mobile_optimizer.py

Imported from OSS

Differential Revision: D22126825

fbshipit-source-id: a1880587ba8db9dee0fa450bc463734e4a8693d9
2020-07-10 08:10:22 -07:00
75a4862f63 Added SiLU activation function (#41034)
Summary:
Implemented the SiLU activation function as discussed in https://github.com/pytorch/pytorch/issues/3169.
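
SiLU is defined as silu(x) = x * sigmoid(x); a quick check against the module this PR introduces:
```python
import torch

m = torch.nn.SiLU()
x = torch.randn(5)
# The module should match the elementwise definition exactly.
torch.testing.assert_allclose(m(x), x * torch.sigmoid(x))
```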

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41034

Reviewed By: glaringlee

Differential Revision: D22465203

Pulled By: heitorschueroff

fbshipit-source-id: b27d064529fc99600c586ad49b594b52b718b0d2
2020-07-10 07:37:30 -07:00
f6eb92a354 Expose private APIs to enable/disable pickling ScriptModules without RPC (#39631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39631

Background:
Currently, we cannot send ScriptModule over RPC as an argument.
Otherwise, it would hit the following error:

> _pickle.PickleError: ScriptModules cannot be deepcopied using
> copy.deepcopy or saved using torch.save. Mixed serialization of
> script and non-script modules is not supported. For purely
> script modules use my_script_module.save(<filename>) instead.

Failed attempt:
tried to install `torch.jit.ScriptModule` to RPC's
dispatch table, but it does not work as the dispatch table only
matches exact types and using base type `torch.jit.ScriptModule`
does not work for derived typed.

Current solution:
The current solution exposes `_enable_jit_rref_pickle` and
`_disable_jit_rref_pickle` APIs to toggle the `allowJitRRefPickle`
flag. See `test_pickle_script_module_with_rref` as an example.
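
A sketch of the toggle pattern; the exact import path of the two functions and the RPC worker setup (init_rpc etc., omitted here) are assumptions:
```python
import torch
import torch.distributed.rpc as rpc

scripted = torch.jit.script(torch.nn.Linear(2, 2))  # a ScriptModule to share

rpc._enable_jit_rref_pickle()
try:
    rref = rpc.RRef(scripted)
    # ... pass rref through rpc_sync/rpc_async calls that need to pickle it ...
finally:
    rpc._disable_jit_rref_pickle()
```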

Test Plan: Imported from OSS

Differential Revision: D21920870

Pulled By: mrshenli

fbshipit-source-id: 4d58afce5d0b4b81249b383c173488820b1a47d6
2020-07-10 07:27:51 -07:00
df252c059c [ROCm] Skip caffe2 unique op test for rocm3.5 (#41219)
Summary:
A unique op test failure in caffe2 blocks upgrading CI to ROCm 3.5.1. Skipping the test to unblock; will re-enable after root-causing and fixing the issue.
jeffdaily sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41219

Differential Revision: D22471452

Pulled By: xw285cornell

fbshipit-source-id: 9e503c8b37c0a4b92632f77b2f8a90281a9889c3
2020-07-09 20:00:29 -07:00
a79b416847 make Int8 FC bias quantization use round flush to infinity
Summary:
The current quantization rounding function uses fbgemm, which
defaults to round-to-nearest. The hardware implementation uses round
flush to infinity, so this adds an option to switch the rounding mode.

Test Plan: ran against test_fc_int8

Reviewed By: venkatacrc

Differential Revision: D22452306

fbshipit-source-id: d2a1fbfc695612fe07caaf84f52669643507cc9c
2020-07-09 17:25:41 -07:00
7c2c752e6d Revert D22458928: [pytorch][PR] Use explicit templates in CUDALoops kernels
Test Plan: revert-hammer

Differential Revision:
D22458928 (e374280768)

Original commit changeset: cca623bb6e76

fbshipit-source-id: 6dd24f783ec3b781140f314716ffb02f0892c57a
2020-07-09 16:31:50 -07:00
c5dcf056ee JIT pass for add relu fusion. (#39343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39343

Building on top of the previous PR that adds the fused add_relu op, this PR adds
a JIT pass that transforms the input graph to find all fusable instances of add
+ relu and fuses them.
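
A sketch of exercising the pass directly; its exposure as `torch._C._jit_pass_fuse_add_relu` operating on a graph is an assumption inferred from the test plan below:
```python
import torch

@torch.jit.script
def f(a, b):
    return torch.relu(a + b)

# Hypothetical direct invocation; after the pass, the separate add and relu
# nodes should be rewritten into a single fused add_relu node.
torch._C._jit_pass_fuse_add_relu(f.graph)
print(f.graph)
```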

Test Plan:
python test/test_jit.py TestJit.test_add_relu_fusion

Imported from OSS

Differential Revision: D21822396

fbshipit-source-id: 12c7e8db54c6d70a2402b32cc06c7e305ffbb1be
2020-07-09 16:25:13 -07:00
82c9f79e0e Add fused add_relu op. (#39342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39342

Many networks such as resnet have adds followed by relu. This op is the
first step in enabling this fused implementation.
Once we have the fused add_relu op, a JIT pass will be written to
replace add + relu patterns with add_relu.

Test Plan:
python test/test_nn.py TestAddRelu

Imported from OSS

Differential Revision: D21822397

fbshipit-source-id: 03df83a3e46ddb48a90c5a6f755227a7e361a0e8
2020-07-09 16:25:11 -07:00
d6feb6141f [Vec256][neon] Add neon backend for vec256 (#39341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39341

This PR introduces a NEON backend for the vec256 class for the float datatype.
For now only aarch64 is enabled, due to a few issues with enabling it on
32-bit ARM (aarch32).

Test Plan:
vec256_test

Imported from OSS

Differential Revision: D21822399

fbshipit-source-id: 3851c4336d93d1c359c85b38cf19904f82bc7b8d
2020-07-09 16:25:09 -07:00
bddba1e336 Add benchmark for add op. (#40059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40059

This benchmark is added specifically for mobile, to see whether the compiler is
auto-vectorizing and thus whether the NEON backend for vec256 offers any
advantage for the add op.

Test Plan:
CI

Imported from OSS

Differential Revision: D22055146

fbshipit-source-id: 43ba6c4ae57c6f05d84887c2750ce21ae1b0f0b5
2020-07-09 16:22:55 -07:00
dde3d5f4a8 [RPC docs] Remove mention of TensorPipe's SHM and CMA backends as they're not built (#41200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41200

In short, we messed up. The SHM and CMA backends of TensorPipe are Linux-specific and thus they are guarded by an #ifdef in the agent's code. Due to a mishap with CMake (TensorPipe has two CMake files, one for PyTorch and a "standalone" one) we were not correctly propagating some flags, and these #ifdefs were always false. This means that these two backends have always been disabled and have thus never been covered by our OSS CI. It would be irresponsible to enable them now in v1.6, so instead we remove any mention of them from the docs.

Note that this is perhaps not as bad as it sounds. These two backends provided higher performance (lower latency) when the two endpoints were on the same machine. However, I suspect that most RPC users only do transfers across machines, for which SHM and CMA wouldn't have played any role.
ghstack-source-id: 107458630

Test Plan: Docs only

Differential Revision: D22462158

fbshipit-source-id: 0d72fea11bcaab6d662184bbe7270529772a5e9b
2020-07-09 15:33:07 -07:00
a88099ba3e restore old documentation references (#39086)
Summary:
Fixes gh-39007

We replaced actual content with links to generated content in many places to break the documentation into manageable chunks. This caused references like
```
https://pytorch.org/docs/stable/torch.html#torch.flip
```
to become
```
https://pytorch.org/docs/master/generated/torch.flip.html#torch.flip
```
The textual content that was located at the old reference was replaced with a link to the new reference. This PR adds a `<p id="xxx"></p>` anchor next to the link, so that older references from outside tutorials and forums still work: they bring the user to a link they can then follow through to see the actual content.

The way this is done is to monkeypatch the sphinx writer method that produces the link. It is ugly but practical, and in my mind not worse than adding javascript to do the same thing.
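
A rough sketch of that monkeypatching idea; the writer class and method chosen here are assumptions for illustration, not the PR's actual code:
```python
from sphinx.writers.html import HTMLTranslator

_orig_visit_reference = HTMLTranslator.visit_reference

def visit_reference(self, node):
    # Emit an empty anchor next to the link so old-style #torch.flip
    # fragments on the aggregate page still resolve to something.
    parts = node.get("refuri", "").rsplit("#", 1)
    if len(parts) == 2:
        self.body.append('<p id="%s"></p>' % parts[1])
    _orig_visit_reference(self, node)

HTMLTranslator.visit_reference = visit_reference
```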

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39086

Differential Revision: D22462421

Pulled By: jlin27

fbshipit-source-id: b8f913b38c56ebb857c5a07bded6509890900647
2020-07-09 15:20:10 -07:00
b952eaf668 Preserve CUDA gencode flags (#41173)
Summary:
Add `torch._C._cuda_getArchFlags()` that returns the list of architectures `torch_cuda` was compiled with
Add `torch.cuda.get_arch_list()` and `torch.cuda.get_gencode_flags()` methods that return the architecture list and gencode flags PyTorch was compiled with
Print a warning if some of the GPUs are not compatible with any of the CUBINs
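
Usage of the new introspection methods (outputs shown are illustrative):
```python
import torch

print(torch.cuda.get_arch_list())      # e.g. ['sm_37', 'sm_50', 'sm_60', 'sm_70']
print(torch.cuda.get_gencode_flags())  # the -gencode flags used at build time
```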

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41173

Differential Revision: D22459998

Pulled By: malfet

fbshipit-source-id: 65d40ae29e54a0ba0f3f2da11b821fdb4d452d95
2020-07-09 14:59:35 -07:00
e374280768 Use explicit templates in CUDALoops kernels (#41059)
Summary:
Follow up after https://github.com/pytorch/pytorch/pull/40992
Use explicit templates instead of lambdas to reduce binary size without affecting the perf by 100-200Kb per arch per CU, namely:
BinaryMulDivKernel.cu 3.8Mb -> 3.5Mb
CompareEQKernel.cu 1.8Mb -> 1.7Mb
BinaryAddSubKernel.cu 2.0Mb -> 1.8Mb
BinaryBitwiseOpsKernels.cu 2.6Mb -> 2.3Mb

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41059

Differential Revision: D22458928

Pulled By: malfet

fbshipit-source-id: cca623bb6e769cfe372977b08463d98b1a02dd14
2020-07-09 14:55:38 -07:00
1f1351488e Revert D21870844: Create lazy_dyndeps to avoid caffe2 import costs.
Test Plan: revert-hammer

Differential Revision:
D21870844 (07fd5f8ff9)

Original commit changeset: 3f65fedb65bb

fbshipit-source-id: 4f661072d72486a9c14711e368247b3d30e28af9
2020-07-09 14:18:38 -07:00
22f940b7bd add clang code coverage compile flags (#41103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41103

add a CLANG_CODE_COVERAGE option to CMakeLists. If the option is ON, the compile flags needed for code coverage are added.

Test Plan:
Cloned the PyTorch source locally, applied these changes, and built with `CLANG_CODE_COVERAGE ON` and `BUILD_TESTS ON`. Ran a manual test; code coverage report attached.

{F243609020}

Reviewed By: malfet

Differential Revision: D22422513

fbshipit-source-id: 27a31395c31b5b5f4b72523954722771d8f61080
2020-07-09 14:14:18 -07:00
2cf31fb577 Fix max_pool2d perf regression (#41174)
Summary:
The two pointer variables `ptr_top_diff` and `ptr_top_mask` were introduced in https://github.com/pytorch/pytorch/issues/38953. Some end-to-end testing showed training performance regression due to this change. The performance is restored after removing the two pointer variables, and adding offset directly below in the indexing [ ] calculations.

See PR change https://github.com/pytorch/pytorch/pull/38953/files#diff-8085d370f4e98295074a51b8a1f829e9R187-R188

e4a3c584d5/aten/src/ATen/native/cuda/DilatedMaxPool2d.cu (L186-L195)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41174

Differential Revision: D22451565

Pulled By: ngimel

fbshipit-source-id: 37ed6b9fd785e1be31a027ef5d60794656cc575a
2020-07-09 14:00:05 -07:00
1922f2212a Make IterableDataset dataloader.__len__ warning clearer (#41175)
Summary:
Based on discussion with jlucier (https://github.com/pytorch/pytorch/pull/38925#issuecomment-655859195). The `batch_size` change isn't made because the data loader only has the notion of a `batch_sampler`, not a batch size. If `batch_size`-dependent sharding is needed, users can still access it from their own code.
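
For context, the warning fires in cases like this sketch, where the reported length comes from the dataset and cannot account for batching or sharding:
```python
import torch
from torch.utils.data import DataLoader, IterableDataset

class Stream(IterableDataset):
    def __iter__(self):
        return iter(range(10))

    def __len__(self):
        return 10  # length of the dataset, not the number of batches

loader = DataLoader(Stream(), batch_size=4)
print(len(loader))  # warns: the estimate ignores batching and worker sharding
```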

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41175

Differential Revision: D22456525

Pulled By: zou3519

fbshipit-source-id: 5281fcf14807f219de06e32107d5fe7d5b6a8623
2020-07-09 13:49:29 -07:00
e84ef45dd3 [JIT] Fix JIT triage workflow (#41170)
Summary:
**Summary**
This commit fixes the JIT triage workflow based on testing done in my
own fork.

**Test Plan**
This commit has been tested against my own fork. This commit is
currently at the tip of my master branch, and if you open an issue in my
fork and label it JIT, it will be added to the Triage Review project in
that fork under the Needs triage column.

*Old issue that is labelled JIT later*

<img width="700" alt="Captura de Pantalla 2020-07-08 a la(s) 6 59 42 p  m" src="https://user-images.githubusercontent.com/4392003/86988551-5b805100-c14d-11ea-9de3-072916211f24.png">

*New issue that is opened with the JIT label*
<img width="725" alt="Captura de Pantalla 2020-07-08 a la(s) 6 59 17 p  m" src="https://user-images.githubusercontent.com/4392003/86988560-60dd9b80-c14d-11ea-94f0-fac01a0d239b.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41170

Differential Revision: D22460584

Pulled By: SplitInfinity

fbshipit-source-id: 278483cebbaf3b35e5bdde2a541513835b644464
2020-07-09 12:40:01 -07:00
c1fa74b2d7 [quant][refactor] test_only_eval_fn (#41078)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41078

Test Plan: Imported from OSS

Differential Revision: D22420699

fbshipit-source-id: cf105cd41d83036df65c6bb3147cc14aaf755897
2020-07-09 12:34:05 -07:00
7c29a4e66f Don't add NCCL dependency to gloo if system NCCL is used (#41180)
Summary:
This avoids what is currently only a warning from CMake:
```
The dependency target "nccl_external" of target "gloo_cuda" does not exist.
Call Stack (most recent call first):
  CMakeLists.txt:411 (include)
```

This will become a real problem once policy CMP0046 is set, which will turn this warning into an error

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41180

Differential Revision: D22460623

Pulled By: malfet

fbshipit-source-id: 0222b12b435e5e2fdf2bc85752f95abba1e3d4d5
2020-07-09 12:10:29 -07:00
2252188e85 [caffe2] Fix spatial_batch_norm_op dividision-by-zero crash (#40806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40806

When the input is empty, the operator will crash on "runtime error: division by zero". This has been causing Inference platform server crashes.

Example crash logs:

{P134526683}

Test Plan:
Unit test

See reproducing steps in the Test Plan of D22300135

Reviewed By: houseroad

Differential Revision: D22302089

fbshipit-source-id: aaa5391fddc86483b0f3aba3efa7518e54913635
2020-07-09 12:04:11 -07:00
df1f8a48d8 add null check for c2 tensor conversion (#41096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41096

The spark spot model had some issues in tensor conversion, see P134598596. It happens when we convert an undefined c10 tensor to a caffe2 tensor.
This diff adds a null check.

Test Plan: spark spot model runs without problem

Reviewed By: smessmer

Differential Revision: D22330705

fbshipit-source-id: dfe0f29a48019b6611cad3fd8f2ae49e8db5427e
2020-07-09 11:44:23 -07:00
a318234eb0 Print raising warnings in Python rather than C++ if other error occurs (#41116)
Summary:
When we return to Python from C++ in PyTorch and have both warnings and an error, we have the problem of what to do when the warnings throw, because we can only throw one error.
Previously, if we had an error, we punted all warnings to the C++ warning handler, which would write them to stderr (i.e. file descriptor 2) or pass them on to glog.

This has drawbacks if an error happened:
- Warnings are not handled through Python even if they don't raise,
- warnings are always printed with no way to suppress this,
- the printing bypasses sys.stderr, so Python modules wanting to
  modify this don't work (with the prominent example being Jupyter).

This patch does the following instead:
- Set the warning using standard Python extension mechanisms,
- if Python decides that this warning is an error and we have a
  PyTorch error, we print the warning through Python and clear
  the error state (from the warning).

This resolves the three drawbacks discussed above; in particular, it fixes https://github.com/pytorch/pytorch/issues/37240 .
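
The practical effect is that Python's warning filters now see these warnings; a sketch that escalates a C++-originated warning into an exception, using torch.range's deprecation warning as a convenient trigger:
```python
import warnings
import torch

with warnings.catch_warnings():
    warnings.simplefilter("error")  # turn warnings into exceptions
    try:
        torch.range(0, 3)  # deprecated; its warning now raises here
    except UserWarning as e:
        print("caught through Python's warning machinery:", e)
```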

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41116

Differential Revision: D22456393

Pulled By: albanD

fbshipit-source-id: c3376735723b092efe67319321a8a993402985c7
2020-07-09 11:38:07 -07:00
07fd5f8ff9 Create lazy_dyndeps to avoid caffe2 import costs. (#39488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39488

Currently caffe2.InitOpLibrary does the dll import unilaterally. If we instead
make a lazy version and use it, then many pieces of code which do not need the
caffe2 operators get a lot faster.

On a real test, the import time went from 140s to 68s.

This also cleans up the algorithm slightly (although it makes a very minimal
difference), by parsing the list of operators once, rather than every time a
new operator is added, since we defer the RefreshCall until after we've
imported all the operators.

The key way we maintain safety is that as soon as someone does an operation
which requires an operator (or could), we force importing of all available
operators.

Future work could include trying to identify which code is needed for which
operator and only import the needed ones. There may also be wins available by
playing with dlmopen (which opens within a namespace), or seeing if the dl
flags have an impact (I tried this and didn't see an impact, but dlmopen may
make it better).

Test Plan:
I added a new test a lazy_dyndep_test.py (copied from all_compare_test.py).
I'm a little concerned that I don't see any explicit tests for dyndep, but this
should provide decent coverage.

Differential Revision: D21870844

fbshipit-source-id: 3f65fedb65bb48663670349cee5e1d3e22d560ed
2020-07-09 11:34:57 -07:00
f69d6a7ea3 [ONNX] Update Default Value of recompute_scale_factor in Interpolate (#39453)
Summary:
This is a duplicate of https://github.com/pytorch/pytorch/pull/38362

"This PR completes Interpolate's deprecation process for recomputing the scales values, by updating the default value of the parameter recompute_scale_factor as planned for pytorch 1.6.0.
The warning message is also updated accordingly."
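
Callers can sidestep the deprecation path entirely by passing the flag explicitly; a sketch (which default value 1.6 lands on is deliberately not restated here):
```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
# Passing recompute_scale_factor explicitly avoids the deprecation warning.
y = F.interpolate(x, scale_factor=2.0, mode="bilinear",
                  align_corners=False, recompute_scale_factor=False)
```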

I'm recreating this PR as previous one is not being updated.

cc gchanan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39453

Reviewed By: hl475

Differential Revision: D21955284

Pulled By: houseroad

fbshipit-source-id: 911585d39273a9f8de30d47e88f57562216968d8
2020-07-09 11:32:49 -07:00
9b3a212d30 quantizer.cpp: fix cuda memory pinning (#41139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41139

Fixes the test case in https://github.com/pytorch/pytorch/issues/41115
by using PyTorch's CUDA allocator instead of the old Caffe2 one.

Test Plan:
run the test case from the issue:
https://gist.github.com/vkuzo/6d013aa1645cb986d0d4464a931c779b

let's run CI and see what it uncovers

Imported from OSS

Reviewed By: malfet

Differential Revision: D22438787

fbshipit-source-id: 0853b0115d198a99c43e6176aef34ea951bf5c2e
2020-07-09 11:14:58 -07:00
62cee0001e Move async + serialization implementation out of 'jit/__init__.py' (#41018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41018

See https://github.com/pytorch/pytorch/pull/40807 for context.

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D22393869

Pulled By: suo

fbshipit-source-id: a71cc571a423ccb81cd148444dc2a18d2ee43464
2020-07-09 10:10:01 -07:00
c8deca8ea8 Update pthreadpool to pthreadpool:029c88620802e1361ccf41d1970bd5b07fd6b7bb. (#40524)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40524

Reviewed By: ezyang

Differential Revision: D22215742

Pulled By: AshkanAliabadi

fbshipit-source-id: ef594e0901337a92b21ddd44e554da66c723eb7c
2020-07-09 10:00:36 -07:00
c038f8afcc Do not install nvidia docker for non-NVIDIA configs (#41144)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41144

Differential Revision: D22457124

Pulled By: malfet

fbshipit-source-id: e615199cb78b315aa700efcc7332ebf4299212bf
2020-07-09 09:24:26 -07:00
690946c49d Generalize constant_table from tensor only to ivalue (#40718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40718

Currently all constants except tensors must be inlined during serialization;
tensors are stored in the constant table. This patch generalizes this capability
to any IValue. This is particularly useful for non-ASCII string literals that
cannot be inlined.

Test Plan: Imported from OSS

Differential Revision: D22298169

Pulled By: bzinodev

fbshipit-source-id: 88cc59af9cc45e426ca8002175593b9e431f4bac
2020-07-09 09:09:40 -07:00
86f72953dd [Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D22452776

fbshipit-source-id: a103da6a5b1db7f1c91ca25490358da268fdfe96
2020-07-09 08:49:32 -07:00
3e26709a4e Remove copy_ warnings for angle and abs for complex tensors (#41152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41152

fixes https://github.com/pytorch/pytorch/issues/40838

Test Plan: Imported from OSS

Differential Revision: D22444357

Pulled By: anjali411

fbshipit-source-id: 2879d0cffc0a011c624eb8e00c7b64bd33522cc3
2020-07-09 08:05:36 -07:00
7ff7c9738c Revert D22418756: [pytorch][PR] Migrate addmm, addbmm and THBlas_gemm to ATen
Test Plan: revert-hammer

Differential Revision:
D22418756 (6725c034b6)

Original commit changeset: 44e7bb596426

fbshipit-source-id: cbaaf3ad277648901700ef0e47715580e8f8e0dc
2020-07-09 07:47:19 -07:00
bf9cc5c776 Add callback with TLS state API in futures (#40326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40326

Adds a helper function `addCallbackWithTLSState` to both
torch/csrc/utils/future.h which is used internally by RPC framework and the JIT
future. Uses this helper function to avoid to pass in TLS state where it is needed for rpc and `record_function_ops.cpp`. For example, the following:

```
at::ThreadLocalState tls_state;
fut->addCallback([tls_state = std::move(tls_state)]() {
at::ThreadLocalStateGuard g(tls_state);
some_cb_that_requires_tls_state();
}
```

becomes

```
fut->addCallbackWithTLSState(some_cb_that_requires_tls_state);
```
ghstack-source-id: 107383961

Test Plan: RPC Tests and added a test in test_misc.cpp

Differential Revision: D22147634

fbshipit-source-id: 46c02337b90ee58ca5a0861e932413c40d06ed4c
2020-07-08 23:25:35 -07:00
155fb22e77 Run single-threaded gradgradcheck in testnn (#41147)
Summary:
Reland https://github.com/pytorch/pytorch/issues/40999

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41147

Reviewed By: mruberry

Differential Revision: D22450357

Pulled By: ngimel

fbshipit-source-id: 02b6e020af5e6ef52542266bd9752b9cfbec4159
2020-07-08 22:53:27 -07:00
8e2841781e [easy] Use torch.typename in JIT error messages (#41024)
Summary:
Noticed while trying to script one of the models which happened to have numpy values as constants. Lacking the numpy prefix in the error message was quite confusing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41024

Differential Revision: D22426399

Pulled By: dzhulgakov

fbshipit-source-id: 06158b75355fac6871e4861f82fc637c2420e370
2020-07-08 21:49:37 -07:00
33e26656fa list workaround for CREATE_OBJECT failure (#41129)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41129

Test Plan: Imported from OSS

Differential Revision: D22436064

Pulled By: ann-ss

fbshipit-source-id: 7cfc38eb953410edfe3d21346c6e377c3b3bfc1f
2020-07-08 18:36:04 -07:00
302cf6835e [ROCm][Caffe2] Enable MIOpen 3D Pooling (#38260)
Summary:
This PR contains the following updates:
1. MIOpen 3D pooling enabled in Caffe2.
2. Refactored the MIOpen pooling code in caffe2.
3. Enabled unit test cases for 3D pooling.

CC: ezyang jeffdaily ashishfarmer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/38260

Differential Revision: D21524754

Pulled By: xw285cornell

fbshipit-source-id: ddfe09dc585cd61e42eee22eff8348d326fd0c3b
2020-07-08 17:42:55 -07:00
f71cccc457 test: Add option to continue testing through error (#41136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41136

Running this within CI seems impossible since this script exits out
after one failed test, so let's just add an option that CI can use to
power through these errors.

Should not affect current functionality.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Differential Revision: D22441694

Pulled By: seemethere

fbshipit-source-id: 7f152fea15af9d47a964062ad43830818de5a109
2020-07-08 17:26:13 -07:00
04004bf10c Fix a minor typo "forget add" -> "forget to add" (#41131)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41131

Differential Revision: D22441122

Pulled By: gmagogsfm

fbshipit-source-id: 383ef167b7742e2f211d1cae010b6ebb37c6e7a0
2020-07-08 17:00:42 -07:00
c7768e21b1 [JIT] Add GitHub workflow for importing issues to triage project (#41056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41056

**Summary**
This commit adds a new GitHub workflow that automatically adds a card to
the "Need triage" section of the project board for tracking JIT triage
for each new issue that is opened and labelled "jit".

**Test Plan**
???

Test Plan: Imported from OSS

Differential Revision: D22444262

Pulled By: SplitInfinity

fbshipit-source-id: 4e7d384822bffb978468c303322f3e2c04062644
2020-07-08 17:00:40 -07:00
6725c034b6 Migrate addmm, addbmm and THBlas_gemm to ATen (#40927)
Summary:
Closes https://github.com/pytorch/pytorch/issues/24679, closes https://github.com/pytorch/pytorch/issues/24678

`addbmm` depends on `addmm` so needed to be ported at the same time. I also removed `THTensor_(baddbmm)` which I noticed had already been ported so was just dead code.

After having already written this code, I had to fix merge conflicts with https://github.com/pytorch/pytorch/issues/40354 which revealed there was already an established place for cpu blas routines in ATen. However, the version there doesn't make use of ATen's AVX dispatching so thought I'd wait for comment before migrating this into that style.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40927

Differential Revision: D22418756

Pulled By: ezyang

fbshipit-source-id: 44e7bb5964263d73ae8cc6adc5f6d4e966476ae6
2020-07-08 17:00:37 -07:00
3f32332ee6 [JIT][Easy]move remove mutation to own file (#41137)
Summary:
This should be in its own file...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41137

Reviewed By: jamesr66a

Differential Revision: D22437922

Pulled By: eellison

fbshipit-source-id: 1b62dde1a4ebac673b5c60aea4f398f734d62501
2020-07-08 17:00:35 -07:00
b8d2ccf009 Unify TensorOptions signatures (#39611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39611

A few ops have been taking non-optional ScalarType, Device and Layout. That isn't supported by the hacky wrapper that makes those
kernels work with the c10 operator library. This PR unifies the signatures and makes those ops c10-full.
ghstack-source-id: 107330186

Test Plan: waitforsandcastle

Differential Revision: D21915788

fbshipit-source-id: 39f0e114f2766a3b27b80f93f2c1a95fa23c78d4
2020-07-08 17:00:33 -07:00
10caf58a52 [typing] tensor._version is int (#41125)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41125

Differential Revision: D22440717

Pulled By: ezyang

fbshipit-source-id: f4849c6e13f01cf247b2f64f68a621b055c8bc17
2020-07-08 17:00:30 -07:00
97052c5fa8 Extend SparseAdagrad fusion with stochastic rounding FP16 (#41107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41107

Extend row wise sparse Adagrad fusion op to FP16 (stochastic rounding) for PyTorch.

Differential Revision: D22195408

fbshipit-source-id: e9903ca7ca3b542fb56f36580e69bb2a39b554f6
2020-07-08 16:58:53 -07:00
af2680e9ce Update ShipIt sync
fbshipit-source-id: ceb761e28fe8c53bc53f3b82b304ea8ab0e98183
2020-07-08 16:52:13 -07:00
0edbe6b063 Add a link in RPC doc page to point to PT Distributed overview (#41108)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41108

Test Plan: Imported from OSS

Differential Revision: D22440751

Pulled By: mrshenli

fbshipit-source-id: 9e7b002091a3161ae385fdfcc26484ae8fc243bb
2020-07-08 14:00:05 -07:00
9d1138afec Remove unnecessary atomic ops in DispatchStub (#40930)
Summary:
I noticed this very unusual use of atomics in `at::native::DispatchStub`. The comment asserts that `choose_cpu_impl()` will always return the same value on every thread, yet for some reason it uses a CAS loop to exchange the value instead of a simple store? That makes no sense considering it doesn't even read the exchanged value.

This replaces the CAS loop with a simple store and also improves the non-initializing case to a single atomic load instead of two.

For reference, the `compare_exchange` was added in https://github.com/pytorch/pytorch/issues/32148 and the while loop added in https://github.com/pytorch/pytorch/issues/35794.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40930

Differential Revision: D22438224

Pulled By: ezyang

fbshipit-source-id: d56028ce18c8c5dbabdf366379a0b6aaa41aa391
2020-07-08 13:55:11 -07:00
ec58d739c6 .circleci: Remove pynightly jobs
These jobs didn't really fulfill the purpose they once had, since the
Travis Python versions were basically locked to 3.7.

Going to go ahead and remove these along with its docker jobs as well
since we don't actively need them anymore.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

ghstack-source-id: cdfc4fc2ae15a0c86d322cc706d383d6bc189fbc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41134
2020-07-08 13:46:42 -07:00
dfd21ec00d Revert D22418716: [JIT] Add support for backend-lowered submodules
Test Plan: revert-hammer

Differential Revision:
D22418716 (6777ea19fe)

Original commit changeset: d2b2c6d5d2cf

fbshipit-source-id: 5ce177e13cab0be60020f8979f9b6c520cc8654e
2020-07-08 13:14:21 -07:00
2bc9ee97d1 Revert D22418731: [JIT] Add out-of-source-tree to_backend tests
Test Plan: revert-hammer

Differential Revision:
D22418731 (e2a291b396)

Original commit changeset: 621ba4efc1b1

fbshipit-source-id: 475ae24c5b612fe285035e5ebb92ffc66780a468
2020-07-08 13:11:45 -07:00
131a0ea277 Add version number to bytecode. (#36439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36439

A proposal of versioning in bytecode, as suggested by dzhulgakov in the internal post: https://fb.workplace.com/groups/pytorch.mobile.work/permalink/590192431851054/

kProducedBytecodeVersion is added. If the model version is not the same as the number in the code, an error will be thrown.

The updated bytecode would look like below. It's a tuple of elements, where the first element is the version number.
```
(3,
 ('__torch__.m.forward',
  (('instructions',
    (('STOREN', 1, 2),
     ('DROPR', 1, 0),
     ('MOVE', 2, 0),
     ('OP', 0, 0),
     ('RET', 0, 0))),
   ('operators', (('aten::Int', 'Tensor'),)),
   ('constants', ()),
   ('types', ()),
   ('register_size', 2))))
```

Test Plan: Imported from OSS

Differential Revision: D22433532

Pulled By: iseeyuan

fbshipit-source-id: 6d62e4abe679cf91a8e18793268ad8c1d94ce746
2020-07-08 12:30:58 -07:00
58d7d91f88 Return atomic (#41028)
Summary:
Per title. This is not used currently in the PyTorch codebase, but it is a legitimate use case, and we have extensions that want to do that and are forced to roll their own atomic implementations for non-standard types. Whether an atomic op returns the old value or not should not affect performance; the compiler is able to generate correct code depending on whether the return value is used. https://godbolt.org/z/DBU_UW.
Atomic operations for non-standard integer types (1,2 and 8 byte-width) are left as is, with void return.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41028

Differential Revision: D22425008

Pulled By: ngimel

fbshipit-source-id: ca064edb768a6b290041a599e5b50620bdab7168
2020-07-08 11:54:24 -07:00
351407dd75 Disables unary op casting to output dtype (#41097)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41047.

Some CPU kernel implementations don't call `cast_outputs()`, so when CPU temporaries were created to hold their outputs they weren't copied back to the out parameters correctly. Instead of fixing that issue, for simplicity this PR disables the behavior. The corresponding test in test_type_promotion.py is expanded with more operations to verify that unary ops can no longer have out arguments with different dtypes than their inputs (except in special cases like torch.abs which maps complex inputs to float outputs and torch.deg2rad which is secretly torch.mul).
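
Concretely, a mismatched out dtype on a unary op is now rejected instead of silently producing a wrong or stale result; a sketch (the exact error type is assumed to be RuntimeError):
```python
import torch

x = torch.randn(3)
out = torch.empty(3, dtype=torch.float64)
# The input is float32 but out is float64; this now errors rather than cast.
torch.neg(x, out=out)  # expected: RuntimeError
```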

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41097

Differential Revision: D22422352

Pulled By: mruberry

fbshipit-source-id: 8e61d34ef1c9608790b35cf035302fd226fd9421
2020-07-08 11:48:40 -07:00
c93e96fbd9 [jit] move script-related implementation out of torch/jit/__init__.py (#40902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40902

See the bottom of this stack for context.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D22360210

Pulled By: suo

fbshipit-source-id: 4275127173a36982ce9ad357aa344435b98e1faf
2020-07-08 11:38:34 -07:00
6c9b869930 [ROCm] Skip Conv2d, Conv3d transpose fp16 test for ROCm3.5 (#41088)
Summary:
There's a regression in MIOpen in ROCm 3.5 that causes the autocast tests to fail. Skipping the tests for now; they will be re-enabled once the fixes are in MIOpen.

ezyang jeffdaily sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41088

Differential Revision: D22419823

Pulled By: xw285cornell

fbshipit-source-id: 347fb9a03368172fe0b263d14d27ee0c3efbf4f6
2020-07-08 11:13:49 -07:00
dde18041a6 [quant][graphmode] Refactor quantization patterns (#40894)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40894

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D22403901

fbshipit-source-id: e0bcf8a628c6a1acfe6fa10a52912360a619bc62
2020-07-08 10:36:25 -07:00
03eec07956 Move error messages in-line in _vmap_internals.py (#41077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41077

This PR is a refactor that moves error messages into their callsites in
`_vmap_internals.py`. Furthermore, because a little bird told me we've
dropped python 3.5 support, this PR adopts f-string syntax to clean up
the string replace logic. Together these changes make the error messages
read better IMO.

Test Plan:
- `python test/test_vmap.py -v`. There exists tests that invoke each of the
error messages.

Differential Revision: D22420473

Pulled By: zou3519

fbshipit-source-id: cfd46b2141ac96f0a62864928a95f8eaa3052f4e
2020-07-08 08:42:56 -07:00
de4fc23381 clean up duplicated op names (#41092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41092

Added overload names for some full-JIT operators and removed some duplicated op registrations.

Test Plan:
apply D21032976, then buck run fbsource//xplat/caffe2/fb/pytorch_predictor:predictor
make sure there's no runtime error in operator registration

Reviewed By: iseeyuan

Differential Revision: D22419922

fbshipit-source-id: f651898e75b5bdb8dc03fc00b136689536c51707
2020-07-08 06:39:39 -07:00
e4fbcaa2bc [Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D22429730

fbshipit-source-id: 585d8df36d7fa18a9c2d3fa54c1d333bf94464d0
2020-07-08 05:02:26 -07:00
3d3fd13e04 [quant][graphmode][fix] filter for list append change (#41020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41020

Only support quantization of list append for List[Tensor]

Test Plan: Imported from OSS

Differential Revision: D22420698

fbshipit-source-id: 179677892037e136d90d16230a301620c3111063
2020-07-08 03:44:44 -07:00
e0e8b98c43 Export logit op to pytorch
Summary: Export the logit op to PyTorch for better preproc perf

Test Plan:
unit test
Also tested with model re-generation

Reviewed By: houseroad

Differential Revision: D22324611

fbshipit-source-id: 86accb6b4528e5c818d2c3f8c67926f279d158d6
2020-07-08 02:27:09 -07:00
6ef94590fa match int8 quantization of nnpi (#41094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41094

mimic nnpi's quantization operations

removed redundant int8 test

Test Plan: ran FC with sizes up to 5; running bigger sizes now.

Reviewed By: venkatacrc

Differential Revision: D22420537

fbshipit-source-id: 91211c8a6e4d3d3bec2617b758913b44aa44b1b1
2020-07-08 00:07:42 -07:00
e2a291b396 [JIT] Add out-of-source-tree to_backend tests (#40842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40842

**Summary**
This commit adds out-of-source-tree tests for `to_backend`. These tests check
that a Module can be lowered to a backend, exported, loaded (in both
Python and C++) and executed.

**Fixes**
This commit fixes #40067.

Test Plan: Imported from OSS

Differential Revision: D22418731

Pulled By: SplitInfinity

fbshipit-source-id: 621ba4efc1b121fa76c9c7ca377792ac7440d250
2020-07-07 21:00:43 -07:00
6777ea19fe [JIT] Add support for backend-lowered submodules (#40841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40841

**Summary**
This commit adds support for using `Modules` that have been lowered as
submodules in `ScriptModules`.

**Test Plan**
This commit adds execution and save/load tests to test_backends.py for
backend-lowered submodules.

**Fixes**
This commit fixes #40069.

Test Plan: Imported from OSS

Differential Revision: D22418716

Pulled By: SplitInfinity

fbshipit-source-id: d2b2c6d5d2cf3042a620b3bde7d494f1abe28dc1
2020-07-07 21:00:40 -07:00
5a4c45f8d1 [JIT] Move TestBackend to test directory (#40840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40840

**Summary**
This commit moves the TestBackend used for the JIT backend
extension to the tests directory. It was temporarily placed
in the source directory while figuring out some details of
the user experience for this feature.

**Test Plan**
`python test/test_jit.py TestBackends`

**Fixes**
This commit fixes #40067.

Test Plan: Imported from OSS

Differential Revision: D22418682

Pulled By: SplitInfinity

fbshipit-source-id: 9356af1341ec4d552a41c2a8929b327bc8b56057
2020-07-07 21:00:38 -07:00
3e01931e49 [JIT] Separate to_backend API into libtorch and libtorch_python (#40839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40839

**Summary**
This commit splits the to_backend API properly into
`libtorch` and `libtorch_python`. The backend interface and all
of the code needed to run a graph on a backend is in
libtorch, and all of the code related to creating a Python binding
for the lowering process is in `libtorch_python`.

**Test Plan**
`python test/test_jit.py TestBackends`

**Fixes**
This commit fixes #40072.

Test Plan: Imported from OSS

Differential Revision: D22418664

Pulled By: SplitInfinity

fbshipit-source-id: b96e0c34ab84e45dff0df68b8409ded57a55ab25
2020-07-07 20:58:42 -07:00
0911c1e71a Added index_put to promotelist (#41035)
Summary:
[index_put](https://pytorch.org/docs/master/tensors.html#torch.Tensor.index_put) requires the src and dst tensors to be the same dtype, so IMO it belongs on the promote list when autocast is active (the output should be the widest dtype among the input dtypes).

I also put some other registrations in alphabetical order.
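A sketch of the resulting behavior under autocast (shapes and values are illustrative):

```python
import torch

a = torch.randn(4, 4, device="cuda")                    # float32
v = torch.randn(2, device="cuda", dtype=torch.float16)  # float16
idx = (torch.tensor([0, 1], device="cuda"),
       torch.tensor([2, 3], device="cuda"))

with torch.cuda.amp.autocast():
    # with index_put on the promote list, autocast casts the inputs to the
    # widest participating dtype (float32 here) before running the op
    out = a.index_put(idx, v)
```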

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41035

Differential Revision: D22418305

Pulled By: ngimel

fbshipit-source-id: b467cb16ac6c2ba1f9e43531f69a144b17f00b87
2020-07-07 20:36:55 -07:00
c55d8a6f62 Remove std::complex from c10::Scalar (#39831)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39831

Differential Revision: D22018505

Pulled By: ezyang

fbshipit-source-id: 4719c0f1673077598c5866dafc7391d9e074f4eb
2020-07-07 20:31:42 -07:00
3615e344a3 Unit test case for the Int8FC to cover quantization scale errors. (#41100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41100

Unit test case for the Int8FC to cover quantization scale errors.

Test Plan: test_int8_ops_nnpi.py test case test_int8_small_input.

Reviewed By: hyuen

Differential Revision: D22422353

fbshipit-source-id: b1c1baadc32751cd7e98e0beca8f0c314d9e5f10
2020-07-07 20:04:17 -07:00
bacca663ff Fix Broken Link in CONTRIBUTING.md (#41066)
Summary:
Spotted a broken link, and while I was at it, fixed a few little language and formatting nits.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41066

Reviewed By: mruberry

Differential Revision: D22415371

Pulled By: dongreenberg

fbshipit-source-id: 7d11c13235b28a01886063c11a4c5ccb333c0c02
2020-07-07 20:02:47 -07:00
445128d0f2 Add PyTorch Glossary (#40639)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40639

Differential Revision: D22421207

Pulled By: gmagogsfm

fbshipit-source-id: 7df8bfc85e28bcf1fb08892a3671e7a9cb0dee9c
2020-07-07 19:53:44 -07:00
bce75a2536 add first implementation of swish (#41085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41085

add the first LUT implementation of swish

Test Plan:
Compared against swish lowered as x*sigmoid(x); had to increase the error threshold, but it looks generally right.

Reviewed By: venkatacrc

Differential Revision: D22418117

fbshipit-source-id: c75fa496aa7a5356ddc87f1d61650f432e389457
2020-07-07 19:48:34 -07:00
a8bc7545d5 use PYTORCH_ROCM_ARCH to set GLOO_ROCM_ARCH (#40170)
Summary:
Previously it used the default arch set, which may or may not coincide with the user's.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40170

Differential Revision: D22400866

Pulled By: xw285cornell

fbshipit-source-id: 222ba684782024fa68f37bf7d4fdab9a2389bdea
2020-07-07 19:41:02 -07:00
054e5d8943 .circleci: Fix job-specs-custom docker tag (#41111)
Summary:
Should resolve master breakages

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41111

Differential Revision: D22426863

Pulled By: seemethere

fbshipit-source-id: 561eaaa0d97a6fe13c75c1a73e4324b92d94afed
2020-07-07 19:32:23 -07:00
cc29c192a6 add "aten::add.str" op and remove two duplicated ops
Summary: add "aten::add.str" op and remove two duplicated ops

Test Plan:
```
buck run //xplat/caffe2/fb/pytorch_predictor:converter /mnt/vol/gfsfblearner-altoona/flow/data/2020-06-29/1ca8a85f-dbd5-4181-b5fc-63d24465c1fc/201084299/2068673333/model.pt1 ~/model_f201084299.bc

buck run xplat/assistant/model_benchmark_tool/mobile/binary/:lite_predictor -- --model ~/model_f201084299.bc --input_file /tmp/gc_model_input.txt --model_input_args src_tokens,dict_feat,contextual_token_embedding --warmup 1 --iter 2
```

Reviewed By: pengtxiafb

Differential Revision: D22395604

fbshipit-source-id: 0ce21e8b8ae989d125f2f3739523e3c486590b9f
2020-07-07 19:07:35 -07:00
a4fd4905c8 bump docker version to more recent tag (#41105)
Summary:
The tag was originally introduced in https://github.com/pytorch/pytorch/pull/40385

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41105

Reviewed By: malfet

Differential Revision: D22423910

Pulled By: seemethere

fbshipit-source-id: 336fc7ef5243a5863c59762efd182ed7ea6dfc2c
2020-07-07 18:28:24 -07:00
eea535742f Add bfloat16 support for nccl path (#38515)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38515

Differential Revision: D22420896

Pulled By: ezyang

fbshipit-source-id: 80d2d0c2052c91c9035e1e025ebb14e210cb0100
2020-07-07 18:07:06 -07:00
38b465db27 ROCm 3.5.1 image (#40385)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40385

Differential Revision: D22421426

Pulled By: ezyang

fbshipit-source-id: 1a131cdb1a0d5ad7ccd55dc1db17cae982cc286b
2020-07-07 15:37:23 -07:00
5e03a1e926 Add support for int[]? arguments in native_functions.yaml (#37174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37174

ghstack-source-id: 106938112

Test Plan: Upcoming diffs use this for upsampling.

Differential Revision: D21210002

fbshipit-source-id: d6a55ab6420c05a92873a569221b613149aa0daa
2020-07-07 13:52:20 -07:00
4dad829ea3 In interpolate, inline the call to _interp_output_size (#37173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37173

This function is only called in one place, so inline it.  This eliminates
boilerplate related to overloads and allows for further simplification
of shared logic in later diffs.

All shared local variables have the same names (from closed_over_args),
and no local variables accidentally collide.
ghstack-source-id: 106938108

Test Plan: Existing tests for interpolate.

Differential Revision: D21209995

fbshipit-source-id: acfadf31936296b2aac0833f704764669194b06f
2020-07-07 13:52:18 -07:00
3c1c74c366 In interpolate, move exceptional cases to the bottom (#37172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37172

This improves readability by keeping cases with similar behavior close
together.  It should also have a very tiny positive impact on perf.
ghstack-source-id: 106938109

Test Plan: Existing tests for interpolate.

Differential Revision: D21209996

fbshipit-source-id: c813e56aa6ba7370b89a2784fcb62cc146005258
2020-07-07 13:52:16 -07:00
8f0e254790 In interpolate, use if instead of elif (#37171)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37171

Every one of these branches returns or raises, so there's no need for elif.
This makes it a little easier to reorder and move conditions.
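A toy illustration of the pattern (not the actual interpolate code):

```python
def classify(x):
    if x < 0:
        raise ValueError("negative")
    if x == 0:         # no elif needed: the branch above already raised
        return "zero"
    return "positive"  # every path returns or raises, so ordering stays explicit
```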
ghstack-source-id: 106938110

Test Plan: Existing test for interpolate.

Differential Revision: D21209992

fbshipit-source-id: 5c517e61ced91464b713f7ccf53349b05e27461c
2020-07-07 13:49:53 -07:00
93778f3b24 Expose certain methods in OpaqueTensorImpl. (#41060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41060

Exposes a const ref to the opaque handle and makes copy_tensor_metadata a
protected method. This helps with reusing code in subclasses of OpaqueTensorImpl.

Test Plan: waitforbuildbot

Reviewed By: dzhulgakov

Differential Revision: D22406602

fbshipit-source-id: e3b8338099f257da7f6bbff679f1fdb71e5f335a
2020-07-07 13:36:32 -07:00
8d570bc708 Decouple DataParallel/DistributedDataParallel from CUDA (#38454)
Summary:
Decouple DataParallel/DistributedDataParallel from CUDA to support more device types.
- Move torch/cuda/comm.py to torch/nn/parallel/comm.py, with minor changes for common device support. torch.cuda.comm is kept as is for backward compatibility (see the sketch below).
- Provide common APIs for arbitrary device types without changing the existing CUDA APIs in the torch.cuda namespace.
- Replace the torch.cuda calls in DataParallel/DistributedDataParallel with the new APIs.

Related RFC: [https://github.com/pytorch/pytorch/issues/36160](https://github.com/pytorch/pytorch/issues/36160)
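A usage sketch of the moved API (guarded so it only runs with multiple GPUs; the old `torch.cuda.comm` path keeps working):

```python
import torch
import torch.nn.parallel.comm as comm  # new home of the comm helpers

if torch.cuda.is_available() and torch.cuda.device_count() >= 2:
    t = torch.arange(4.0, device="cuda:0")
    # broadcast the tensor to two devices; previously torch.cuda.comm.broadcast
    copies = comm.broadcast(t, devices=[0, 1])
    print([c.device for c in copies])
```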

Pull Request resolved: https://github.com/pytorch/pytorch/pull/38454

Differential Revision: D22051557

Pulled By: mrshenli

fbshipit-source-id: 7842dad0e5d3ca0f6fb760bda49182dcf6653af8
2020-07-07 12:48:16 -07:00
75155df8b4 Doc warnings (#41068)
Summary:
Solves most of gh-38011 in the framework of solving gh-32703.

These should only be formatting fixes; I did not try to fix grammar or syntax.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41068

Differential Revision: D22411919

Pulled By: zou3519

fbshipit-source-id: 25780316b6da2cfb4028ea8a6f649bb18b746440
2020-07-07 11:43:21 -07:00
ff3ba25b8e .circleci: Output binary sizes, store binaries (#41074)
Summary:
We need an easy way to visually grep binary sizes from builds,
and then a way to quickly test out those binaries.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41074

Differential Revision: D22415667

Pulled By: seemethere

fbshipit-source-id: 86386e5390dce6aae26e952a47f9e2a2221d30b5
2020-07-07 11:36:49 -07:00
0e6b750288 Insert parentheses around kernel name argument to hipLaunchKernelGGL (#41022)
Summary:
This works around an issue in hipclang with templated kernel name arguments to hipLaunchKernelGGL.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41022

Differential Revision: D22404183

Pulled By: ngimel

fbshipit-source-id: 63135ccb9e087f4c8e8663ed383979f7e2c1ba06
2020-07-07 11:31:45 -07:00
630e7ed9cc Splitting embedding_bag to embedding_bag_forward_only and embedding_bag (#40557)
Summary:
Currently embedding_bag's CPU kernel queries whether weight.requires_grad() is true. This violates the layering of autograd and op kernels, causing issues in third-party backends like XLA. See this [issue](https://github.com/pytorch/xla/issues/2215) for more details.

This PR hoists the weight.requires_grad() query to the Python layer and splits embedding_bag into two separate ops, one for each value of weight.requires_grad(), as sketched below.
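A minimal sketch of the split (hypothetical names and simplified signatures; the real ops take more arguments such as mode and sparse flags):

```python
import torch

def _embedding_bag(weight, input, offsets):
    ...  # grad-capable kernel (placeholder)

def _embedding_bag_forward_only(weight, input, offsets):
    ...  # inference-only kernel (placeholder)

def embedding_bag(weight, input, offsets):
    # the requires_grad query now happens in Python, so backend kernels
    # (e.g. XLA) never need to inspect autograd state themselves
    if weight.requires_grad:
        return _embedding_bag(weight, input, offsets)
    return _embedding_bag_forward_only(weight, input, offsets)
```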

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40557

Reviewed By: ailzhang

Differential Revision: D22327476

Pulled By: gmagogsfm

fbshipit-source-id: c815b3690d676a43098e12164517c5debec90fdc
2020-07-07 11:24:29 -07:00
00ee54d2a4 Fix link to PyTorch organization (from Governance) (#40984)
Summary:
PR fixes https://github.com/pytorch/pytorch/issues/40666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40984

Differential Revision: D22404543

Pulled By: ngimel

fbshipit-source-id: 0d39e8f4d701517cce9c31fddaaad46be3d4844b
2020-07-07 11:22:57 -07:00
452d5e191b Grammatically updated the tech docs (#41031)
Summary:
Small grammatical update to the torch tech docs

![image](https://user-images.githubusercontent.com/26879385/86633690-e126c400-bfc8-11ea-8892-23cdc037daa9.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41031

Differential Revision: D22404342

Pulled By: ngimel

fbshipit-source-id: 1c723119cfb050c4ef53de7971fe6e0acf3e91a9
2020-07-07 11:17:17 -07:00
22c7d183f7 If ninja is being used, force build_ext to run. (#40837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40837

As ninja has accurate dependency tracking, if there is nothing to do,
we will very quickly no-op. But this is important for correctness:
if a change was made to a header that is not listed explicitly in
the distutils Extension, distutils will come to the wrong conclusion
about whether recompilation is needed (but ninja will work it out).

This caused https://github.com/pytorch/vision/issues/2367

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D22340930

Pulled By: ezyang

fbshipit-source-id: 481b74f6e2cc78159d2a74d413751cf7cf16f592
2020-07-07 09:49:31 -07:00
733b8c23c4 Fix several quantization documentation typos (#40567)
Summary:
This PR fixes several typos I noticed in the docs here: https://pytorch.org/docs/master/quantization.html. In one case there was a misspelled module [torch.nn.instrinsic.qat](https://pytorch.org/docs/master/quantization.html#torch-nn-instrinsic-qat) which I corrected and am including screenshots of below just in case.

<img width="1094" alt="before" src="https://user-images.githubusercontent.com/54918401/85766765-5cdd6280-b6e5-11ea-93e6-4944cf820b71.png">

<img width="1093" alt="after" src="https://user-images.githubusercontent.com/54918401/85766769-5d75f900-b6e5-11ea-8850-0d1f5ed67b16.png">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40567

Differential Revision: D22311291

Pulled By: ezyang

fbshipit-source-id: 65d1f3dd043357e38a584d9e30f31634a5b0995c
2020-07-07 09:45:23 -07:00
2d98f8170e Add option to warn if elements in a Compare table are suspect (#41011)
Summary:
This PR adds a `.highlight_warnings()` method to `Compare`, which will include a `(! XX%)` next to measurements with high variance to highlight that fact. For example:
```
[------------- Record function overhead ------------]
                      |    lstm_jit   |  resnet50_jit
1 threads: ------------------------------------------
      with_rec_fn     |   650         |  8600
      without_rec_fn  |   660         |  8000
2 threads: ------------------------------------------
      with_rec_fn     |   360         |  4200
      without_rec_fn  |   350         |  4000
4 threads: ------------------------------------------
      with_rec_fn     |   250         |  2100
      without_rec_fn  |   260         |  2000
8 threads: ------------------------------------------
      with_rec_fn     |   200 (! 6%)  |  1200
      without_rec_fn  |   210 (! 6%)  |  1100
16 threads: -----------------------------------------
      with_rec_fn     |   220 (! 8%)  |   900 (! 5%)
      without_rec_fn  |   200 (! 5%)  |  1000 (! 7%)
32 threads: -----------------------------------------
      with_rec_fn     |  1000 (! 7%)  |   920
      without_rec_fn  |  1000 (! 6%)  |   900 (! 6%)

Times are in milliseconds (ms).
(! XX%) Measurement has high variance, where XX is the median / IQR * 100.
```
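A hedged usage sketch (module path assumed; at the time of this commit the benchmark utilities were still in flux, and `Compare` lives under `torch.utils.benchmark` in current builds):

```python
from torch.utils.benchmark import Timer, Compare

results = [
    Timer("x * x", setup="import torch; x = torch.ones(1024)",
          label="mul", description=f"run {i}").blocked_autorange()
    for i in range(3)
]
cmp = Compare(results)
cmp.highlight_warnings()  # annotate noisy measurements with (! XX%)
cmp.print()
```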

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41011

Differential Revision: D22412905

Pulled By: robieta

fbshipit-source-id: 2c90e719d9a5a1c0267ed113dd1b1b1738fa8269
2020-07-07 09:39:22 -07:00
a04af4dccb Revert D22396896: [pytorch][PR] run single-threaded gradgradcheck in test_nn
Test Plan: revert-hammer

Differential Revision:
D22396896 (dac63a13cb)

Original commit changeset: 3b247caceb65

fbshipit-source-id: 90bbd71ca5128a7f07fe2907c061ee0922d16edf
2020-07-07 07:43:39 -07:00
0e09511af9 type annotations for dataloader, dataset, sampler (#39392)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/38913
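A small sketch of what the annotations enable (assuming `Dataset` and `DataLoader` are now generic over the sample type):

```python
from torch.utils.data import Dataset, DataLoader

class Squares(Dataset[int]):
    def __len__(self) -> int:
        return 10

    def __getitem__(self, i: int) -> int:
        return i * i

loader: DataLoader[int] = DataLoader(Squares(), batch_size=4)
```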

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39392

Reviewed By: anjali411

Differential Revision: D22102489

Pulled By: zou3519

fbshipit-source-id: acb68d9521145f0b047214d62b5bdc5a0d1b9be4
2020-07-07 07:16:18 -07:00
a6b703cc89 Make torch_cpu compileable when USE_TENSORPIPE is not set. (#40846)
Summary:
Forward-declare `tensorpipe::Message` class in utils.h
Guard TensorPipe specific methods in utils.cpp with `#ifdef USE_TENSORPIPE`
Pass `USE_TENSORPIPE` as private flag to `torch_cpu` library
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40846

Differential Revision: D22338864

Pulled By: malfet

fbshipit-source-id: 2ea2aea84527ae7480e353afb55951a068b3b980
2020-07-07 07:02:57 -07:00
12b5bdc601 Remove unused Logger in get_matching_activations (#41023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41023

Remove Logger in get_matching_activations since it's not used.
ghstack-source-id: 107237046

Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_lstm_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_lstm_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_lstm_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_conv_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_linear_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_linear_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_conv_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_linear_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_submodule_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_functional_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_linear_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_conv_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_linear_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_functional_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_linear_dynamic'

Differential Revision: D22394957

fbshipit-source-id: 7d59e0f35e9f4c304b8487460d48236ee6e5a872
2020-07-07 00:33:07 -07:00
4aa543ed2e Fix unordered-map-over-enum for GCC 5.4 (#41063)
Summary:
Forgot to add this to https://github.com/pytorch/pytorch/pull/41055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41063

Differential Revision: D22407451

Pulled By: malfet

fbshipit-source-id: 6f06653b165cc4817d134657f87caf643182832a
2020-07-06 23:26:31 -07:00
50df097599 Fix CUDA jit codegen compilation with gcc-5.4 (#41055)
Summary:
It's a known gcc-5.4 bug that an enum class is not hashable by default, so `std::unordered_map` needs an explicit third template parameter to compute the hash of the key type.

Should fix regression caused by https://github.com/pytorch/pytorch/pull/40864

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41055

Differential Revision: D22405478

Pulled By: malfet

fbshipit-source-id: f4bd36bebdc1ad0251ebd1e6cefba866e6605fe6
2020-07-06 21:09:17 -07:00
56396ad024 ONNX: support view_as operator (#40496)
Summary:
This PR adds support for the torch `view_as` operator.
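A minimal export sketch (file name and shapes are illustrative):

```python
import torch

class M(torch.nn.Module):
    def forward(self, x, y):
        return x.view_as(y)  # exportable to ONNX after this PR

torch.onnx.export(M(), (torch.randn(2, 3), torch.randn(3, 2)), "view_as.onnx")
```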

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40496

Reviewed By: hl475

Differential Revision: D22398318

Pulled By: houseroad

fbshipit-source-id: f92057f9067a201b707aa9b8fc4ad34643dd5fa3
2020-07-06 20:38:46 -07:00
b2cc8a2617 [ONNX]Fix export of full_like (#40063)
Summary:
Fix export of full_like when fill_value is of type torch._C.Value.

This PR fixes a bug when exporting GPT2DoubleHeadsModel https://github.com/huggingface/transformers/issues/4950

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40063

Reviewed By: hl475

Differential Revision: D22398353

Pulled By: houseroad

fbshipit-source-id: 6980a61211fe571c2e4a57716970f474851d811e
2020-07-06 20:36:09 -07:00
6e4f501f1a Improve error message for Pad operator (#39651)
Summary:
In issue https://github.com/pytorch/pytorch/issues/36997 the user encountered an unhelpful error message when trying to export the model to ONNX. The Pad operator in opset 9 requires the list of paddings to be constant. This PR improves the error message given to the user when this is not the case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39651

Reviewed By: hl475

Differential Revision: D21992262

Pulled By: houseroad

fbshipit-source-id: b817111c2a40deba85e4c6cdb874c1713312dba1
2020-07-06 20:26:02 -07:00
6b50874cb7 Fix HTTP links in documentation to HTTPS (#40878)
Summary:
I ran `make linkcheck` using `sphinx.builders.linkcheck` on the documentation and noticed a few links weren't using HTTPS so I quickly updated them all.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40878

Differential Revision: D22404647

Pulled By: ngimel

fbshipit-source-id: 9c9756db59197304023fddc28f252314f6cf4af3
2020-07-06 20:05:21 -07:00
63ef706979 [ATen] Add native_cuda_h list to CMakeLists.txt (#41038)
Summary:
Closes https://github.com/pytorch/pytorch/issues/40784

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41038

Differential Revision: D22404273

Pulled By: malfet

fbshipit-source-id: 8df05f948f069ac95591d523222faa1327429e71
2020-07-06 19:58:36 -07:00
5d1d8a58b8 Enable in_dims for vmap frontend api (#40717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40717

`in_dims` specifies which dimension of the input tensors should be
vmapped over. One can also specify `None` as an `in_dim` for a particular
input to indicate that we do not map over said input.

We implement `in_dims` by creating a BatchedTensor with BatchDim equal
to said `in_dim`. Most of this PR is error checking. `in_dims` must
satisfy the following:
- `in_dim` can be either an int or a Tuple[Optional[int]]. If it is an
int, we use it to mean the `in_dim` for every input.
- If `in_dims` is not-None at some index `idx`, then the input at index
`idx` MUST be a tensor (vmap can only map over tensors).

jax supports something more generalized: their `in_dims` can match the
structure of the `inputs` to the function (i.e., it is a nested python
data structure matching the data structure of `inputs` specifying where
in `inputs` the Tensors to be mapped are and what their map dims should
be). We don't have the infrastructure for that yet, so we only support `int` or a
flat tuple for `in_dims`.
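A usage sketch of `in_dims` (imported from the private `torch._vmap_internals` module where this prototype lived at the time):

```python
import torch
from torch._vmap_internals import vmap  # prototype location at the time

x = torch.randn(3, 5)
w = torch.randn(5)

# map over dim 0 of x; in_dim=None means w is broadcast, not mapped
out = vmap(torch.mul, in_dims=(0, None))(x, w)
assert out.shape == (3, 5)
```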

Test Plan: - `pytest test/test_vmap.py -v`

Differential Revision: D22397914

Pulled By: zou3519

fbshipit-source-id: 56d2e14be8b6024e4cde2729eff384da305b4ea3
2020-07-06 19:14:43 -07:00
dac63a13cb run single-threaded gradgradcheck in test_nn (#40999)
Summary:
The most time-consuming tests in test_nn (taking about half the total time) were gradgradchecks on Conv3d. Reduce their sizes and, most importantly, run gradgradcheck single-threaded, because that cuts the time of the conv3d tests by an order of magnitude while barely affecting other tests.
These changes bring test_nn time down from 1200 s to ~550 s on my machine.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40999

Differential Revision: D22396896

Pulled By: ngimel

fbshipit-source-id: 3b247caceb65d64be54499de1a55de377fdf9506
2020-07-06 17:21:25 -07:00
37a572f33e fix grad thrashing of shape analysis (#40939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40939

Previously, when we did shape analysis by running the op with representative inputs, we would always set the grad property to false. This led to incorrect static analysis when we created differentiable subgraphs, propagated shapes without also propagating requires_grad, and then uninlined them.

Test Plan: Imported from OSS

Differential Revision: D22394676

Pulled By: eellison

fbshipit-source-id: 254e6e9f964b40d160befe0e125abe1b7aa2bd5e
2020-07-06 17:12:13 -07:00
4af8424377 shape analysis fix for default dtype (#40938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40938

already accepted in https://github.com/pytorch/pytorch/pull/40645

Test Plan: Imported from OSS

Reviewed By: jamesr66a, Krovatkin

Differential Revision: D22394675

Pulled By: eellison

fbshipit-source-id: 1e9dbb24a4cb564d9a68280d2166329ca9fb0425
2020-07-06 17:10:01 -07:00
078669f6c3 Back out "[2/n][Compute Meta] support analysis for null flag features"
Summary:
Original commit changeset: 46c59d849fa8

The original commit is breaking DPER3 release pipeline with the following failures:
https://www.internalfb.com/intern/chronos/jobinstance?jobinstanceid=9007207344413239&smc=chronos_gp_admin_client&offset=0
```
Child workflow f 202599639  failed with error: c10::Error: [enforce fail at operator.cc:76] blob != nullptr. op Save: Encountered a non-existing input blob: feature_preproc/feature_sparse_to_dense/default_float_value
```
https://www.internalfb.com/intern/chronos/jobinstance?jobinstanceid=9007207344855973&smc=chronos_gp_admin_client&offset=0
```
Child workflow f 202629391  failed with error: c10::Error: [enforce fail at operator.cc:76] blob != nullptr. op Save: Encountered a non-existing input blob: tum_preproc/inductive/feature_sparse_to_dense/default_float_value
```

Related UBN tasks: T69529846, T68986110

Test Plan: Build a DPER3 package on top of this commit, and check that DPER3 release test `model_deliverability_test` is passing.

Differential Revision: D22396317

fbshipit-source-id: 92d5b30cc146c005d6159a8d5bfe8973e2c546dd
2020-07-06 16:29:03 -07:00
a78024476b Port equal from THC to ATen (CUDA) (#36483)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24557

ASV benchmark:

```
import torch

sizes = [
    (10**6,),
    (1000, 1000),
    (10, 10),
    (1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
]

class EqualTrue:
    params = range(len(sizes))

    def setup(self, n):
        dims = sizes[n]
        self.a = torch.rand(dims, device='cuda')
        self.b = self.a.clone()

    def time_equal(self, n):
        torch.equal(self.a, self.b)

class EqualFalse:
    params = range(len(sizes))

    def setup(self, n):
        dims = sizes[n]
        self.a = torch.rand(dims, device='cuda')
        self.b = torch.rand(dims, device='cuda')

    def time_equal(self, n):
        torch.equal(self.a, self.b)
```

Old results:
```
[ 75.00%] ··· equal.EqualFalse.time_equal
[ 75.00%] ··· ======== ============
               param1
              -------- ------------
                 0       67.7±7μs
                 1       74.0±2μs
                 2      24.4±0.1μs
                 3      135±0.2μs
              ======== ============

[100.00%] ··· equal.EqualTrue.time_equal
[100.00%] ··· ======== ============
               param1
              -------- ------------
                 0      59.8±0.2μs
                 1      59.9±0.3μs
                 2      25.0±0.5μs
                 3      136±0.2μs
              ======== ============
```

New results:
```
[ 75.00%] ··· equal.EqualFalse.time_equal
[ 75.00%] ··· ======== ============
               param1
              -------- ------------
                 0      44.4±0.2μs
                 1      44.5±0.4μs
                 2      31.3±0.3μs
                 3      96.6±0.5μs
              ======== ============

[100.00%] ··· equal.EqualTrue.time_equal
[100.00%] ··· ======== ============
               param1
              -------- ------------
                 0      44.2±0.2μs
                 1      44.6±0.2μs
                 2      30.8±0.3μs
                 3      97.3±0.2μs
              ======== ============
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/36483

Differential Revision: D21451829

Pulled By: VitalyFedyunin

fbshipit-source-id: 033e8060192c54f139310aeafe8ba784bab94ded
2020-07-06 16:00:16 -07:00
c0f9bf9bea s/torch::jit::class_/torch::class_/ (#40795)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40795

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D22314215

Pulled By: jamesr66a

fbshipit-source-id: a2fb5c6804d4014f8e437c6858a7be8cd3efb380
2020-07-06 15:53:33 -07:00
cbe52d762c Mish Activation Function (#40856)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40856

Add a new activation function - Mish: A Self Regularized Non-Monotonic Neural Activation Function https://arxiv.org/abs/1908.08681
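For reference, the paper's definition as a one-line Python sketch:

```python
import torch
import torch.nn.functional as F

def mish(x):
    # Mish(x) = x * tanh(softplus(x)), per the referenced paper
    return x * torch.tanh(F.softplus(x))
```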

Test Plan:
buck test //caffe2/caffe2/python/operator_test:elementwise_ops_test -- 'test_mish'

{F242275183}

Differential Revision: D22158035

fbshipit-source-id: 459c1dd0ac5b515913fc09b5f4cd13dcf095af31
2020-07-06 15:51:23 -07:00
87f9b55aa5 Use explicit templates in gpu_kernel_with_scalars (#40992)
Summary:
This trick should have no effect on performance, but it reduces the size of kernels using the template by 10%.
For example, the size of BinaryMulDivKernel.cu.o compiled by the CUDA 10.1 toolchain for sm_75 was 4.2 MB before the change and 3.8 MB after.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40992

Differential Revision: D22398733

Pulled By: malfet

fbshipit-source-id: 6576f4da00dc5fc2575b2313577f52c6571d5e6f
2020-07-06 15:46:28 -07:00
945ae5bd7b Update the documentation of the scatter_ method with support for reduction methods. (#40962)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/36447 . Update for https://github.com/pytorch/pytorch/issues/33389.

Also removes unused `unordered_map` include from the CPP file.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40962

Differential Revision: D22376253

Pulled By: ngimel

fbshipit-source-id: 4e7432190e9a847321aec6d6f6634056fa69bdb8
2020-07-06 15:27:16 -07:00
35bd2b3c8b DOC: Clarify that CrossEntropyLoss mean is weighted (#40991)
Summary:
Closes https://github.com/pytorch/pytorch/issues/40560

This adds the equation for the weighted mean to `CrossEntropyLoss`'s docs, and the docs for the `reduction` argument of `CrossEntropyLoss` and `NLLLoss` no longer describe a non-weighted mean of the outputs.
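The weighted mean in question has this form (a sketch consistent with the NLLLoss notation, where $w_{y_n}$ is the class weight of target $y_n$ and $l_n$ the per-sample loss):

$$\ell(x, y) = \frac{\sum_{n=1}^{N} l_n}{\sum_{n=1}^{N} w_{y_n}}, \qquad l_n = -w_{y_n} \log \frac{\exp(x_{n, y_n})}{\sum_{c} \exp(x_{n, c})}$$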

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40991

Differential Revision: D22395805

Pulled By: ezyang

fbshipit-source-id: a623b6dd2aab17220fe0bf706bd9b62d6ba531fd
2020-07-06 15:05:31 -07:00
b9b4f05abf [nvFuser] Working towards reductions, codegen improvements (#40864)
Summary:
Basic reduction fusion is working, and the code generator has been improved to approach the performance of eager-mode reductions. Coming soon are pointwise-reduction fusions, implemented in a way that should prevent the possibility of hitting regressions. Also in progress are performant softmax kernels in the code generator, which may be our next fusion target.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40864

Reviewed By: ngimel

Differential Revision: D22392877

Pulled By: soumith

fbshipit-source-id: 457448a807d628b1035f6d90bc0abe8a87bf8447
2020-07-06 14:52:49 -07:00
e026d91506 [JIT] Remove dead store in unpickler.cpp (#40625)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40625

Test Plan: Continuous integration.

Reviewed By: suo

Differential Revision: D22259289

fbshipit-source-id: 76cb097dd06a636004fc780b17cb20f27d3821de
2020-07-06 14:48:03 -07:00
d753f1c2e1 Fixes formatting of vander, count_nonzero, DistributedSampler documentation (#41025)
Summary:
Bundle of small edits to fix formatting.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41025

Differential Revision: D22398364

Pulled By: mruberry

fbshipit-source-id: 8d484cb52a1cf4a8eb1f64914574250c9fd5043d
2020-07-06 14:26:13 -07:00
0fbd42b20f [pytorch] deprecate PYTORCH_DISABLE_TRACING macro (#41004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41004

Tracing has been moved into separate files. Now we can disable it by not compiling those source files for the xplat mobile build.
ghstack-source-id: 107158627

Test Plan: CI + build size bot

Reviewed By: iseeyuan

Differential Revision: D22372615

fbshipit-source-id: bf2e2249e401295ff63020a292df119b188fb966
2020-07-06 14:22:59 -07:00
7f60642bae [pytorch] add manual registration for trace type (#40903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40903

This PR continues the work of #38467 - decoupling Autograd and Trace for manually registered ops.
ghstack-source-id: 107158638

Test Plan: CI

Differential Revision: D22354804

fbshipit-source-id: f5ea45ade2850296c62707a2a4449d7d67a9f5b5
2020-07-06 14:20:37 -07:00
e173278348 Update quantization.rst (#40896)
Summary:
Add documentation for dynamic quantized modules

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40896

Differential Revision: D22395955

Pulled By: z-a-f

fbshipit-source-id: cdc956d1509a0901bc24b73b6ca68a1b65e00cc2
2020-07-06 13:47:39 -07:00
e75f12ac15 Check statstical diff rather than exact match for test_dropout_cuda. (#40883)
Summary:
There is a TODO tracked in https://github.com/pytorch/pytorch/issues/40882
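A sketch of the statistical check named in the title (tolerance and tensor size are illustrative):

```python
import torch

p = 0.2
x = torch.ones(1_000_000, device="cuda")
out = torch.nn.functional.dropout(x, p=p, training=True)
# compare the observed drop rate to p instead of matching exact values
drop_rate = (out == 0).float().mean().item()
assert abs(drop_rate - p) < 0.01
```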

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40883

Reviewed By: pbelevich

Differential Revision: D22346087

Pulled By: ailzhang

fbshipit-source-id: b4789ca3a10f6a72c6e77276bde45633eb6cf545
2020-07-06 13:11:48 -07:00
c38a5cba0d Remove duplicate assignment in collate.py (#40655)
Summary:
Duplicated assignment
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40655

Reviewed By: ezyang

Differential Revision: D22308827

Pulled By: colesbury

fbshipit-source-id: 48361da8994b3ca00ef29e9afd3ec2672266f00a
2020-07-06 12:37:59 -07:00
c935712d58 Use unbind for tensor.__iter__ (#40884)
Summary:
Unbind, which has a special backward with cat, is arguably better than multiple selects, whose backward creates and adds a bunch of tensors as big as `self`.
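A sketch of the change's effect (user-visible behavior is unchanged; only the backward structure differs):

```python
import torch

x = torch.randn(3, 4, requires_grad=True)
rows = list(x)               # iteration now yields x.unbind(0) results
loss = sum(r.sum() for r in rows)
loss.backward()              # unbind's backward assembles grads with one cat
print(x.grad.shape)          # torch.Size([3, 4])
```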

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40884

Reviewed By: pbelevich

Differential Revision: D22363376

Pulled By: zou3519

fbshipit-source-id: 0911cdbb36f9a35d1b95f315d0a2f412424e056d
2020-07-06 10:53:15 -07:00
f6f3c0094a Revert D22369579: add eq.str, ne.str, and add.str ops
Test Plan: revert-hammer

Differential Revision:
D22369579 (0deb2560b8)

Original commit changeset: 7ac9a184d437

fbshipit-source-id: 9c861b9f6bf32fe51fa0ea516cf09a3d09d78a7c
2020-07-06 09:52:59 -07:00
9c82b570bf Fix delegating to jit.load from torch.load (#40937)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40937

Test Plan: Imported from OSS

Differential Revision: D22363816

Pulled By: jamesr66a

fbshipit-source-id: 50fc318869407fe8b215368026eaceb129b68a46
2020-07-06 09:00:13 -07:00
73c5a78f43 Test test_int8_ops_nnpi.py case typo fix. (#41008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41008

Test test_int8_ops_nnpi.py case typo fix.

Test Plan: test_int8_ops_nnpi.py case typo fix.

Reviewed By: hl475

Differential Revision: D22390331

fbshipit-source-id: 8d257c72114ce890720219eb519b9cb43b2ca49b
2020-07-06 08:44:08 -07:00
46f5cf1e31 Improve error reporting of AVX instruction in CI job (#40681)
Summary:
Close https://github.com/pytorch/pytorch/issues/40320

Leverage `qemu` and `gdbserver` to print backtraces and instructions, helping developers better understand the causes of failed tests.

Signed-off-by: Xiong Wei <xiongw.fnst@cn.fujitsu.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40681

Differential Revision: D22391512

Pulled By: malfet

fbshipit-source-id: 19f125cf6c0e5a51814aff2b1d4d3c81298e3cb6
2020-07-06 08:31:01 -07:00
e1afa9daff fix cmake bug (#39930)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39930

Differential Revision: D22391207

Pulled By: ezyang

fbshipit-source-id: bde19a112846e124d4e5316ba947f48d4dccf361
2020-07-06 08:02:30 -07:00
0b9717b86a When linking libtorch_cpu.so, put AVX sources last in the input list (#40449)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39600
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40449

Reviewed By: VitalyFedyunin

Differential Revision: D22312501

Pulled By: colesbury

fbshipit-source-id: 4c09adb0173749046f20b84241d6c940b339ad77
2020-07-06 07:56:12 -07:00
063d5b0d3f Remove get_fail_msg in test_dataloader.test_proper_exit (#40745)
Summary:
Close https://github.com/pytorch/pytorch/issues/40744
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40745

Reviewed By: ezyang

Differential Revision: D22308972

Pulled By: colesbury

fbshipit-source-id: 4b4847e6b926b2614c8b14f17a9db3b0376baabe
2020-07-06 07:48:32 -07:00
450ba49653 Add the missing resource_class key in the update_s3_htmls job (#41000)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40998.

Actually I don't know why it is needed. But without it, the build won't start. See my rerun of the update_s3_html3 job: https://app.circleci.com/pipelines/github/pytorch/pytorch/187926/workflows/432dbe98-ca2f-484d-acc7-0482cb3fd01f/jobs/6121551/steps.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41000

Differential Revision: D22390654

Pulled By: malfet

fbshipit-source-id: 0f296c8a82fa92d5382f883bca951e6576f75b15
2020-07-06 07:02:11 -07:00
54d7a1e3f4 Fix module dict key ordering (#40905)
Summary:
Fix https://github.com/pytorch/pytorch/issues/40227
Removed the sorting operation in the ModuleDict class and updated the docstring.
Also removed a sort operation in the corresponding unit test, which would otherwise cause the test to fail.

BC Note: from Python 3.6 onwards, a plain dict preserves the insertion order of its keys.
Example: a Python 3.6+ user who initializes a ModuleDict from the plain python dict
{
"b": torch.nn.MaxPool2d(3),
"a": torch.nn.MaxPool2d(3)
}
gets a ModuleDict that preserves that order:
ModuleDict(
(b): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
(a): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
)

For a Python 3.5 user, the same input could instead produce:
ModuleDict(
(a): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
(b): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40905

Differential Revision: D22357480

Pulled By: albanD

fbshipit-source-id: 0e2502769647bb64f404978243ca1ebe5346d573
2020-07-06 06:40:48 -07:00
0deb2560b8 add eq.str, ne.str, and add.str ops (#40958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40958

add 3 str operators to lite interpreter
eq.str
ne.str
add.str

Test Plan:
```
buck run //xplat/caffe2/fb/pytorch_predictor:converter /mnt/vol/gfsfblearner-altoona/flow/data/2020-06-29/1ca8a85f-dbd5-4181-b5fc-63d24465c1fc/201084299/2068673333/model.pt1 ~/model_f201084299.bc

buck run xplat/assistant/model_benchmark_tool/mobile/binary/:lite_predictor -- --model ~/model_f201084299.bc --input_file /tmp/gc_model_input.txt --model_input_args src_tokens,dict_feat,contextual_token_embedding --warmup 1 --iter 2

```

Reviewed By: pengtxiafb

Differential Revision: D22369579

fbshipit-source-id: 7ac9a184d437c875edfb584221edd706bffb16e1
2020-07-06 01:01:15 -07:00
300a3aaaad [jit] move private implementation out of jit/__init__.py (#40807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40807

We pack a lot of logic into `jit/__init__.py`, making it unclear to
developers and users which parts of our API are public vs. internal. This
is one in a series of PRs intended to pull implementation out into
separate files, and leave `__init__.py` as a place to register the
public API.

This PR moves all the tracing-related stuff out, and fixes other spots up
as necessary. Followups will move other core APIs out.

The desired end-state is that we conform to the relevant rules in [PEP 8](https://www.python.org/dev/peps/pep-0008/#public-and-internal-interfaces). In particular:
- Internal implementation goes in modules prefixed by `_`.
- `__init__.py` exposes a public API from these private modules, and nothing more.
- We set `__all__` appropriately to declare our public API.
- All use of JIT-internal functionality outside the JIT are removed (in particular, ONNX is relying on a number internal APIs). Since they will need to be imported explicitly, it will be easier to catch new uses of internal APIs in review.
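As a sketch of the target shape (module and function names assumed from the tracing move described above):

```python
# torch/jit/__init__.py (sketch): private modules implement, the package
# re-exports, and __all__ declares the public surface
from torch.jit._trace import trace, trace_module

__all__ = ["trace", "trace_module"]
```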

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D22320645

Pulled By: suo

fbshipit-source-id: 0720ea9976240e09837d76695207e89afcc58270
2020-07-05 22:01:11 -07:00
1e64bf4c40 [CircleCI] Delete docker image after testing (#40917)
Summary:
Needed maintenance step to avoid running out of disk space on RocM testers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40917

Differential Revision: D22385844

Pulled By: malfet

fbshipit-source-id: b6dc9ba888a2e34c311e9bf3c8b7b98fa1ec5435
2020-07-05 13:21:00 -07:00
8ecd4f36aa fix __len__, __contains__, getitem inherited from interface class derived from nn container (closes #40603) (#40789)
Summary:
Define static script implementations of __len__ and __contains__ on any subclass derived from a container type such as ModuleList, Sequential, or ModuleDict. Implement __getitem__ for classes derived from ModuleDict.
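A usage sketch (module structure is illustrative):

```python
import torch
import torch.nn as nn

class Stack(nn.Sequential):  # user subclass of a container type
    pass

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = Stack(nn.ReLU(), nn.Tanh())

    def forward(self, x):
        # len() on a Sequential-derived attribute now compiles in script
        if len(self.layers) > 0:
            x = self.layers(x)
        return x

scripted = torch.jit.script(M())
print(scripted(torch.randn(2)))
```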
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40789

Reviewed By: eellison

Differential Revision: D22325159

Pulled By: wconstab

fbshipit-source-id: fc1562c29640fe800e13b5a1dd48e595c2c7239b
2020-07-04 15:45:18 -07:00
8223858cc1 shape inference of undefined for prim::grad (#40866)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40866

Reviewed By: pbelevich

Differential Revision: D22358988

Pulled By: Krovatkin

fbshipit-source-id: 7118d7f8d4eaf056cfb71dc0d588d38b1dfb0fc7
2020-07-04 14:10:22 -07:00
88c0d886e3 update requires_gard on loop inputs correctly (master) (#40926)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40926

Reviewed By: eellison

Differential Revision: D22359471

Pulled By: Krovatkin

fbshipit-source-id: 823e87674e2d2917f075255ec926e0485972f4e2
2020-07-04 13:58:29 -07:00
0790d11a18 typing for tensor.T/grad_fn torch.Size (#40879)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40658

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40879

Reviewed By: pbelevich

Differential Revision: D22339146

Pulled By: ezyang

fbshipit-source-id: 6b4695e102591e7a2c391eb337c154414bacf67c
2020-07-04 11:58:29 -07:00
0fc0a9308a fix autodoc for torch.distributed.launch (#40963)
Summary:
The doc for `torch.distributed.launch` has been missing since v1.2.0 (see issue https://github.com/pytorch/pytorch/issues/36386) because PR https://github.com/pytorch/pytorch/issues/22501 added some imports at the first line.
542ac74987/torch/distributed/launch.py (L1-L5)
I moved the imports below the docstring to make Sphinx autodoc work normally, as sketched below.
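The fix, as a sketch (abridged docstring):

```python
# torch/distributed/launch.py (sketch): the module docstring must be the
# first statement for Sphinx autodoc to pick it up; imports come after
r"""
`torch.distributed.launch` is a module that spawns multiple distributed
training processes on each of the training nodes.
"""
import sys
```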

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40963

Differential Revision: D22380816

Pulled By: mrshenli

fbshipit-source-id: ee8406785b9a198bbf3fc65e589854379179496f
2020-07-04 08:59:41 -07:00
480851ad2c Docstring changes for dynamic quantized classes (#40931)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40931

Fix docstrings for dynamic quantized Linear/LSTM and associated classes
ghstack-source-id: 107064446

Test Plan: Docs show up correctly

Differential Revision: D22360787

fbshipit-source-id: 8e357e081dc59ee42fd7f12ea5079ce5d0cc9df2
2020-07-03 21:04:12 -07:00
3b7df2388e [RFC] Profile rpc_async call from JIT (#40652)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40652

Resolves https://github.com/pytorch/pytorch/issues/40304, but looking for
feedback on whether there is a better approach for this.

In order to profile `rpc_async` calls made within a torchscript function, we
add the profiling logic to `rpcTorchscript` which is the point where the RPC is
dispatched and is called by the jit `rpc_async` operator. We take a somewhat
similar approach to how this is done in the python API. If profiling is
enabled, we call `record_function_enter` which creates a `RecordFunction`
object and runs its starting callbacks. Then, we schedule end callbacks for
this `RecordFunction` to be run when the jit future completes.

One caveat is that `rpcTorchscript` can also be called by rpc_async from a
non-JIT function, in which case the profiling logic lives in Python. We add a
check to ensure that we don't double profile in this case.
ghstack-source-id: 107109485

Test Plan: Added relevant unittests.

Differential Revision: D22270608

fbshipit-source-id: 9f62d1a2a27f9e05772d0bfba47842229f0c24e1
2020-07-03 15:17:16 -07:00
f3f113f103 [quant][graphmode][fix] Print the node in error message (#40889)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40889

Test Plan: Imported from OSS

Differential Revision: D22348266

fbshipit-source-id: eed2ece5c94fcfaf187d6770bed4a7109f0c0b4a
2020-07-03 10:01:55 -07:00
f083cea227 [RPC tests] Fix file descriptor leak (#40913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40913

Summary of the entire stack:
--

This diff is part of an attempt to refactor the RPC tests. They currently suffer from several problems:
- Several ways to specify the agent to use: there exists one "generic" fixture that uses the global variable TEST_CONFIG to look up the agent name, and is used for process group and Thrift, and then there are separate fixtures for the flaky agent and the TensorPipe one.
- These two ways lead to having two separate decorators (`requires_process_group_agent` and `@_skip_if_tensorpipe_agent`) which must both be specified, making it unclear what the effect of each of them is and what happens if only one is given.
- Thrift must override the TEST_CONFIG global variable before any other import (in order for the `requires_process_group_agent` decorator to work correctly) and for that it must use a "trap" file, which makes it even harder to track which agent is being used, and which is specific to Buck, and thus cannot be used in OSS by other agents.
- Even if the TensorPipe fixture doesn't use TEST_CONFIG, it still needs to set it to the right value for other parts of the code to work. (This is done in `dist_init`).
- There are a few functions in dist_utils.py that return some properties of the agent (e.g., a regexp to match against the error it returns in case of shutdown). These functions are effectively chained if/elses on the various agents, which has the effect of "leaking" some part of the Thrift agent into OSS.
- Each test suite (RPC, dist autograd/dist optimizer, their JIT versions, remote module, ...) must be run on each agent (or almost; the faulty one is an exception) in both fork and spawn mode. Each of these combinations is a separate file, which leads to a proliferation of scripts.
- There is no "master list" of what combinations make sense and should be run. Therefore it has happened that when adding new tests or new agents we forgot to enroll them into the right tests. (TensorPipe is still missing a few tests, it turns out).
- All of these tiny "entry point" files contain almost the same duplicated boilerplate. This makes it very easy to get the wrong content into one of them due to a bad copy-paste.

This refactoring aims to address these problems by:
- Avoiding global state, defaults/override, traps, if/elses, ... and have a single way to specify the agent, based on an abstract base class and several concrete subclasses which can be "mixed in" to any test suite.
- Instead of enabling/disabling tests using decorators, the tests that are specific to a certain agent are now in a separate class (which is a subclass of the "generic" test suite) so that they are only picked up by the agent they apply to.
- Instead of having one separate entry point script for each combination, it uses one entry point for each agent, and in that script it provides a list of all the test suites it wants to run on that agent. And it does that by trying to deduplicate the boilerplate as much as possible. (In fact, the various agent-suite combinations could be grouped in any way, not necessarily by agent as I did here).

It provides further advantages:
- It puts all the agents on equal standing, by not having any of them be the default, making it thus easier to migrate from process group to TensorPipe.
- It will make it easier to add more versions of the TensorPipe tests (e.g., one that disables the same-machine backends in order to test the TCP-based ones) without a further duplication of entry points, of boilerplate, ...

Summary of this commit
--
Once we start merging multiple test suites in a single file (which we'll happen in the next diffs in the stack) the OSX tests on CircleCI start failing due to "too many open files". This indicates a file descriptor leak. I then managed to repro it on Linux too by lowering the limit on open file descriptors (`ulimit -n 500`). Each test method that unittest runs is run on a new instance of the Testcase class. With our multiprocessing wrappers, this instance contains a list of child processes. Even after these processes are terminated, it appears they still hold some open file descriptor (for example a pipe to communicate with the subprocess). It also appears unittest is keeping these Testcase instances alive until the entire suite completes, which I suspect is what leads to this "leak" of file descriptors. Based on that guess, in this diff I am resetting the list of subprocesses during shutdown, and this seems to fix the problem.
ghstack-source-id: 107045908

Test Plan: Sandcastle and CircleCI

Differential Revision: D22356784

fbshipit-source-id: c93bb9db60fde72cae0b0c735a50c17e427580a6
2020-07-03 06:22:40 -07:00
f9a71d3de4 [RPC tests] Align ddp_under_dist_autograd test with others (#40815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40815

Summary of the entire stack:
--

This diff is part of an attempt to refactor the RPC tests. They currently suffer from several problems:
- Several ways to specify the agent to use: there exists one "generic" fixture that uses the global variable TEST_CONFIG to look up the agent name, and is used for process group and Thrift, and then there are separate fixtures for the flaky agent and the TensorPipe one.
- These two ways lead to having two separate decorators (`requires_process_group_agent` and `@_skip_if_tensorpipe_agent`) which must both be specified, making it unclear what the effect of each of them is and what happens if only one is given.
- Thrift must override the TEST_CONFIG global variable before any other import (in order for the `requires_process_group_agent` decorator to work correctly) and for that it must use a "trap" file, which makes it even harder to track which agent is being used, and which is specific to Buck, and thus cannot be used in OSS by other agents.
- Even if the TensorPipe fixture doesn't use TEST_CONFIG, it still needs to set it to the right value for other parts of the code to work. (This is done in `dist_init`).
- There are a few functions in dist_utils.py that return some properties of the agent (e.g., a regexp to match against the error it returns in case of shutdown). These functions are effectively chained if/elses on the various agents, which has the effect of "leaking" some part of the Thrift agent into OSS.
- Each test suite (RPC, dist autograd/dist optimizer, their JIT versions, remote module, ...) must be run on each agent (or almost; the faulty one is an exception) in both fork and spawn mode. Each of these combinations is a separate file, which leads to a proliferation of scripts.
- There is no "master list" of what combinations make sense and should be run. Therefore it has happened that when adding new tests or new agents we forgot to enroll them into the right tests. (TensorPipe is still missing a few tests, it turns out).
- All of these tiny "entry point" files contain almost the same duplicated boilerplate. This makes it very easy to get the wrong content into one of them due to a bad copy-paste.

This refactoring aims to address these problems by:
- Avoiding global state, defaults/overrides, traps, if/elses, ... and having a single way to specify the agent, based on an abstract base class and several concrete subclasses which can be "mixed in" to any test suite.
- Instead of enabling/disabling tests using decorators, the tests that are specific to a certain agent are now in a separate class (which is a subclass of the "generic" test suite) so that they are only picked up by the agent they apply to.
- Instead of having one separate entry point script for each combination, it uses one entry point for each agent, and in that script it provides a list of all the test suites it wants to run on that agent. And it does that by trying to deduplicate the boilerplate as much as possible. (In fact, the various agent-suite combinations could be grouped in any way, not necessarily by agent as I did here).

It provides further advantages:
- It puts all the agents on equal standing, by not having any of them be the default, making it thus easier to migrate from process group to TensorPipe.
- It will make it easier to add more versions of the TensorPipe tests (e.g., one that disables the same-machine backends in order to test the TCP-based ones) without a further duplication of entry points, of boilerplate, ...

Summary of this commit
--
This prepares the stack by aligning the `ddp_under_dist_autograd` test to the other ones, so that later changes will be more consistent and thus easier to follow. It does so by moving the `skipIf` decorators and the `setUp` methods from the base test suite to the entry point scripts.
ghstack-source-id: 107045911

Test Plan: Sandcastle and CircleCI

Differential Revision: D22287535

fbshipit-source-id: ab0c9eb774b21d81e0ebd3078df958dbb4bfa0c7
2020-07-03 06:20:29 -07:00
d0f2079b5e [RPC tests] Remove world_size and init_method from TensorPipe fixture (#40814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40814

Summary of the entire stack:
--

This diff is part of an attempt to refactor the RPC tests. They currently suffer from several problems:
- Several ways to specify the agent to use: there exists one "generic" fixture that uses the global variable TEST_CONFIG to look up the agent name, and is used for process group and Thrift, and then there are separate fixtures for the flaky agent and the TensorPipe one.
- These two ways lead to having two separate decorators (`requires_process_group_agent` and `@_skip_if_tensorpipe_agent`) which must both be specified, making it unclear what the effect of each of them is and what happens if only one is given.
- Thrift must override the TEST_CONFIG global variable before any other import (in order for the `requires_process_group_agent` decorator to work correctly) and for that it must use a "trap" file, which makes it even harder to track which agent is being used, and which is specific to Buck, and thus cannot be used in OSS by other agents.
- Even if the TensorPipe fixture doesn't use TEST_CONFIG, it still needs to set it to the right value for other parts of the code to work. (This is done in `dist_init`).
- There are a few functions in dist_utils.py that return some properties of the agent (e.g., a regexp to match against the error it returns in case of shutdown). These functions are effectively chained if/elses on the various agents, which has the effect of "leaking" some part of the Thrift agent into OSS.
- Each test suite (RPC, dist autograd/dist optimizer, their JIT versions, remote module, ...) must be run on each agent (or almost; the faulty one is an exception) in both fork and spawn mode. Each of these combinations is a separate file, which leads to a proliferation of scripts.
- There is no "master list" of what combinations make sense and should be run. Therefore it has happened that when adding new tests or new agents we forgot to enroll them into the right tests. (TensorPipe is still missing a few tests, it turns out).
- All of these tiny "entry point" files contain almost the same duplicated boilerplate. This makes it very easy to get the wrong content into one of them due to a bad copy-paste.

This refactoring aims to address these problems by:
- Avoiding global state, defaults/overrides, traps, if/elses, ... and having a single way to specify the agent, based on an abstract base class and several concrete subclasses which can be "mixed in" to any test suite.
- Instead of enabling/disabling tests using decorators, the tests that are specific to a certain agent are now in a separate class (which is a subclass of the "generic" test suite) so that they are only picked up by the agent they apply to.
- Instead of having one separate entry point script for each combination, it uses one entry point for each agent, and in that script it provides a list of all the test suites it wants to run on that agent. And it does that by trying to deduplicate the boilerplate as much as possible. (In fact, the various agent-suite combinations could be grouped in any way, not necessarily by agent as I did here).

It provides further advantages:
- It puts all the agents on equal standing, by not having any of them be the default, making it thus easier to migrate from process group to TensorPipe.
- It will make it easier to add more versions of the TensorPipe tests (e.g., one that disables the same-machine backends in order to test the TCP-based ones) without a further duplication of entry points, of boilerplate, ...

Summary of this commit
--
This prepares the stack by simplifying the TensorPipe fixture. A comment says that the TensorPipe fixture cannot subclass the generic fixture class, as that would lead to a diamond class hierarchy, which Python supposedly doesn't support (whereas in fact it does), and therefore it copies over two properties that are defined on the generic fixture. However, each class that uses the TensorPipe fixture also inherits from the generic fixture, so there's no need to redefine those properties. In fact, by not redefining them we save ourselves some trouble in cases where the TensorPipe fixture's copy would end up overriding another override.
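
For context, Python does resolve diamond class hierarchies via its method resolution order; a minimal sketch (class names are illustrative, not taken from this diff):
```
class Fixture:
    @property
    def world_size(self):
        return 4

class TensorPipeFixture(Fixture):
    pass

class GenericTests(Fixture):
    pass

# Diamond over Fixture: perfectly legal, resolved via Python's MRO.
class TensorPipeTests(TensorPipeFixture, GenericTests):
    pass

print(TensorPipeTests().world_size)                    # 4
print([c.__name__ for c in TensorPipeTests.__mro__])   # linearized lookup order
```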
ghstack-source-id: 107045914

Test Plan: Sandcastle and CircleCI

Differential Revision: D22287533

fbshipit-source-id: 254c38b36ba51c9d852562b166027abacbbd60ef
2020-07-03 02:52:14 -07:00
3890550940 [RPC tests] Fix @_skip_if_tensorpipe always skipping for all agents (#40860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40860

It turns out that the `@_skip_if_tensorpipe_agent` decorator was written in such a way that it accidentally caused the test to become a no-op (and thus always succeed) for all agents. This means that all tests wrapped by that decorator were never actually run, for any agent.

My understanding of the root cause is that the following code:
```
@_skip_if_tensorpipe_agent
def test_foo(self):
    self.assertEqual(2 + 2, 4)
```
ended up behaving somewhat like this:
```
def test_foo(self):
    def original_test_func(self):
        self.assertEqual(2 + 2, 4)
    return unittest.skipIf(self.agent == "TENSORPIPE")(original_test_func)
```
which means that the test body of the decorated method was not actually calling the original test method.

This issue probably came from `@_skip_if_tensorpipe_agent` being copy-pasted from `requires_process_group_agent` (which, however, is not a decorator but rather a decorator *factory*). An unfortunate choice of name (calling `decorator` what was in fact the wrapped method) then hindered readability and hid the issue.
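
For reference, a skip decorator with the intended semantics would look roughly like this sketch (the `agent` attribute name is a hypothetical stand-in):
```
import functools
import unittest

def skip_if_tensorpipe_agent(func):
    # Return a wrapper that decides at call time whether to skip; the broken
    # version instead returned the decorated function from inside the test
    # body, so the original test body never executed.
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if self.agent == "TENSORPIPE":  # hypothetical attribute
            raise unittest.SkipTest("not run on the TensorPipe agent")
        return func(self, *args, **kwargs)
    return wrapper
```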

Note that a couple of tests had become legitimately broken in the meantime and no one had noticed. The breakages were introduced in #39909 (a.k.a., D22011868 (145df306ae)).
ghstack-source-id: 107045916

Test Plan: Discovered this as part of my refactoring, in D22332611. After fixing the decorator two tests started breaking (for real reasons). After fixing them all is passing.

Differential Revision: D22332611

fbshipit-source-id: f88ca5574675fdb3cd09a9f6da12bf1e25203a14
2020-07-03 02:50:11 -07:00
cab7d94d47 [PyTorch Numeric Suite] Remove unnecessary Logger in input arguments (#40890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40890

Remove unnecessary Logger in input arguments and simplify the API.
ghstack-source-id: 107110487

Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_lstm_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_lstm_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_lstm_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_conv_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_linear_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_linear_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_conv_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_linear_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_submodule_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_functional_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_linear_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_conv_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_linear_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_functional_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_linear_dynamic'

Differential Revision: D22345477

fbshipit-source-id: d8b4eb3d6cb3049aa3296dead8ba29bf5467bd1c
2020-07-03 02:45:46 -07:00
542ac74987 [quant][graphmode][fix] Fold conv bn (#40865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40865

1. applied a filter for the module types
2. removed the assumption that the conv and bn modules are immediate children of the parent module

Test Plan:
python test/test_quantization.py TestQuantizeJitPasses

Imported from OSS

Differential Revision: D22338074

fbshipit-source-id: 64739a5e56c0a74249a1dbc2c8454b88ec32aa9e
2020-07-03 00:01:04 -07:00
824ab19941 [quant][graphmode] Support quantization for aten::append (#40743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40743

`aten::append` modifies its input in place and its output is ignored. Such ops are not
supported right now, so we'll need to first make `aten::append` non-inplace
by changing
```
ignored = aten::append(list, x)
```
to
```
x_list = aten::ListConstruct(x)
result = aten::add(list, x_list)
```
and then quantize the aten::add instead.
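
In eager Python terms, the rewrite corresponds roughly to replacing the in-place call with an out-of-place one (illustrative sketch):
```
xs = [1, 2]
x = 3
xs.append(x)    # aten::append: mutates xs in place, return value ignored

ys = [1, 2]
ys = ys + [x]   # aten::ListConstruct([x]) followed by aten::add
assert xs == ys
```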

Test Plan:
TestQuantizeJitOps.test_general_shape_ops

Imported from OSS

Differential Revision: D22302151

fbshipit-source-id: 931000388e7501e9dd17bec2fad8a96b71a5efc5
2020-07-02 22:26:52 -07:00
ff17b83fd8 [pytorch][ci] add custom selective build flow for android build (#40199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40199

Mobile custom selective build has already been covered by `test/mobile/custom_build/build.sh`.
It builds a CLI binary with the host toolchain and runs it on the host
machine to check the correctness of the result.

But that custom build test doesn't cover the android/gradle build part,
and we cannot use it to measure and track the in-APK size of the custom
build library.

So this PR adds selective build test coverage for the android NDK build.
It also integrates with the CI to upload the custom build size to scuba.

TODO:
Ideally it should build android/test_app and measure the in-APK size.
But the test_app hasn't been covered by any CI yet and is currently
broken, so we build and measure the AAR instead (which can be inaccurate,
as we plan to pack C++ header files into the AAR soon).

Sample result: https://fburl.com/scuba/pytorch_binary_size/skxwb1gh
```

+---------------------+-------------+-------------------+-----------+----------+
|     build_mode      |    arch     |        lib        | Build Num |   Size   |
+---------------------+-------------+-------------------+-----------+----------+
| custom-build-single | armeabi-v7a | libpytorch_jni.so |   5901579 | 3.68 MiB |
| prebuild            | armeabi-v7a | libpytorch_jni.so |   5901014 | 6.23 MiB |
| prebuild            | x86_64      | libpytorch_jni.so |   5901014 | 7.67 MiB |
+---------------------+-------------+-------------------+-----------+----------+
```

Test Plan: Imported from OSS

Differential Revision: D22111115

Pulled By: ljk53

fbshipit-source-id: 11d24efbc49a85f851ecd0e481d14123f405b3a9
2020-07-02 21:11:01 -07:00
28e1d241cd [pytorch] factor out binary size upload command (#40188)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40188

Create a custom command for this task to avoid copy/paste for new build jobs.

Test Plan: Imported from OSS

Differential Revision: D22111114

Pulled By: ljk53

fbshipit-source-id: a7d4d6bbd61ba6b6cbaa137ec7f884736957dc39
2020-07-02 21:08:17 -07:00
3c22c7aadc infer tensor properties based on an input tensor rather than defaults for xxx_like ctors (#40895)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40895
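
A quick sketch of the behavior the title refers to, assuming current `torch` semantics for `*_like` constructors:
```
import torch

x = torch.ones(2, 3, dtype=torch.float64)
y = torch.zeros_like(x)
# dtype and device are taken from the input tensor,
# not from the global defaults (float32 here)
assert y.dtype == torch.float64
```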

Reviewed By: eellison

Differential Revision: D22358878

Pulled By: Krovatkin

fbshipit-source-id: 2db2429aa89c180d8e52a6bb1265308483da46a2
2020-07-02 20:56:35 -07:00
6095808d22 fix pca_lowrank memory consumption (#40853)
Summary:
Per title, fixes https://github.com/pytorch/pytorch/issues/40768
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40853

Reviewed By: pbelevich

Differential Revision: D22363906

Pulled By: ngimel

fbshipit-source-id: 966a4b230d351f7632c5cfae4a3b7c9a787bc9a5
2020-07-02 17:52:41 -07:00
3ca5849f0a Add serializer and deserializer for Int8QuantSchemeBlob and Int8QuantParamsBlob (#40661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40661

Add ser-de to support int8 quantization during online training

Test Plan:
```
buck test caffe2/caffe2/fb/fbgemm:int8_serializer_test
```

Reviewed By: hx89

Differential Revision: D22273292

fbshipit-source-id: 3b1e9c820243acf41044270afce72a262ef92bd4
2020-07-02 17:17:05 -07:00
f8d4878b3c check for unsupported instructions when exporting mobile models (#40791)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40791

Test Plan: Imported from OSS

Differential Revision: D22311469

Pulled By: ann-ss

fbshipit-source-id: 7a6abb3f2477e8553f8c71f4aa0442df4f712fb5
2020-07-02 16:24:11 -07:00
3c6b8a6496 Revert D22360735: .circleci: Build docker images as part of CI workflow
Test Plan: revert-hammer

Differential Revision:
D22360735 (af5bcba217)

Original commit changeset: 4ffbde563fdc

fbshipit-source-id: 4ae2288f466703754c9e329d34d344269c70db83
2020-07-02 16:16:31 -07:00
a1c234e372 Revert D22330340: [C2] Fixed a bug in normalization operator
Test Plan: revert-hammer

Differential Revision:
D22330340 (ce63f70981)

Original commit changeset: 0bccf925bb76

fbshipit-source-id: e27d70dee0fbe9e708b0cf3be81dbd33c4015026
2020-07-02 16:05:23 -07:00
9cc73966b3 [TVM] Fix build and sync with caffe2/caffe2/python/dlpack.h (#40888)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40888

Reviewed By: yinghai

Differential Revision: D22326379

fbshipit-source-id: 96ffcff5738973312c49368f53f35bf410e4c0c9
2020-07-02 15:37:45 -07:00
b7517a76ba rshift use default >> operator (#40545)
Summary:
Fix https://github.com/pytorch/pytorch/issues/40032
Also see https://github.com/pytorch/pytorch/pull/35339
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40545

Reviewed By: pbelevich

Differential Revision: D22362816

Pulled By: ngimel

fbshipit-source-id: 4bbf9212b21a4158badbfee8146b3b67e94d5a33
2020-07-02 15:13:12 -07:00
dec3f918a0 Migrate 'torch.dot' from TH to Aten (CUDA) (#40646)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40646

Supports double, float, and at::Half.
Avoids creating the output result on the CPU.

Both tensors must be on the GPU.
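
A minimal sketch of the supported usage (illustrative):
```
import torch

a = torch.randn(1000, device="cuda", dtype=torch.half)
b = torch.randn(1000, device="cuda", dtype=torch.half)
out = torch.dot(a, b)   # both inputs (and the result) live on the GPU
```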

Reviewed By: ngimel

Differential Revision: D22258840

fbshipit-source-id: 95f4747477f09b40b1d682cd1f76e4c2ba28c452
2020-07-02 14:48:59 -07:00
81aebf380e pytorch | Fix linking of qnnpack params on windows. (#40920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40920

Pytorch depends on this from both C and C++ source files, so unify linking so it's fully fixed.

Test Plan: Build it on Windows

Reviewed By: dreiss, supriyar

Differential Revision: D22348247

fbshipit-source-id: 2933b4804f4725ab1742914656fa367527f8f7e1
2020-07-02 13:46:20 -07:00
a7e09b8727 pytorch | Namespace init_win symbol in qnnpack.
Summary: Namespacing the symbol, since it clashes with "the real thing" otherwise.

Test Plan: Sandcastle + build it on windows

Reviewed By: dreiss

Differential Revision: D22348240

fbshipit-source-id: f9c9a7abc97626ba327605cb4749fc5c38a24d35
2020-07-02 13:37:40 -07:00
e1428cf41b [JIT] fix unfold shape analysis (#40749)
Summary:
unfold on a 0-dimensional tensor returns a 1-dimensional tensor
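
An illustrative check of the runtime behavior the shape analysis must now model:
```
import torch

t = torch.tensor(3.0)     # 0-dim tensor
u = t.unfold(0, 1, 1)
print(u.shape)            # torch.Size([1]): a 1-dim result
```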
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40749

Differential Revision: D22361481

Pulled By: eellison

fbshipit-source-id: 621597e5f97f6e39953eb86f8b85bb4142527a9f
2020-07-02 13:32:37 -07:00
ce63f70981 [C2] Fixed a bug in normalization operator (#40925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40925

The normalization operator does not handle empty tensors correctly. This is a fix.

Test Plan: unit tests

Differential Revision: D22330340

fbshipit-source-id: 0bccf925bb768ebb997ed0c88130c5556308087f
2020-07-02 13:24:56 -07:00
af5bcba217 .circleci: Build docker images as part of CI workflow (#40827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40827

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Differential Revision: D22360735

Pulled By: seemethere

fbshipit-source-id: 4ffbde563fdc3c49fdd14794ed3c2e881030361d
2020-07-02 13:00:39 -07:00
9f14e48834 Override shape hints with real weight shape extracted from workspace (#40872)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40872

Shape hints, as the name suggests, are only hints. We should use the real shapes from the workspace for the weights.

Reviewed By: ChunliF

Differential Revision: D22337680

fbshipit-source-id: e7a6101fb613ccb332c3e34b1c2cb8c6c47ce79b
2020-07-02 12:55:29 -07:00
db39542509 [2/n][Compute Meta] support analysis for null flag features
Summary:
## TLDR
Support using a NaN default value for missing dense features in RawInputProcessor for DPER2, in preparation for subsequent support for null flag features in compute meta. For train_eval this is already supported in DPER3, and we do not plan to support it in DPER2 train_eval.
## Overview
This is part of an intern project to support adding dense flags for missing feature values instead of replacing them with zero.
## Project plan :
https://docs.google.com/document/d/1OsPUTjpJycwxWLCue3Tnb1mx0uDC_2KKWvC1Rwpo2NI/edit?usp=sharing
## Code paths:
See https://fb.quip.com/eFXUA0tbDmNw for the call stack for all affected code paths.

Test Plan:
## fblearner flow test

1. `flow-cli clone f197867430 --run-as-secure-group ads_personalization_systems --force-build` to build a ephemeral package and start a fblearner flow run (may fail)
2. Clone the new run and change the secure_group to `XXXX` and entitlement to `default` in the UI
3. Adds explicit_null_min_coverage flag
4. Optionally reduce `max_examples` since we only test pass/fail instead of quality.
5. Submit the run to test the change

Example:
f198538878

## compare output coverages to daiquery runs

1. Randomly select null flag features from compute meta workflow output
2. Look up the feature id in feature metadata using feature name
3. Check against a daiquery sample of coverage to see if the coverage falls within guidelines.
https://www.internalfb.com/intern/daiquery/workspace/275342740223489/192619942076136/

## Sampled features:
GFF_C66_ADS_USER_SUM_84_PAGE_TYPE_RATIO_EVENT_LIKE_IMPRESSION: 15694257
- original feature compute meta coverage: 0.999992
- daiquery feature coverage (10k rows): 0.69588
- null flag compute meta coverage: 0.293409

GFF_R1303_ADS_USER_SUM_7_PAGE_TYPE_COUNTER_CONVERSION: 16051183
- original feature compute meta coverage: 0.949868
- daiquery feature coverage: 0.82241
- null flag compute meta coverage: 0.151687

## Unit tests:

`buck test  fblearner/flow/projects/dper/tests/workflows:ads_test`

https://www.internalfb.com/intern/testinfra/testconsole/testrun/6192449504303863/

Differential Revision: D22026450

fbshipit-source-id: 46c59d849fa89253f14dc2b035c4c677cd6e3a4c
2020-07-02 12:44:41 -07:00
b678666a04 Add module.training to docs (#40923)
Summary:
A lot of people ask https://discuss.pytorch.org/t/check-if-model-is-eval-or-train/9395/3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40923
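
For reference, the attribute being documented can be checked directly:
```
import torch.nn as nn

m = nn.Linear(2, 2)
print(m.training)   # True: modules start in training mode
m.eval()
print(m.training)   # False after switching to eval mode
```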

Reviewed By: pbelevich

Differential Revision: D22358799

Pulled By: zou3519

fbshipit-source-id: b5465ffedb691fb4811e097c4dbd7bbc405be09c
2020-07-02 12:36:59 -07:00
6ae3cd0d9d Configure RPC metrics handlers and pass them into Thrift RPC Agent (#40602)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40602

Reviewed By: pritamdamania87

Differential Revision: D22250592

fbshipit-source-id: d38131f30939fc26af241b40e057a9dc1109e950
2020-07-02 11:41:21 -07:00
6aabd12390 fix issue #31759 (allow valid ASCII python identifiers as dimnames) (#40871)
Summary:
Fixes issue https://github.com/pytorch/pytorch/issues/31759:
- Changes is_valid_identifier check on named tensor dimensions to allow digits if they are not at the beginning of the name (this allows exactly the ASCII subset of [valid python identifiers](https://docs.python.org/3/reference/lexical_analysis.html#identifiers)).
- Updates error message for illegal dimension names.
- Updates and adds relevant tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40871
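
A short sketch of what the relaxed check accepts (named tensor API; names are illustrative):
```
import torch

# Digits are now allowed anywhere except the first character
x = torch.zeros(2, 3, names=("batch1", "dim_2"))
print(x.names)   # ('batch1', 'dim_2')

# names=("1batch",) would still raise: a name may not start with a digit
```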

Reviewed By: pbelevich

Differential Revision: D22357314

Pulled By: zou3519

fbshipit-source-id: 9550a1136dd0673dd30a5cd5ade28069ba4c9086
2020-07-02 11:35:54 -07:00
5db5a0f2bb Re-enable Caffe2 test RoiAlignTest.CheckCPUGPUEqual (#40901)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35547.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40901

Differential Revision: D22357760

Pulled By: malfet

fbshipit-source-id: 43f7dc13a905416288a9a317ae31a4dc78276ce4
2020-07-02 11:22:23 -07:00
1a74bb84f2 Remove Int8FC diff restriction.
Summary: Remove Int8FC diff restriction.

Test Plan: test_int8_ops_nnpi.py

Reviewed By: hyuen

Differential Revision: D22353200

fbshipit-source-id: c6c80c9dda3245c02da8343ecd5689994baf0143
2020-07-02 08:15:31 -07:00
591fffc524 Type-annotate serialization.py (#40862)
Summary:
Move the Storage class from __init__.pyi.in to types.py and make it a protocol, since it is not a real class.
Expose the `PyTorchFileReader` and `PyTorchFileWriter` native classes.

Ignore function attributes, as there is not yet a good way to type-annotate those; see https://github.com/python/mypy/issues/2087
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40862

Differential Revision: D22344743

Pulled By: malfet

fbshipit-source-id: 95cdb6f980ee79383960f306223e170c63df3232
2020-07-02 07:10:55 -07:00
9fa1f27968 [jit] Fix value association with dictionaries in the tracer (#40885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40885

`TracingState::setValue` associates a concrete IValue in the traced
program with a symbolic `Value*`. Previously, the logic for how
GenericDicts worked was special-cased to handle only very simple cases
and to silently swallow the rest.

This PR generalizes the logic to reflect the same behavior as using
dictionaries on input: whenever we encounter a dictionary in the system,
we completely "burn in" all the keys into the graph, and then
recursively call `setValue` on the associated value.

This has the effect of requiring that any dictionary structure you are
creating in a traced program be of fixed structure, similar to how any
dictionary used as input must be static as well.
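
Illustratively, a traced function may build a dict freely as long as its key structure stays fixed (a sketch, not code from this diff):
```
import torch

def f(x):
    d = {"a": x + 1, "b": x * 2}   # keys are "burned in" at trace time
    return d["a"] + d["b"]

traced = torch.jit.trace(f, torch.randn(3))
print(traced(torch.randn(3)))
```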

Test Plan: Imported from OSS

Differential Revision: D22342490

Pulled By: suo

fbshipit-source-id: 93e610a4895d61d9b8b19c8d2aa4e6d57777eaf6
2020-07-02 04:09:35 -07:00
59294fbbb9 [caffe2] Reimplement RemoveOpsByType with SSA (#40649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40649

The original implementation of RemoveOpsByType is pretty buggy and does not remove all instances of the ops that should be removed. It's also quite complicated and hard to modify. I reimplemented it by first converting the graph to its SSA form. The algorithm is quite simple once the graph is in SSA form: it's very similar to constant propagation with a few modifications. The hardest part is dealing with the removal of an op whose output is also an output of the predict net, because that output has to be preserved.

(Note: this ignores all push blocking failures!)

Reviewed By: yinghai, dzhulgakov

Differential Revision: D22220798

fbshipit-source-id: faf6ed5242f1e2f310125d964738c608c6c55c94
2020-07-02 02:45:36 -07:00
ea03f954ad [ONNX] Add warning in ONNX export when constant folding is on in training-amenable mode (#40546)
Summary:
This PR introduces a warning when a user tries to export a model to ONNX in training-amenable mode while constant folding is turned on. We want to warn against any unintentional use, because constant folding may fold some parameters that are intended to remain trainable in the exported model.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40546

Reviewed By: hl475

Differential Revision: D22310917

Pulled By: houseroad

fbshipit-source-id: ba83b8e63af7c458b5ecca8ff2ee1c77e2064f90
2020-07-01 21:40:38 -07:00
73f11dc3d1 torch._six.PY37 should be true for Python-3.8 as well (#40868)
Summary:
Right now it is used to check whether `math.remainder` exists, which is the case for both Python-3.7 and 3.8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40868

Differential Revision: D22343454

Pulled By: malfet

fbshipit-source-id: 6b6d4869705b64c4b952309120f92c04ac7e39fd
2020-07-01 19:49:37 -07:00
8f6e50d013 Make some more ops c10-full (#40747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40747

-
ghstack-source-id: 106833603

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D22299161

fbshipit-source-id: 6e34999b5f8244d9582e4978754039d340720ca8
2020-07-01 19:39:32 -07:00
d7c9f96e43 Optimize perf for calling ops with custom classes (#38257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38257

It seems we're doing a runtime type check for custom classes on each operator call if the operator has custom class arguments.
This has no effect on operators without custom class arguments, but it is a problem for operators that do have them,
for example operators taking an at::native::xnnpack::Conv2dOpContext argument.

The long term solution would be to move those checks to op registration time instead of doing them at call time,
but as an intermediate fix, we can at least make the check fast by

- Using ska::flat_hash_map instead of std::unordered_map
- Using std::type_index instead of std::string (i.e. avoid calling std::hash on a std::string)
ghstack-source-id: 106805209

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D21507226

fbshipit-source-id: bd120d5574734be843c197673ea4222599fee7cb
2020-07-01 19:28:29 -07:00
2f47e953f7 Fixes #40158 (#40617)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40158

Description
- docs update: removed incorrect statements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40617

Reviewed By: ezyang

Differential Revision: D22308802

Pulled By: yns88

fbshipit-source-id: e33084af320f249c0c9ba04bdbe2191d1b954d17
2020-07-01 18:05:44 -07:00
04b6e4273e clang format reducer.cpp (#40876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40876

clang format reducer.cpp
ghstack-source-id: 106980050

Test Plan: unit test

Differential Revision: D22321422

fbshipit-source-id: 54afdff206504c7bbdf2e408928cc32068e15cdc
2020-07-01 17:24:37 -07:00
ad30d465d5 Move install_torchvision to common.sh so that it can be sourced. (#40828)
Summary:
Moving this to a file that can be source by downstream pytorch/xla.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40828

Reviewed By: malfet

Differential Revision: D22339513

Pulled By: ailzhang

fbshipit-source-id: c43b18fa2b7e1e8bb6810a6a43bb7dccd4756238
2020-07-01 16:40:43 -07:00
49e12d888a [NCCL - reland] Explicitly abort NCCL Communicators on Process Group Destruction (#40585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40585

This PR aborts incomplete NCCL Communicators in the ProcessGroupNCCL
destructor. This should prevent pending NCCL communicators from blocking other CUDA ops.
ghstack-source-id: 106988073

Test Plan: Sandcastle/ OSS CI

Differential Revision: D22244873

fbshipit-source-id: 4b4fe65e1bd875a50151870f8120498193d7535e
2020-07-01 16:21:16 -07:00
af34f2f63b Added missing generator argument in type annotation(pytorch#40803) (#40873)
Summary:
Added missing generator argument in type annotation (pytorch#40803)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40873

Differential Revision: D22344217

Pulled By: malfet

fbshipit-source-id: 9871401b97c96fa20c70e3f66334259ead1f8429
2020-07-01 16:05:18 -07:00
c73255801f Fix the autograd codegen for repeat function (#40766)
Summary:
Fix https://github.com/pytorch/pytorch/issues/40701

A new special case is added to let `dim()` save an int instead of self.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40766

Differential Revision: D22308354

Pulled By: albanD

fbshipit-source-id: 69008230d7398b9e06b8e074a549ae921c2bf603
2020-07-01 15:43:28 -07:00
26543e6caf [quant][graphmode] FP16 quant support - Operator Fusion (#40710)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40710

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D22335975

fbshipit-source-id: 5c176bb6b9c300e1beb83df972149dd5a400b854
2020-07-01 14:15:53 -07:00
55b5ab14d3 [quant][graphmode] FP16 quant support - Insert cast operators (#40709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40709

Cast to kHalf and back to kFloat before the linear operator to mimic FP16 quant support
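
In tensor terms, the inserted casts amount to a half-precision round-trip around the linear op (illustrative sketch):
```
import torch
import torch.nn.functional as F

x = torch.randn(2, 3)
w = torch.randn(4, 3)
# Emulate FP16 dynamic quantization of the weight: a round-trip
# through half precision before the float linear op.
w_fp16 = w.to(torch.half).to(torch.float)
y = F.linear(x, w_fp16)
```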

Test Plan:
python test/test_quantization.py test_convert_dynamic_fp16

Imported from OSS

Differential Revision: D22335977

fbshipit-source-id: f964128ec733469672a1ed4cb0d757d0a6c22c3a
2020-07-01 14:15:51 -07:00
6aebd2c412 [quant][graphmode] Add FP16 quant support - Insert Noop Observers (#40708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40708

Insert NoopObservers for activations and weight tensors for FP16

Test Plan:
python test/test_quantization.py test_prepare_dynamic

Imported from OSS

Differential Revision: D22335976

fbshipit-source-id: b19e8035c7db3b0b065ec09c9ad6d913eb434f3e
2020-07-01 14:13:31 -07:00
d1352192e2 Move OperatorBase::AddRelatedBlobInfo implementation to .cc file (#40844)
Summary:
If a virtual function is implemented in a header file, its implementation will be included as a weak symbol in every shared library that includes this header, along with all of its dependencies.

This was one of the reasons why the size of libcaffe2_module_test_dynamic.so was 500Kb (the AddRelatedBlobInfo implementation pulled in a quarter of libprotobuf.a with it)

Combination of this and https://github.com/pytorch/pytorch/issues/40845 reduces size of `libcaffe2_module_test_dynamic.so` from 500kb to 50Kb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40844

Differential Revision: D22334725

Pulled By: malfet

fbshipit-source-id: 836a4cbb9f344355ddd2512667e77472546616c0
2020-07-01 11:48:15 -07:00
cbdf399fc6 Move OperatorSchema default inference function implementations to .cc… (#40845)
Summary:
… file

This prevents the implementations of those functions (defined as lambdas) from being embedded as weak symbols into every shared library that includes this header.

Combination of this and https://github.com/pytorch/pytorch/pull/40844 reduces size of `libcaffe2_module_test_dynamic.so` from 500kb to 50Kb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40845

Differential Revision: D22334779

Pulled By: malfet

fbshipit-source-id: 64706918fc2947350a58c0877f294b1b8b085455
2020-07-01 11:42:52 -07:00
c71ec1c717 Fix zip serialization for file > 2GiB for Windows (#40783)
Summary:
`long long == int64_t != long` in MSVC
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40783

Differential Revision: D22328757

Pulled By: ezyang

fbshipit-source-id: bc7301d6b0e7e00ee6d7ca8637e3fce7810b15e2
2020-07-01 08:15:27 -07:00
a0569ad8f8 [android][readme] Aar native linking add fbjni (#40578)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40578

Test Plan: Imported from OSS

Differential Revision: D22239286

Pulled By: IvanKobzarev

fbshipit-source-id: 7a4160b621af8cfcc3b3d9e6da1a75c8afefba27
2020-07-01 08:09:17 -07:00
fcadca1bda serialization: validate sparse tensors after loading (#34059)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33439

This introduces torch._sparse_coo_tensor_unsafe(...) and
torch._validate_sparse_coo_tensor_args(...)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34059
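
Illustratively, the kind of check involved can be seen via the public constructor, which validates its arguments:
```
import torch

i = torch.tensor([[0, 3]])   # index 3 is out of range for size (2,)
v = torch.tensor([1.0, 2.0])
try:
    torch.sparse_coo_tensor(i, v, (2,))
except RuntimeError as err:
    print("invalid sparse args caught:", err)
```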

Differential Revision: D22161254

Pulled By: ezyang

fbshipit-source-id: 994efc9b0e30abbc23ddd7b2ec987e6ba08a8ef0
2020-06-30 22:31:21 -07:00
5f9e7240f5 Fix bug where explicitly providing a namespace never worked. (#40830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40830

Fixes #40725

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D22323886

Pulled By: ezyang

fbshipit-source-id: b8a61496923d9f086d4c201024748505ba783238
2020-06-30 22:20:05 -07:00
2cf9fe2d92 Remove more error-exposing tests in exp that cannot be reliably reproduced (#40825)
Summary:
Continuing https://github.com/pytorch/pytorch/issues/40824

All CIs have been enabled (on a branch that starts with `ci-all/`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40825

Differential Revision: D22328732

Pulled By: ezyang

fbshipit-source-id: 3e517d01a9183d95df0687b328fb268947ea5fb0
2020-06-30 22:14:32 -07:00
f13653db29 [Update transforms.py]use build-in atanh in TanhTransform (#40160)
Summary:
Since `torch.atanh` was recently implemented in https://github.com/pytorch/pytorch/issues/38388, we should simply use it for `TanhTransform`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40160
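
Illustratively, the inverse of the transform reduces to the built-in op:
```
import torch

y = torch.tanh(torch.tensor(0.5))
x = torch.atanh(y)   # built-in inverse of tanh
print(torch.allclose(x, torch.tensor(0.5)))   # True
```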

Differential Revision: D22208039

Pulled By: ezyang

fbshipit-source-id: 34dfbc91eb9383461e16d3452e3ebe295f39df26
2020-06-30 21:38:22 -07:00
fbcf419173 Respect user set thread count. (#40707)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40707

Test Plan: Imported from OSS

Differential Revision: D22318197

Pulled By: AshkanAliabadi

fbshipit-source-id: f11b7302a6e91d11d750df100d2a3d8d96b5d1db
2020-06-30 20:14:49 -07:00
0203d70c63 [nit] fix some typo within documentation (#40692)
Summary:
Apologies if this seems trivial, but I'd like to fix these typos that I noticed while reading some of the source code. Thanks!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40692

Differential Revision: D22284651

Pulled By: mrshenli

fbshipit-source-id: 4259d1808aa4d15a02cfd486cfb44dd75fdc58f8
2020-06-30 19:24:44 -07:00
8e0714a60d [rfc] Reduce number of coin flips in RecordFunction (#40758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40758

Currently we flip a coin for each sampled callback each time
we run RecordFunction. This PR is an attempt to skip most of the coin
flips (for the low-probability observers) while keeping the distribution
close to the original one.
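
One standard way to avoid a per-call coin flip (a sketch of the general technique; the exact scheme in this PR may differ) is to draw the gap until the next sampled call from a geometric distribution:
```
import math
import random

def draw_gap(p):
    # Number of events to skip before the next sampled one. Sampling the
    # gap from a geometric distribution keeps each event sampled with
    # probability p without a per-event coin flip.
    return int(math.log(1.0 - random.random()) / math.log(1.0 - p))

p = 0.001
remaining = draw_gap(p)
for event in range(100_000):
    if remaining == 0:
        # run the sampled callback here
        remaining = draw_gap(p)
    else:
        remaining -= 1
```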

Test Plan:
CI and record_function_benchmark
```
(python_venv) iliacher@devgpu151:~/local/pytorch  (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 30108 us.
Time per iteration (1x1): 1496.78 us.
Time per iteration (16x16): 2142.46 us.
Pure RecordFunction runtime of 10000000 iterations 687929 us, number of callback invocations: 978
(python_venv) iliacher@devgpu151:~/local/pytorch  (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 19051 us.
Time per iteration (1x1): 1581.89 us.
Time per iteration (16x16): 2195.67 us.
Pure RecordFunction runtime of 10000000 iterations 682402 us, number of callback invocations: 1023
(python_venv) iliacher@devgpu151:~/local/pytorch  (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 18715 us.
Time per iteration (1x1): 1566.11 us.
Time per iteration (16x16): 2131.17 us.
Pure RecordFunction runtime of 10000000 iterations 693571 us, number of callback invocations: 963
(python_venv) iliacher@devgpu151:~/local/pytorch  (reduce_coin_flops)$

(python_venv) iliacher@devgpu151:~/local/pytorch  (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 18814 us.
Time per iteration (1x1): 1536.2 us.
Time per iteration (16x16): 1985.82 us.
Pure RecordFunction runtime of 10000000 iterations 944959 us, number of callback invocations: 1015
(python_venv) iliacher@devgpu151:~/local/pytorch  (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 18278 us.
Time per iteration (1x1): 1526.32 us.
Time per iteration (16x16): 2093.77 us.
Pure RecordFunction runtime of 10000000 iterations 985307 us, number of callback invocations: 1013
(python_venv) iliacher@devgpu151:~/local/pytorch  (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 18545 us.
Time per iteration (1x1): 1524.65 us.
Time per iteration (16x16): 2080 us.
Pure RecordFunction runtime of 10000000 iterations 952835 us, number of callback invocations: 1048
```

Reviewed By: dzhulgakov

Differential Revision: D22320879

Pulled By: ilia-cher

fbshipit-source-id: 2193f07d2f7625814fe7bc3cc85ba4092fe036bc
2020-06-30 17:23:00 -07:00
179dbd4f25 [jit] preserve keys on dictionary input tracing (#40792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40792

Fixes https://github.com/pytorch/pytorch/issues/40529.

One followup should be to produce a better error message when a new
dictionary has different keys than the traced input. Right now it
presents as a fairly opaque `KeyError`.

Test Plan: Imported from OSS

Differential Revision: D22311731

Pulled By: suo

fbshipit-source-id: c9fbe0b54cf69daed2f11a191d988568521a3932
2020-06-30 16:50:36 -07:00
0ddaaf6a92 [codemod][caffe2] Run clang-format - 5/7
Summary:
This directory is opted-in to clang-format but is not format-clean. This blocks continuous formatting from being enabled on fbcode, and causes hassle for other codemods that leave inconsistent formatting. This diff runs clang-format, which is widely used and considered safe.

If you are unhappy with the formatting of a particular block, please *accept this diff* and then in a stacked commit undo the change and wrap that code in `// clang-format off` and `// clang-format on`, or `/* clang-format off */` and `/* clang-format on */`.

drop-conflicts

Test Plan: sandcastleit

Reviewed By: jerryzh168

Differential Revision: D22311706

fbshipit-source-id: 1ca59a82e96156a4a5dfad70ba3e64d44c5e762a
2020-06-30 15:45:11 -07:00
29aef8f460 Skip some error-producing exp tests that cannot be reliably reproduced (#40824)
Summary:
This is to take care of additional master CI tests for https://github.com/pytorch/pytorch/issues/39087
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40824

Differential Revision: D22321429

Pulled By: ezyang

fbshipit-source-id: 607e284688b3e4ce24d803a030e31991e4e32fd7
2020-06-30 15:39:09 -07:00
0a75234934 Allow np.memmap objects (numpy arrays based on files) to be processed… (#39847)
Summary:
Allow np.memmap objects to be processed by default_collate

np.memmap objects have the same behavior as numpy arrays; the only difference is that they are stored in a binary file on disk. However, the default_collate function used by the PyTorch DataLoader only accepts np.array and rejects np.memmap by type checking. This commit allows np.memmap objects to be processed by default_collate. In this way, users can use large on-disk arrays with the PyTorch DataLoader.
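
A minimal sketch of the now-supported pattern (the file name is illustrative):
```
import numpy as np
import torch

arr = np.memmap("data.bin", dtype="float32", mode="w+", shape=(8, 4))
loader = torch.utils.data.DataLoader(arr, batch_size=2)
batch = next(iter(loader))   # rows are collated into a tensor, as for np.ndarray
print(batch.shape)           # torch.Size([2, 4])
```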
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39847

Reviewed By: ezyang

Differential Revision: D22284650

Pulled By: zou3519

fbshipit-source-id: 003e3208a2afd1afc2e4640df14b3446201e00b4
2020-06-30 15:00:20 -07:00
9d8dc0318b [pruning] add rowwise counter to sparse adagrad
Summary: Use the newly added counter op in sparse adagrad

Reviewed By: chocjy, ellie-wen

Differential Revision: D19221100

fbshipit-source-id: d939d83e3b5b3179f57194be2e8864d0fbbee2c1
2020-06-30 14:40:02 -07:00
40e79bb1d3 Update the version of ninja and scipy (#40677)
Summary:
Update scipy to 1.15 and ninja to 1.10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40677

Differential Revision: D22311602

Pulled By: ezyang

fbshipit-source-id: ddc852b3b8c3091409d1b3bd579dd144b58e5d47
2020-06-30 14:29:40 -07:00
e762ce8ecf Avoid initializing new_group in test_backward_no_ddp. (#40727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40727

This unit test doesn't need a new group, so we avoid initializing a
process group altogether.

#Closes: https://github.com/pytorch/pytorch/issues/40292
ghstack-source-id: 106817362

Test Plan: waitforbuildbot

Differential Revision: D22295131

fbshipit-source-id: 5a60e91e4beeb61cc204d24c564106d0215090a6
2020-06-30 14:01:05 -07:00
5a4911834d Add CUDA11 build and test (#40452)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40452

Differential Revision: D22316007

Pulled By: malfet

fbshipit-source-id: 94f4b4ba2a46ff3d3042ba842a615f8392cdc350
2020-06-30 13:50:44 -07:00
1571dd8692 Refactor duplicated string literals (#40788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40788

Avoid repeating the same `:gencode[foo/bar]` over and over again

Test Plan: CI

Reviewed By: EscapeZero

Differential Revision: D22271151

fbshipit-source-id: f8db57db4ee0948bcca0c8945fdf30380ba81cae
2020-06-30 13:45:02 -07:00
6e4f99b063 Fix wrong MSVC version constraint for CUDA 9.2 (#40794)
Summary:
Tested with https://github.com/pytorch/pytorch/pull/40782.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40794

Differential Revision: D22318045

Pulled By: malfet

fbshipit-source-id: a737ffd7cb8a6a9efb62b84378318f4c3800ad8f
2020-06-30 13:02:45 -07:00
9ac0febb1f Pin torchvision version for doc_push (#40802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40802

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D22317343

Pulled By: ezyang

fbshipit-source-id: 8a982dd93a28d102dfd63163cd44704e899922e0
2020-06-30 12:52:13 -07:00
f3949794a3 Prototype benchmarking util (#38338)
Summary:
This is the prototype for the modular utils that we've been discussing. It is admittedly a large PR, but a good fraction of that is documentation and examples. I've trimmed a bit on the edges since we last discussed this design (for instance Timer is no longer Fuzzer aware), but it's mostly the same.

In addition to the library and hermetic examples, I've included `examples.end_to_end` which tests https://github.com/pytorch/pytorch/pull/38061 over a variety of shapes, dtypes, degrees of broadcasting, and layouts. (CC crcrpar)  I only did CPU as I'm not set up on a GPU machine yet. [Results from my devserver](https://gist.github.com/robieta/d1a8e1980556dc3f4f021c9f7c3738e2)

Key takeaways:
  1) For contiguous Tensors, larger dtypes (fp32 and fp64) and lots of reuse of the mask due to broadcasting, improvements are significant. (Presumably due to better vectorization?)
  2) There is an extra ~1.5 us overhead, which dominates small kernels.
  3) Cases with lower write intensity (int8, lower mask fraction, etc.) or non-contiguous layouts seem to suffer.

Hopefully this demonstrates the proof-of-concept for how this tooling can be used to tune kernels and assess PRs. Looking forward to thoughts and feedback.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38338

Differential Revision: D21551048

Pulled By: robieta

fbshipit-source-id: 6c50e5439a04eac98b8a2355ef731852ba0500db
2020-06-30 11:31:27 -07:00
c648cd372f Fix complex printing for sci_mode=True (#40513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40513

This PR makes the following changes:
1. Complex printing now uses print formatting for its real and imaginary values, which are joined at the end.
2. Change 1 naturally fixes the printing of complex tensors in sci_mode=True

```
>>> torch.tensor(float('inf')+float('inf')*1j)
tensor(nan+infj)
>>> torch.randn(2000, dtype=torch.cfloat)
tensor([ 0.3015-0.2502j, -1.1102+1.2218j, -0.6324+0.0640j,  ...,
        -1.0200-0.2302j,  0.6511-0.1889j, -0.1069+0.1702j])
>>> torch.tensor([1e-3, 3+4j, 1e-5j, 1e-2+3j, 5+1e-6j])
tensor([1.0000e-03+0.0000e+00j, 3.0000e+00+4.0000e+00j, 0.0000e+00+1.0000e-05j,
        1.0000e-02+3.0000e+00j, 5.0000e+00+1.0000e-06j])
>>> torch.randn(3, dtype=torch.cfloat)
tensor([ 1.0992-0.4459j,  1.1073+0.1202j, -0.2177-0.6342j])
>>> x = torch.tensor([1e2, 1e-2])
>>> torch.set_printoptions(sci_mode=False)
>>> x
tensor([  100.0000,     0.0100])
>>> x = torch.tensor([1e2, 1e-2j])
>>> x
tensor([100.+0.0000j,   0.+0.0100j])
```

Test Plan: Imported from OSS

Differential Revision: D22309294

Pulled By: anjali411

fbshipit-source-id: 20edf9e28063725aeff39f3a246a2d7f348ff1e8
2020-06-30 11:13:42 -07:00
871bfaaba1 [JIT] Fix shape analysis for aten::masked_select. (#40753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40753

The reference says that this op always returns a 1-D tensor, even if
the input and the mask are 0-D.
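
An illustrative check of the behavior being modeled:
```
import torch

x = torch.tensor(5.0)        # 0-dim input
mask = torch.tensor(True)    # 0-dim mask
out = torch.masked_select(x, mask)
print(out.shape)             # torch.Size([1]): always 1-D
```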

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D22300354

Pulled By: ZolotukhinM

fbshipit-source-id: f6952989c8facf87d73d00505bf6d41573eff2d6
2020-06-30 11:04:50 -07:00
50d55b9f2b [JIT] Update type of the unsqueeze's output in shape analysis. (#40733)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40733

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D22298537

Pulled By: ZolotukhinM

fbshipit-source-id: a5d4597ed10bcf14d1b28e914bf898d0cae5b4c0
2020-06-30 11:01:45 -07:00
c3237c7a87 Print hostname of RoCM tester (#40755)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40755

Differential Revision: D22311699

Pulled By: malfet

fbshipit-source-id: 057702800fec84fae787b7837f39348273c80cec
2020-06-30 10:56:31 -07:00
a303fd2ea6 Let exp support complex types on CUDA and enable device/dtype in complex tests (#39087)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39087

Differential Revision: D22169697

Pulled By: anjali411

fbshipit-source-id: 4866b7be6742508cc40540ed1ac811f005531d8b
2020-06-30 10:50:40 -07:00
ef5a314597 [typing] fix register_buffer/parameter (#40669)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40669

Differential Revision: D22286130

Pulled By: ezyang

fbshipit-source-id: c0cc173279678978726895a0830343d5234e474e
2020-06-30 10:39:32 -07:00
5923a802fa Back out "[pytorch][PR] [ONNX] Add eliminate_unused_items pass"
Summary:
Original commit changeset: 30e1a6e8823a

it causes an issue with fusing BN

Test Plan: revert

Reviewed By: houseroad

Differential Revision: D22296958

fbshipit-source-id: 62664cc77baa8811ad6ecce9d0520a2ab7f89868
2020-06-30 10:26:35 -07:00
3ecae99dd9 Support Pathlike for zipfile serialization (#40723)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40723

Test Plan: Imported from OSS

Differential Revision: D22294575

Pulled By: jamesr66a

fbshipit-source-id: b157fa0ab02c4eb22cb99ac870942aeab352b0c5
2020-06-30 10:07:23 -07:00
c56255499a Reverts running clang-tidy on ATen (#40764)
Summary:
Reverts https://github.com/pytorch/pytorch/pull/39713.

We are seeing CUDA-related clang-tidy failures on multiple PRs after the above change. The cause of these failures is unclear. Example error message:

```
2020-06-26T18:45:10.9763273Z + python tools/clang_tidy.py --verbose --paths torch/csrc/ aten/src/ATen/ --diff 5036c94a6e868963e0354fc04c92e204d8d77677 -g-torch/csrc/jit/serialization/export.cpp -g-torch/csrc/jit/serialization/import.cpp -g-torch/csrc/jit/serialization/import_legacy.cpp -g-torch/csrc/onnx/init.cpp '-g-torch/csrc/cuda/nccl.*' -g-torch/csrc/cuda/python_nccl.cpp
2020-06-26T18:45:11.1990578Z Error while processing /home/runner/work/pytorch/pytorch/aten/src/ATen/native/cuda/UnaryOpsKernel.cu.
2020-06-26T18:45:11.1992832Z Found compiler error(s).
2020-06-26T18:45:11.2286995Z Traceback (most recent call last):
2020-06-26T18:45:11.2288334Z   File "tools/clang_tidy.py", line 55, in run_shell_command
2020-06-26T18:45:11.2288607Z     output = subprocess.check_output(arguments).decode().strip()
2020-06-26T18:45:11.2289053Z   File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/subprocess.py", line 411, in check_output
2020-06-26T18:45:11.2289337Z     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
2020-06-26T18:45:11.2289786Z   File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/subprocess.py", line 512, in run
2020-06-26T18:45:11.2290038Z     raise CalledProcessError(retcode, process.args,
2020-06-26T18:45:11.2292206Z subprocess.CalledProcessError: Command '['clang-tidy', '-p', 'build', '-config', '{"Checks": "-*, bugprone-*, -bugprone-forward-declaration-namespace, -bugprone-macro-parentheses, -bugprone-lambda-function-name, cppcoreguidelines-*, -cppcoreguidelines-interfaces-global-init, -cppcoreguidelines-owning-memory, -cppcoreguidelines-pro-bounds-array-to-pointer-decay, -cppcoreguidelines-pro-bounds-constant-array-index, -cppcoreguidelines-pro-bounds-pointer-arithmetic, -cppcoreguidelines-pro-type-cstyle-cast, -cppcoreguidelines-pro-type-reinterpret-cast, -cppcoreguidelines-pro-type-static-cast-downcast, -cppcoreguidelines-pro-type-union-access, -cppcoreguidelines-pro-type-vararg, -cppcoreguidelines-special-member-functions, hicpp-exception-baseclass, hicpp-avoid-goto, modernize-*, -modernize-return-braced-init-list, -modernize-use-auto, -modernize-use-default-member-init, -modernize-use-using, -modernize-use-trailing-return-type, performance-*, -performance-noexcept-move-constructor, ", "HeaderFilterRegex": "torch/csrc/.*", "AnalyzeTemporaryDtors": false, "CheckOptions": null}', '-line-filter', '[{"name": "aten/src/ATen/native/cuda/UnaryOpsKernel.cu", "lines": [[10, 11], [29, 30]]}]', 'aten/src/ATen/native/cuda/UnaryOpsKernel.cu']' returned non-zero exit status 1.
2020-06-26T18:45:11.2292551Z
2020-06-26T18:45:11.2292684Z During handling of the above exception, another exception occurred:
2020-06-26T18:45:11.2292775Z
2020-06-26T18:45:11.2292894Z Traceback (most recent call last):
2020-06-26T18:45:11.2293208Z   File "tools/clang_tidy.py", line 306, in <module>
2020-06-26T18:45:11.2293364Z     main()
2020-06-26T18:45:11.2293817Z   File "tools/clang_tidy.py", line 298, in main
2020-06-26T18:45:11.2293980Z     clang_tidy_output = run_clang_tidy(options, line_filters, files)
2020-06-26T18:45:11.2294282Z   File "tools/clang_tidy.py", line 191, in run_clang_tidy
2020-06-26T18:45:11.2294439Z     output = run_shell_command(command)
2020-06-26T18:45:11.2294703Z   File "tools/clang_tidy.py", line 59, in run_shell_command
2020-06-26T18:45:11.2294931Z     raise RuntimeError("Error executing {}: {}".format(" ".join(arguments), error_output))
2020-06-26T18:45:11.2296875Z RuntimeError: Error executing clang-tidy -p build -config {"Checks": "-*, bugprone-*, -bugprone-forward-declaration-namespace, -bugprone-macro-parentheses, -bugprone-lambda-function-name, cppcoreguidelines-*, -cppcoreguidelines-interfaces-global-init, -cppcoreguidelines-owning-memory, -cppcoreguidelines-pro-bounds-array-to-pointer-decay, -cppcoreguidelines-pro-bounds-constant-array-index, -cppcoreguidelines-pro-bounds-pointer-arithmetic, -cppcoreguidelines-pro-type-cstyle-cast, -cppcoreguidelines-pro-type-reinterpret-cast, -cppcoreguidelines-pro-type-static-cast-downcast, -cppcoreguidelines-pro-type-union-access, -cppcoreguidelines-pro-type-vararg, -cppcoreguidelines-special-member-functions, hicpp-exception-baseclass, hicpp-avoid-goto, modernize-*, -modernize-return-braced-init-list, -modernize-use-auto, -modernize-use-default-member-init, -modernize-use-using, -modernize-use-trailing-return-type, performance-*, -performance-noexcept-move-constructor, ", "HeaderFilterRegex": "torch/csrc/.*", "AnalyzeTemporaryDtors": false, "CheckOptions": null} -line-filter [{"name": "aten/src/ATen/native/cuda/UnaryOpsKernel.cu", "lines": [[10, 11], [29, 30]]}] aten/src/ATen/native/cuda/UnaryOpsKernel.cu: error: cannot find libdevice for sm_20. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice. [clang-diagnostic-error]
2020-06-26T18:45:11.2313329Z error: unable to handle compilation, expected exactly one compiler job in ' "/usr/bin/c++" "-cc1" "-triple" "x86_64-pc-linux-gnu" "-aux-triple" "nvptx64-nvidia-cuda" "-fsyntax-only" "-disable-free" "-disable-llvm-verifier" "-discard-value-names" "-main-file-name" "UnaryOpsKernel.cu" "-mrelocation-model" "pic" "-pic-level" "2" "-mthread-model" "posix" "-fno-trapping-math" "-masm-verbose" "-mconstructor-aliases" "-munwind-tables" "-fuse-init-array" "-target-cpu" "x86-64" "-dwarf-column-info" "-debugger-tuning=gdb" "-momit-leaf-frame-pointer" "-resource-dir" "/usr/lib/llvm-8/bin/../lib/clang/8.0.1" "-internal-isystem" "/usr/lib/llvm-8/bin/../lib/clang/8.0.1/include/cuda_wrappers" "-internal-isystem" "/usr/local/cuda/include" "-include" "__clang_cuda_runtime_wrapper.h" "-isystem" "/home/runner/work/pytorch/pytorch/build/third_party/gloo" "-isystem" "/home/runner/work/pytorch/pytorch/cmake/../third_party/gloo" "-isystem"
```

My guess is that our clang-tidy build is improperly configured to handle CUDA code. Until that issue is resolved this stops running clang-tidy on ATen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40764

Differential Revision: D22310032

Pulled By: mruberry

fbshipit-source-id: 035067e1017f0097026cee9866bba424dd4668b4
2020-06-30 09:35:55 -07:00
3cc18d7139 .circleci: Remove executor from windows uploads (#40742)
Summary:
This wasn't needed and broke nightly builds

Fixes some issues introduced in https://github.com/pytorch/pytorch/pull/40592/files

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40742

Differential Revision: D22310055

Pulled By: seemethere

fbshipit-source-id: 095be3be06a730138d860ca6b73eaf22c24cf08f
2020-06-30 09:29:29 -07:00
a6a31bcd47 Enable out_dims for vmap frontend API (#40576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40576

`out_dims` specifies where in the output tensors the vmapped dimension
should appear. We implement this by simply creating a view with the
batch dimension moved to the desired position.

`out_dims` must either:
- be an int (use the same value for all outputs)
- be a Tuple[int] (so the user specifies one out_dim per output).
(See the vmap docstring for what we advertise out_dims to do).
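
A sketch of the semantics using today's public `torch.vmap` (the prototype API in this PR may have differed slightly):
```
import torch

x = torch.randn(3, 5)
# Map over dim 0 of x and place the mapped dimension at
# position 1 of the output.
f = torch.vmap(lambda v: v * 2, in_dims=0, out_dims=1)
print(f(x).shape)   # torch.Size([5, 3])
```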

I also renamed `TestVmap` to `TestVmapAPI` to make it clearer that we
are testing the API here and not specific operators (which will go into
their own test class).

Test Plan: - `pytest test/test_vmap.py -v`

Differential Revision: D22288086

Pulled By: zou3519

fbshipit-source-id: c8666cb1a0e22c54473d8045477e14c2089167cf
2020-06-30 08:20:39 -07:00
2f94b7f95c Initial vmap docstring (#40575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40575

This provides some more context for the next ~2 PRs that will implement
the `out_dims` and `in_dims` functionality. I will probably add more to
it later (things I think we should add: examples (maybe in a dedicated
docs page), specific examples of things vmap cannot handle).

Test Plan:
- Code reading for now. When we are ready to add vmap to master documentation,
I'll build the docs and fix any formatting problems.

Differential Revision: D22288085

Pulled By: zou3519

fbshipit-source-id: 6e28d7bd524242395160c20270159b4b121d6789
2020-06-30 08:18:20 -07:00
4a235b87be pop warning message for cuda module when asan is built in (#35088)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35088

Test Plan: Imported from OSS

Differential Revision: D20552708

Pulled By: glaringlee

fbshipit-source-id: 0b809712378596ccf83211bf8ae39cd71c27dbba
2020-06-30 08:00:37 -07:00
4104ab8b18 Add torch.count_nonzero (#39992)
Summary:
Reference https://github.com/pytorch/pytorch/issues/38349

TODO:

* [x] Add tests
* [x] Add docs (pending add to docs.rst)
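
A quick illustration of the new operator:

```python
import torch

x = torch.tensor([[0, 1, 2],
                  [0, 0, 3]])
torch.count_nonzero(x)         # tensor(3): total nonzero elements
torch.count_nonzero(x, dim=0)  # tensor([0, 1, 2]): per-column counts
```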
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39992

Reviewed By: ezyang

Differential Revision: D22236738

Pulled By: mruberry

fbshipit-source-id: 8520068b086b5ffc4de9e4939e746ff889293987
2020-06-30 06:39:13 -07:00
31de10a392 Int8FC dequantize fix (#40608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40608

Changes to fix uint8_t to fp16 dequantization error.
Enabled test_int8_quantize

(Note: this ignores all push blocking failures!)

Test Plan: Verified with test_int8_ops_nnpi.py

Reviewed By: hyuen

Differential Revision: D22252860

fbshipit-source-id: bb44673327f0c8f44974cef2ab773aa0d89f4dc7
2020-06-30 06:20:09 -07:00
b9cca4b186 fix range of results for pairwise operations (#40728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40728

there are two reasons the test is failing:
1) div by 0
2) result is bigger than fp16 max

for 1) make the divisor a safe number like 1e-3
for 2) when a combination of random numbers produces a result bigger than 65e3, clip it

multiplication is fine because the range of the random numbers is 0-100 -> the result is 0-10000

Test Plan: ran test_div test

Reviewed By: hl475

Differential Revision: D22295934

fbshipit-source-id: 173f3f2187137d6c1c4d4a505411a27f1c059f1a
2020-06-29 23:49:08 -07:00
a371652bc8 Allow to get string references to strings inside torch::List (#39763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39763

This is an ask from fluent. For performance reasons, they need a way to get read access to the std::string inside of a torch::List<std::string> without having to copy that string.

Instead of special casing std::string, we decided to give access to the underlying value. The API now looks like:

```cpp
torch::List<std::string> list = ...;
const std::string& str = list[2].toIValueRef().toStringRef();
```
ghstack-source-id: 106806840

Test Plan: unit tests

Reviewed By: ezyang

Differential Revision: D21966183

fbshipit-source-id: 8b80b0244d10215c36b524d1d80844832cf8b69a
2020-06-29 20:52:32 -07:00
fabd60ec1a Add comment with UNBOXEDONLY explanation to codegen (#40117)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40117

ghstack-source-id: 106804731

Test Plan: just comments

Reviewed By: ezyang

Differential Revision: D22075103

fbshipit-source-id: 76677dc337196b71c50075f2845a1899451a705f
2020-06-29 20:50:45 -07:00
01e2099bb8 [TB] Add support for hparam domain_discrete (#40720)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40720

Add support for populating domain_discrete field in TensorBoard add_hparams API
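
A hedged usage sketch (the keyword name `hparam_domain_discrete` is assumed from the final API):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
writer.add_hparams(
    {"lr": 0.1, "optimizer": "sgd"},
    {"accuracy": 0.9},
    # Declares the discrete set of values each hparam can take, so the
    # TensorBoard hparams UI can render a proper dropdown filter.
    hparam_domain_discrete={"optimizer": ["sgd", "adam"]},
)
```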

Test Plan: Unit test test_hparams_domain_discrete

Reviewed By: edward-io

Differential Revision: D22291347

fbshipit-source-id: 78db9f62661c9fe36cd08d563db0e7021c01428d
2020-06-29 19:33:57 -07:00
53af9df557 Unify boxed function signature between jit and c10 (#37034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37034

c10 takes a Stack* in boxed functions while JIT took Stack&.
c10 doesn't return anything while JIT returns an int which is always zero.

This changes JIT to follow the c10 behavior.
ghstack-source-id: 106834069

Test Plan: unit tests

Differential Revision: D20567950

fbshipit-source-id: 1a7aea291023afc52ae706957e9a5ca576fbb53b
2020-06-29 19:24:26 -07:00
320164f878 Fix zip serialization for file > 2GiB (#40722)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40722

Test Plan: Imported from OSS

Differential Revision: D22294016

Pulled By: jamesr66a

fbshipit-source-id: 0288882873d4b59bdef37d018c030519c4be7f03
2020-06-29 19:17:06 -07:00
9393ac011a [CUDA] addmm for complex (#40431)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40431

Test Plan: Imported from OSS

Differential Revision: D22285916

Pulled By: anjali411

fbshipit-source-id: 5863c713bdaa8e5b4f3d2b41fa59108502145a23
2020-06-29 17:41:46 -07:00
d7cd16858f Add documentation about storage sharing is preserved and serialized f… (#40412)
Summary:
…ile size.
fixes https://github.com/pytorch/pytorch/issues/40157
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40412

Reviewed By: ezyang

Differential Revision: D22265639

Pulled By: ailzhang

fbshipit-source-id: 16b0301f16038bd784e7e92f63253fedc7820adc
2020-06-29 17:23:29 -07:00
8f5b28674c [JIT] Remove dead store in quantization_patterns.h (#40724)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40724

Test Plan: Continuous integration.

Differential Revision: D22294600

Pulled By: SplitInfinity

fbshipit-source-id: 04546579273d8864d91c3c74a654aa75ba34ee45
2020-06-29 16:55:15 -07:00
0235676f8a [pytorch][ci] run mobile code analysis on PR (#40247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40247

This CI job was bypassed on PR because most part of it has already been
covered by mobile-custom-build-dynamic job that runs on every PR.

However, it can still fail independently because it builds and analyzes
a small test project, e.g.: if people forget to update the registration API
used in the test project.

So this PR changed it to only build and analyze the test project and run
the job on every PR.

Test Plan: Imported from OSS

Differential Revision: D22126044

Pulled By: ljk53

fbshipit-source-id: 6699a200208a65b249bd3a4e43ad72bc07388ce3
2020-06-29 16:44:45 -07:00
6e1cf000b3 [jit][oacr] Add some operators for Assistant NLU joint lite model (#40126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40126

These are needed for benchmarking / running our model, following Step 7 in the [Lite interpreter wiki](https://www.internalfb.com/intern/wiki/PyTorch/PyTorchDev/Mobile/Lite_Interpreter/#make-your-model-work-wit) and [this thread](https://www.internalfb.com/intern/qa/56293/atenemptymemory_format-missing-on-fb4a).

Test Plan: Sandcastle

Reviewed By: iseeyuan

Differential Revision: D22073611

fbshipit-source-id: daa46a39c386806be8d5d589740663e85451757e
2020-06-29 16:41:04 -07:00
21de450fcb Fix batch size zero for QNNPACK linear_dynamic (#40588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40588

Two bugs were preventing this from working. One was a divide by zero
when multithreading was enabled, fixed similarly to the fix for static
quantized linear in the previous commit. The other was the computation of
min and max to determine qparams. FBGEMM uses [0,0] for [min,max] of
empty input; we now do the same.
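
A minimal repro sketch of the now-working case (hedged):

```python
import torch

lin = torch.nn.quantized.dynamic.Linear(4, 8)
x = torch.zeros(0, 4)   # batch size zero
y = lin(x)              # previously hit divide-by-zero / bad qparams;
                        # now returns an empty result of shape (0, 8)
```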

Test Plan: Added a unit test.

Differential Revision: D22264415

Pulled By: dreiss

fbshipit-source-id: 6ca9cf48107dd998ef4834e5540279a8826bc754
2020-06-29 16:31:11 -07:00
14145f9775 Fix and reenable threaded QNNPACK linear (#40587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40587

Previously, this was causing divide-by-zero only in the multithreaded
empty-batch case, while calculating tiling parameters for the threads.
In my opinion, the bug here is using a value that is allowed to be zero
(batch size) for an argument that should not be zero (tile size), so I
fixed the bug by bailing out right before the call to
pthreadpool_compute_4d_tiled.

Test Plan: TestQuantizedOps.test_empty_batch

Differential Revision: D22264414

Pulled By: dreiss

fbshipit-source-id: 9446d5231ff65ef19003686f3989e62f04cf18c9
2020-06-29 16:29:29 -07:00
9ca4a46bf8 Implement parallel scatter reductions for CPU (#36447)
Summary:
This PR implements gh-33389.

As a result of this PR, users can now specify various reduction modes for scatter operations. Currently, `add`, `subtract`, `multiply` and `divide` have been implemented, and adding new ones is not hard.

While we now allow dynamic runtime selection of reduction modes, the performance is the same as was the case for the `scatter_add_` method in the master branch. Proof can be seen in the graph below, which compares `scatter_add_` in the master branch (blue) and `scatter_(reduce="add")` from this PR (orange).
![scatter-regression py csv](https://user-images.githubusercontent.com/2629909/82671491-e5e22380-9c79-11ea-95d6-6344760c8578.png)

The script used for benchmarking is as follows:
``` python
import os
import sys
import torch
import time
import numpy
from IPython import get_ipython

Ms=256
Ns=512
dim = 0
top_power = 2
ipython = get_ipython()

plot_name = os.path.basename(__file__)
branch = sys.argv[1]
fname = open(plot_name + ".csv", "a+")

for pM in range(top_power):
    M = Ms * (2 ** pM)
    for pN in range(top_power):
        N = Ns * (2 ** pN)
        input_one = torch.rand(M, N)
        index = torch.tensor(numpy.random.randint(0, M, (M, N)))
        res = torch.randn(M, N)

        test_case = f"{M}x{N}"
        print(test_case)
        tobj = ipython.magic("timeit -o res.scatter_(dim, index, input_one, reduce=\"add\")")

        fname.write(f"{test_case},{branch},{tobj.average},{tobj.stdev}\n")

fname.close()
```

Additionally, one can see that various reduction modes take almost the same time to execute:
```
op: add
70.6 µs ± 27.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
26.1 µs ± 26.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
op: subtract
71 µs ± 20.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
26.4 µs ± 34.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
op: multiply
70.9 µs ± 31.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
27.4 µs ± 29.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
op: divide
164 µs ± 48.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
52.3 µs ± 132 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
Script:
``` python
import torch
import time
import numpy
from IPython import get_ipython

ipython = get_ipython()

nrows = 3000
ncols = 10000
dims = [nrows, ncols]

res = torch.randint(5, 10, dims)
idx1 = torch.randint(dims[0], (1, dims[1])).long()
src1 = torch.randint(5, 10, (1, dims[1]))
idx2 = torch.randint(dims[1], (dims[0], 1)).long()
src2 = torch.randint(5, 10, (dims[0], 1))

for op in ["add", "subtract", "multiply", "divide"]:
    print(f"op: {op}")
    ipython.magic("timeit res.scatter_(0, idx1, src1, reduce=op)")
    ipython.magic("timeit res.scatter_(1, idx2, src2, reduce=op)")
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36447

Differential Revision: D22272631

Pulled By: ngimel

fbshipit-source-id: 3cdb46510f9bb0e135a5c03d6d4aa5de9402ee90
2020-06-29 15:52:11 -07:00
11a74a58c8 Setter for real and imag tensor attributes (#39860)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39860
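
For context, a brief illustration of the new setters (hedged sketch):

```python
import torch

z = torch.tensor([1 + 2j, 3 + 4j])
z.real = torch.tensor([5., 6.])   # assigns into the real parts in place
z.imag = torch.tensor([0., 0.])   # assigns into the imaginary parts
print(z)                          # tensor([5.+0.j, 6.+0.j])
```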

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D22163234

Pulled By: anjali411

fbshipit-source-id: 35b4aa16499341edff1a4be4076539ac7c74f5be
2020-06-29 15:44:55 -07:00
fd90e4b309 [CircleCI] Add RocM build/test jobs (#39760)
Summary:
Set PYTORCH_ROCM_ARCH to `gfx900;gfx906` if the `CIRCLECI` environment variable is defined.
Add RocM build and test jobs and schedule them on the `xlarge` and `amd-gpu` resource classes respectively.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39760

Differential Revision: D22290335

Pulled By: malfet

fbshipit-source-id: 7462f97b262abcacac3e515086ac6236a45626d2
2020-06-29 14:15:44 -07:00
63e5a53b8c DNNL: fix build error when DNNL using TBB threading pool (#40699)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40699

Differential Revision: D22286334

Pulled By: albanD

fbshipit-source-id: 0635a0a5e4bf80d44d90c86945d92e98e26ef480
2020-06-29 13:53:18 -07:00
ed83b9a4be Change function parameter self to input in torch.__init__.pyi (#40235)
Summary:
Fix https://github.com/pytorch/pytorch/issues/40223: Incorrect "self" keyword arguments in `torch.__init__.pyi` type hints
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40235

Differential Revision: D22285816

Pulled By: ezyang

fbshipit-source-id: ebc35290c0c625916289f1a46abc6ff2197f4bcf
2020-06-29 13:49:13 -07:00
d2e16dd888 Remove constexpr for NVCC on Windows (#40675)
Summary:
They are not well supported. Fixes https://github.com/pytorch/pytorch/issues/40393 and https://github.com/pytorch/pytorch/issues/39394.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40675

Differential Revision: D22286031

Pulled By: ezyang

fbshipit-source-id: 7e309916ae21cd3909ee6466952ba89847c74d71
2020-06-29 10:58:42 -07:00
4a174c83ca Add option to preserve certain methods during optimize_for_mobile. (#40629)
Summary:
By default, the freeze_module pass, invoked from optimize_for_mobile,
preserves only the forward method. There is an option to specify a list of
methods that can be preserved during freeze_module. This PR exposes that
option to the optimize_for_mobile pass.
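A minimal sketch of the new option (hedged; the kwarg name `preserved_methods` is assumed from this PR):

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

    @torch.jit.export
    def preprocess(self, x):
        return x * 2

scripted = torch.jit.script(M())
# Without the option, freezing keeps only `forward`; listing the method
# keeps `preprocess` callable on the optimized module as well.
optimized = optimize_for_mobile(scripted, preserved_methods=["preprocess"])
```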
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40629

Test Plan: python test/test_mobile_optimizer.py

Reviewed By: dreiss

Differential Revision: D22260972

Pulled By: kimishpatel

fbshipit-source-id: 452c653269da8bb865acfb58da2d28c23c66e326
2020-06-29 09:32:53 -07:00
4121d34036 Python/C++ API Parity: Add impl and tests for ParameterDict (#40654)
Summary:
This diff contains the implementation of the C++ API for ParameterDict from https://github.com/pytorch/pytorch/issues/25883; refer to https://github.com/pytorch/pytorch/issues/36904 and https://github.com/pytorch/pytorch/issues/28652
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40654

Test Plan: Add unit test in this diff

Differential Revision: D22273265

Pulled By: glaringlee

fbshipit-source-id: 9134a92c95eacdd53d5b24470d5f7edbeb40a488
2020-06-29 08:50:44 -07:00
b35cdc5200 [Fix] torch_common target shared by lite-interpreter and full-jit" and turn on query-based selective build (#40673)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40673

As title. We planned to have lite-interpreter and full-jit co-exist in the short term. To avoid duplicated symbols and operator registrations during dynamic lib loading, we put the common files in a separate component.

The original source file list names are reserved.
ghstack-source-id: 106757184

Test Plan: CI

Reviewed By: kwanmacher

Differential Revision: D22276185

fbshipit-source-id: 328a8ba9c3d88437da0d30c6e6791087d0df5e2e
2020-06-28 16:38:52 -07:00
b4db529352 Fix wrong link in docs/source/notes/ddp.rst (#40484)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40484

Differential Revision: D22259834

Pulled By: mrshenli

fbshipit-source-id: 4ec912c600c81010bdb2778c35cbb0321480199f
2020-06-28 13:55:56 -07:00
502ec8f7f7 Revert D22227939: [TB] Add support for hparam domain_discrete
Test Plan: revert-hammer

Differential Revision:
D22227939 (4c25428c8c)

Original commit changeset: d2f0cd8e5632

fbshipit-source-id: c4329fcead69cb0f3d368a254d8756fb04be742d
2020-06-27 22:20:31 -07:00
5377827b3e Revert D22275201: [Fix] torch_common target shared by lite-interpreter and full-jit
Test Plan: revert-hammer

Differential Revision:
D22275201 (1399655a98)

Original commit changeset: dafd3ad36bb3

fbshipit-source-id: a89c8b1fbb55eb7c116dd6ca9dad04bb90727c0a
2020-06-27 22:00:19 -07:00
521722751f Add examples and tests for combining static/class method with async execution (#40619)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40619

Test Plan: Imported from OSS

Differential Revision: D22258407

Pulled By: mrshenli

fbshipit-source-id: 036d85a2affc4505efd2df197fc513dba010e359
2020-06-27 20:42:23 -07:00
1399655a98 [Fix] torch_common target shared by lite-interpreter and full-jit
Summary:
Pull the shared source files to "torch_common" to avoid duplicated symbols and operator registrations.

(Note: this ignores all push blocking failures!)

Test Plan:
CI
buck install -c fbandroid.force_native_library_merge_map=true -c pt.build_from_deps_query=1 -c pt.selective_build=0 -c pt.static_dispatch=0 -r fb4a

Reviewed By: kwanmacher

Differential Revision: D22275201

fbshipit-source-id: dafd3ad36bb33e3ec33f4accfdc5af1d5f8ab775
2020-06-27 17:48:32 -07:00
21991b63f5 Migrate dot from the TH to Aten (CPU) (#40354)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24692
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40354

Reviewed By: ezyang

Differential Revision: D22214203

Pulled By: ngimel

fbshipit-source-id: 500e60d1c02b3b39db19b518f2af43cd69f2e984
2020-06-27 17:11:10 -07:00
4c25428c8c [TB] Add support for hparam domain_discrete
Summary: Add support for populating domain_discrete field in TensorBoard add_hparams API

Test Plan: Unit test test_hparams_domain_discrete

Reviewed By: edward-io

Differential Revision: D22227939

fbshipit-source-id: d2f0cd8e5632cbcc578466ff3cd587ee74f847af
2020-06-27 14:07:24 -07:00
2456e078d3 [TB] Support custom run_name in add_hparams (#40660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40660

Support custom run_name since using timestamp as run_name can be confusing to people

Test Plan:
hp = {"lr": 0.1, "bool_var": True, "string_var": "hi"}
  mt = {"accuracy": 0.1}
  writer.add_hparams(hp, mt, run_name="run1")
  writer.flush()

Reviewed By: edward-io

Differential Revision: D22157749

fbshipit-source-id: 3d4974381e3be3298f3e4c40e3d4bf20e49dfb07
2020-06-27 14:05:20 -07:00
15be823455 caffe2 | Revert range loop analysis fix
Summary: This reverts a change that was made to fix a range-loop-analysis warning.

Test Plan: CI

Reviewed By: nlutsenko

Differential Revision: D22274461

fbshipit-source-id: dedc3fcaa6e32259460380163758d6c9c9b73211
2020-06-27 13:02:23 -07:00
68042c7466 Skip mypy on pynightly if numpy-1.20.0-dev0... is used (#40656)
Summary:
Also modernize the test script itself by using `mypy.api.run` rather than `subprocess.call`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40656

Differential Revision: D22274421

Pulled By: malfet

fbshipit-source-id: 59232d4d37ee01cda56375b84ac1476d16686bfe
2020-06-27 09:08:50 -07:00
ac8c8b028d [ROCm] restore jit tests (#40447)
Summary:
Remove `skipIfRocm` from most jit tests and enable `RUN_CUDA_HALF` tests for ROCm.

These changes passed more than three rounds of CI testing against the ROCm CI.

CC ezyang xw285cornell sunway513
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40447

Differential Revision: D22190711

Pulled By: xw285cornell

fbshipit-source-id: bac44825a2675d247b3abe2ec2f80420a95348a3
2020-06-27 01:03:59 -07:00
411bc2b8d5 [quant][graphmode][fix] remove unsupported ops in the list (#40653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40653

(Note: this ignores all push blocking failures!)

Test Plan: Imported from OSS

Differential Revision: D22271413

fbshipit-source-id: a01611b5d90849ac673fa5a310f910c858e907a3
2020-06-27 00:07:57 -07:00
61a8de77cf [quant] aten::repeat work for quantized tensor (#40644)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40644

Test Plan: Imported from OSS

Differential Revision: D22268558

fbshipit-source-id: 3bc9a129bece1b547c519772ecc6b980780fb904
2020-06-26 22:54:19 -07:00
0309f6a4bb [quant][graphmode][fix] cloning schema in insert_observers (#40624)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40624

Previously we didn't clone the schema, so the default schema was used; this was
causing issues for some models

Test Plan: Imported from OSS

Differential Revision: D22259519

fbshipit-source-id: e2a393a54cb18f55da0c7152a74ddc22079ac350
2020-06-26 20:19:09 -07:00
0a19534dd2 [JIT] Remove dead store in quantization_patterns.h (#40623)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40623

Test Plan: Continuous integration.

Reviewed By: jerryzh168

Differential Revision: D22259209

fbshipit-source-id: 90c9e79e039100f2961195504bb81230bba5c5fe
2020-06-26 19:43:43 -07:00
e368b11226 [JIT] Remove dead stores in loopnest.cpp (#40626)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40626

Test Plan: Continuous integration.

Reviewed By: ZolotukhinM

Differential Revision: D22259586

fbshipit-source-id: 447accb5b94392f0b5e4c27956a34403bb0d1ea8
2020-06-26 19:28:03 -07:00
15864d1703 Skip allreducing local_used_maps_dev_ when find_unused_param=False
Summary:
1. In reducer.cpp, we have a new boolean `find_unused_param_`, and its value is set in `Reducer::prepare_for_backward`.
If `!find_unused_param_`, then it avoids `allreduce(local_used_maps_dev_)` (see the sketch after this list).
2. Solves issue [38942](https://github.com/pytorch/pytorch/issues/38942).
3. Fixes incorrect `find_unused_parameters_` passing like checking `outputs.empty()` or `unused_parameters_.empty()`.
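
The user-visible switch is the DDP constructor flag; a minimal sketch, assuming an already-initialized process group and a module `net` on this process's GPU:

```python
import torch

# With find_unused_parameters=False (the default), the reducer now skips
# the extra allreduce of local_used_maps_dev_ during the backward pass.
ddp_model = torch.nn.parallel.DistributedDataParallel(
    net,                          # assumed: an already-constructed nn.Module
    device_ids=[local_rank],      # assumed: this process's GPU index
    find_unused_parameters=False,
)
```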

ghstack-source-id: 106693089

Test Plan:
1. Run `test/distributed/test_c10d.py` and make sure all tests pass.
2. A new test case `test_find_unused_parameters_when_unused_parameters_empty` is included. Old `reducer.cpp` was failing in that unit test because it was checking `find_unused_parameters_` by `unused_parameters_.empty()`. Current `reducer.cpp` passes this unit test.
3. Two test cases were failing `test_forward_backward_unused_parameters` and `test_forward_backward_optimizer` , because `find_unused_parameter_` of their `reducer` object was not set properly. I fixed that as well.

Imported from OSS

**Output of version 14:**
```
................s.....s...............................................test/distributed/test_c10d.py:1531: UserWarning: Deprecation warning: In a future PyTorch release torch.full will no longer return tensors of floating dtype by default. Instead, a bool fill_value will return a tensor of torch.bool dtype, and an integral fill_value will return a tensor of torch.long dtype. Set the optional `dtype` or `out` arguments to suppress this warning. (Triggered internally at  ../aten/src/ATen/native/TensorFactories.cpp:364.)
  tensor = torch.full([100, 100], self.rank)
test/distributed/test_c10d.py:1531: UserWarning: Deprecation warning: In a future PyTorch release torch.full will no longer return tensors of floating dtype by default. Instead, a bool fill_value will return a tensor of torch.bool dtype, and an integral fill_value will return a tensor of torch.long dtype. Set the optional `dtype` or `out` arguments to suppress this warning. (Triggered internally at  ../aten/src/ATen/native/TensorFactories.cpp:364.)
  tensor = torch.full([100, 100], self.rank)
test/distributed/test_c10d.py:1531: UserWarning: Deprecation warning: In a future PyTorch release torch.full will no longer return tensors of floating dtype by default. Instead, a bool fill_value will return a tensor of torch.bool dtype, and an integral fill_value will return a tensor of torch.long dtype. Set the optional `dtype` or `out` arguments to suppress this warning. (Triggered internally at  ../aten/src/ATen/native/TensorFactories.cpp:364.)
  tensor = torch.full([100, 100], self.rank)
test/distributed/test_c10d.py:1531: UserWarning: Deprecation warning: In a future PyTorch release torch.full will no longer return tensors of floating dtype by default. Instead, a bool fill_value will return a tensor of torch.bool dtype, and an integral fill_value will return a tensor of torch.long dtype. Set the optional `dtype` or `out` arguments to suppress this warning. (Triggered internally at  ../aten/src/ATen/native/TensorFactories.cpp:364.)
  tensor = torch.full([100, 100], self.rank)
.test/distributed/test_c10d.py:1554: UserWarning: Deprecation warning: In a future PyTorch release torch.full will no longer return tensors of floating dtype by default. Instead, a bool fill_value will return a tensor of torch.bool dtype, and an integral fill_value will return a tensor of torch.long dtype. Set the optional `dtype` or `out` arguments to suppress this warning. (Triggered internally at  ../aten/src/ATen/native/TensorFactories.cpp:364.)
  self.assertEqual(torch.full([10, 10], self.world_size), tensor)
test/distributed/test_c10d.py:1554: UserWarning: Deprecation warning: In a future PyTorch release torch.full will no longer return tensors of floating dtype by default. Instead, a bool fill_value will return a tensor of torch.bool dtype, and an integral fill_value will return a tensor of torch.long dtype. Set the optional `dtype` or `out` arguments to suppress this warning. (Triggered internally at  ../aten/src/ATen/native/TensorFactories.cpp:364.)
  self.assertEqual(torch.full([10, 10], self.world_size), tensor)
test/distributed/test_c10d.py:1554: UserWarning: Deprecation warning: In a future PyTorch release torch.full will no longer return tensors of floating dtype by default. Instead, a bool fill_value will return a tensor of torch.bool dtype, and an integral fill_value will return a tensor of torch.long dtype. Set the optional `dtype` or `out` arguments to suppress this warning. (Triggered internally at  ../aten/src/ATen/native/TensorFactories.cpp:364.)
  self.assertEqual(torch.full([10, 10], self.world_size), tensor)
test/distributed/test_c10d.py:1554: UserWarning: Deprecation warning: In a future PyTorch release torch.full will no longer return tensors of floating dtype by default. Instead, a bool fill_value will return a tensor of torch.bool dtype, and an integral fill_value will return a tensor of torch.long dtype. Set the optional `dtype` or `out` arguments to suppress this warning. (Triggered internally at  ../aten/src/ATen/native/TensorFactories.cpp:364.)
  self.assertEqual(torch.full([10, 10], self.world_size), tensor)
.....s...............................
----------------------------------------------------------------------
Ran 108 tests in 214.210s

OK (skipped=3)
```

Differential Revision: D22176231

fbshipit-source-id: b5d15f034e13a0915a474737779cc5aa8e068836
2020-06-26 19:20:59 -07:00
4102fbdf08 [1/n] Allow dense NaN value in dper raw input processor output
Summary:
## TLDR
Support using a NaN default value for missing dense features in RawInputProcessor for *DPER2*, in preparation for subsequent support for null-flag features in *compute meta*. For train_eval this is already supported in DPER3, and we do not plan to support it in DPER2 train_eval.
## Overview
Intern project plan to support adding dense flags for missing feature values instead of replacing with zero.

Project plan :
https://docs.google.com/document/d/1OsPUTjpJycwxWLCue3Tnb1mx0uDC_2KKWvC1Rwpo2NI/edit?usp=sharing

## Code paths:
See https://fb.quip.com/eFXUA0tbDmNw for the call stack for all affected code paths.

Test Plan:
# A. DPER3 blob value inspection
## 1. Build local bento kernel in fbcode folder
`buck build mode/dev-nosan //bento/kernels:bento_kernel_ads_ranking`

## 2. Use kernel `ads_ranking (local)` to print dense feature blob values
n280239

## 2.1 Try `default_dense_value = "0.0"` (default)
```
preproc_6/feature_preproc_6/dper_feature_processor_7/raw_input_proc_7/float_feature_sparse_to_dense_7/float_features [[0.       ]
 [0.       ]
 [0.       ]
 [0.       ]
 [0.       ]
 [0.       ]
 [0.       ]
 [1.       ]
 [1.7857143]
 [1.7777778]
 [1.       ]
 [0.       ]
 [0.5625   ]
 [0.       ]
 [0.       ]
 [0.8      ]
 [0.       ]
 [1.       ]
 [0.56     ]
 [0.       ]]
```
## 2.2 Try `default_dense_value = "123"`
```
preproc_2/feature_preproc_2/dper_feature_processor_3/raw_input_proc_3/float_feature_sparse_to_dense_3/float_features [[123.       ]
 [123.       ]
 [123.       ]
 [123.       ]
 [123.       ]
 [123.       ]
 [123.       ]
 [  1.       ]
 [  1.7857143]
 [  1.7777778]
 [  1.       ]
 [123.       ]
 [  0.5625   ]
 [123.       ]
 [123.       ]
 [  0.8      ]
 [123.       ]
 [  1.       ]
 [  0.56     ]
 [123.       ]]
```
## 2.3 Try `default_dense_value = float("nan")`
```
RuntimeError: [enforce fail at enforce_finite_op.h:40] std::isfinite(input_data[i]). Index 0 is not finite (e.g., NaN, Inf): -nan (Error from operator:
input: "unary_4/logistic_regression_loss_4/average_loss_4/average_loss" name: "" type: "EnforceFinite" device_option { random_seed: 54 })
```
which is expected due to nan input.

# B. Unit test
`buck test  fblearner/flow/projects/dper/tests/preprocs:raw_feature_extractor_test`

https://www.internalfb.com/intern/testinfra/testconsole/testrun/5348024586274923/

{F241336814}

Differential Revision: D21961595

fbshipit-source-id: 3dcb153b3c7f42f391584f5e7f52f3d9c76de31f
2020-06-26 16:54:14 -07:00
897e610c82 FP16 rounding-to-nearest for row-wise SparseAdagrad fusion (#40466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40466

Extend row wise sparse Adagrad fusion op to FP16 (rounding-to-nearest) for PyTorch.

Reviewed By: jianyuh

Differential Revision: D22003571

fbshipit-source-id: e97e01745679a9f6e7b0f81ce5a6ebf4d4a1df41
2020-06-26 16:14:59 -07:00
47c72be3d7 Port /test/cpp_extensions/rng_extension.cpp to new operator registration API (#39459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39459

Update to this PR: this code isn't going to fully solve https://github.com/pytorch/pytorch/issues/37010. The changes required for 37010 are more than this PR initially planned. Instead, this PR switches the op registration of rng-related tests to use the new API (similar to what was done in #36925)

Test Plan:
1) unit tests

Imported from OSS

Reviewed By: ezyang

Differential Revision: D22264889

fbshipit-source-id: 82488ac6e3b762a756818434e22c2a0f9cb9dd47
2020-06-26 16:12:54 -07:00
24a8614cac [Reland][doc] Add overflow notice for cuFFT on half precision (#40551)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/35594
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40551

Reviewed By: ezyang

Differential Revision: D22249831

Pulled By: ngimel

fbshipit-source-id: b221b3c0a490ccaaabba50aa698a2490536e0917
2020-06-26 15:40:19 -07:00
6debc28964 Ignore error code from apt-get purge (#40631)
Summary:
This replicates the pattern of other "do for luck" commands.
Prep change to add RocM to CircleCI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40631

Differential Revision: D22261707

Pulled By: malfet

fbshipit-source-id: 3dadfa434deab866a8800715f3197e84169cf43e
2020-06-26 13:34:07 -07:00
375cd852fa Add a utility function for bundling large input tensors (#37055)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37055

Sometimes it's okay to bundle a large example input tensor with a model.
Add a utility function to make it easy for users to do that *on purpose*.
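
A hedged sketch of how this might be used (`bundle_large_tensor` is the helper name assumed here; the bundling machinery lives in `torch.utils.bundled_inputs`):

```python
import torch
import torch.utils.bundled_inputs as bundled_inputs

model = torch.jit.script(torch.nn.ReLU())
big = torch.randn(1, 3, 224, 224)

# Wrapping the tensor marks the large input as intentional, so the
# bundling machinery does not reject it for being too big to inline.
bundled_inputs.augment_model_with_bundled_inputs(
    model,
    [(bundled_inputs.bundle_large_tensor(big),)],
)
```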

Test Plan: Unit test.

Differential Revision: D22264239

Pulled By: dreiss

fbshipit-source-id: 05c6422be1aa926cca850f994ff1ae83c0399119
2020-06-26 13:34:02 -07:00
41ea7f2d86 Add channels-last support to bundled_inputs (#36764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36764

This allows bundling inputs that are large uniform buffers in
channels-last memory format.

Test Plan: Unit test.

Differential Revision: D21142660

Pulled By: dreiss

fbshipit-source-id: 31bbea6586d07c1fd0bcad4cb36ed2b8bb88a7e4
2020-06-26 13:31:17 -07:00
edac323378 Add special rules to launch docker image with RocM (#40632)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40632

Differential Revision: D22262316

Pulled By: malfet

fbshipit-source-id: 3d525767bfbfc8e2497541849d85cabf0379a43b
2020-06-26 13:28:36 -07:00
0494e0ad70 Back out "Revert D21581908: Move TensorOptions ops to c10" (#40595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40595

ghstack-source-id: 106691774

Test Plan: waitforsandcastle

Differential Revision: D22247729

fbshipit-source-id: 14745588cae267c1e0cc51cd9541a9b8abb830e5
2020-06-26 12:57:09 -07:00
b8f4f6868d [JIT] Remove dead store in exit_transforms.cpp (#40611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40611

This commit removes a dead store in `transformWith` of exit_transforms.cpp.

Test Plan: Continuous integration.

Reviewed By: suo

Differential Revision: D22254136

fbshipit-source-id: f68c4625f7be8ae29b3500303211b2299ce5d6f6
2020-06-26 12:35:58 -07:00
a62f8805e7 Update TensorPipe submodule (#40614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40614

This update pulls in a oneliner fix, which sets the TCP_NODELAY option on the TCP sockets of the UV transport. This leads to exceptional performance gains in terms of latency, with about a 25x improvement in one simple benchmark. This thus resolves a regression that TensorPipe had compared to the ProcessGroup agent and, in fact, ends up beating it by 2x.

The benchmark I ran is this, with the two endpoints pinned to different cores of the same machine:
```
@torch.jit.script
def remote_fn(t: int):
    return t

@torch.jit.script
def local_fn():
    for _ in range(1_000_000):
        fut = rpc.rpc_async("rhs", remote_fn, (42,))
        fut.wait()
```

And the average round-trip time (one iteration) is:
- TensorPipe with SHM: 97.2 us
- TensorPipe with UV _after the fix_: 205us
- Gloo: 440us
- TensorPipe with UV _before the fix_: 5ms

Test Plan: Ran PyTorch RPC test suite

Differential Revision: D22255393

fbshipit-source-id: 3f6825d03317d10313704c05a9280b3043920507
2020-06-26 11:45:51 -07:00
5036c94a6e properly skip legacy tests regardless of the default executor (#40381)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40381

Differential Revision: D22173938

Pulled By: Krovatkin

fbshipit-source-id: 305fc4484977e828cc4cee6e053a1e1ab9f0d6c7
2020-06-26 11:13:50 -07:00
7676682584 Fix illegal opcode bug in caffe2 (#40584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40584

Also patch [this github issue](https://github.com/pytorch/pytorch/issues/33124)
involving an illegal assembly instruction in 8x8-dq-aarch64-neon.S.

Test Plan:
Build binaries, copy to shaker, run executables. Also run all
existing caffe tests.

Reviewed By: kimishpatel

Differential Revision: D22240670

fbshipit-source-id: 51960266ce58699fe6830bcf75632b92a122f638
2020-06-26 11:11:54 -07:00
fb5d784fb4 Further reduce windows build/test matrix (#40592)
Summary:
Switch windows CPU testers from `windows.xlarge` to `windows.medium` class.
Remove VS 14.16 CUDA build
Only do smoke force-on-cpu tests using VS2019+CUDA10.1 config.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40592

Differential Revision: D22259351

Pulled By: malfet

fbshipit-source-id: f934ff774dfc7d47f12c3da836ca314c12d92208
2020-06-26 10:18:46 -07:00
10822116c5 build docker image for CUDA11 (#40534)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40534

Differential Revision: D22258874

Pulled By: seemethere

fbshipit-source-id: 1954a22ed52e1a65caf89725ab1db9f40ff917b8
2020-06-26 10:07:53 -07:00
fc8bca094c skip_if_rocm test_rnn in test_c10d_spawn.py (#40577)
Summary:
Test was added a few months back in https://github.com/pytorch/pytorch/issues/36503 but recently became flaky for ROCm.

CC ezyang xw285cornell sunway513
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40577

Differential Revision: D22258196

Pulled By: ezyang

fbshipit-source-id: 8a22b0c17b536b3d42d0382f7737df0f8823ba08
2020-06-26 09:45:45 -07:00
67c79bb045 update schema to reflect aliasing behavior (#39794)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/38555

I did an audit of `native_functions.yaml` and found several functions in addition to `reshape` which were not reporting that they could alias:

```
@torch.jit.script
def foo(t: torch.Tensor):
    new_value = torch.tensor(1, dtype=t.dtype, device=t.device)

    t.flatten()[0] = new_value
    t.reshape(-1)[1] = new_value
    t.view_as(t)[2] = new_value
    t.expand_as(t)[3] = new_value
    t.reshape_as(t)[4] = new_value
    t.contiguous()[5] = new_value
    t.detach()[6] = new_value

    return t
```

Currently none of the values are assigned after dead code elimination, after this PR all are. (And the JIT output matches that of eager.)

I don't think this needs to be unit tested; presumably the generic machinery already is and this just brings these ops under the same umbrella.

**BC-breaking note**: This updates the native operator schema and the aliasing rules for autograd. JIT passes will no longer incorrectly optimize mutations on graphs containing these ops, and inplace ops on the result of `flatten` will now properly be tracked in Autograd and the proper backward graph will be created.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39794

Differential Revision: D22008358

Pulled By: robieta

fbshipit-source-id: 9d3ff536e58543211e08254a75c6110f2a3b4992
2020-06-26 09:25:27 -07:00
a0ba7fb43e Precompute entries in dispatch tables (#40512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40512

Fixes https://github.com/pytorch/pytorch/issues/32454

The heart of this diff is changing this:

```
inline const KernelFunction& Dispatcher::dispatch_(const DispatchTable& dispatchTable, DispatchKey dispatchKey) c
nst {
  const KernelFunction* backendKernel = dispatchTable.lookup(dispatchKey);

  if (nullptr != backendKernel) {
    return *backendKernel;
  }

  const auto& backendFallbackKernel = backendFallbackKernels_[dispatchKey];
  if (backendFallbackKernel.isValid()) {
    return backendFallbackKernel;
  }

  const KernelFunction* catchallKernel = dispatchTable.lookupCatchallKernel();
  if (C10_LIKELY(nullptr != catchallKernel)) {
    return *catchallKernel;
  }

  reportError(dispatchTable, dispatchKey);
}
```

to this:

```
const KernelFunction& OperatorEntry::lookup(DispatchKey k) const {
  const auto& kernel = dispatchTable_[static_cast<uint8_t>(k)];
  if (C10_UNLIKELY(!kernel.isValid())) {
    reportError(k);
  }
  return kernel;
}
```

The difference is that instead of checking a bunch of places to find the
right kernel to use for an operator, all of the operators are
precomputed into dispatchTable_ itself (so you don't have to consult
anything else at runtime.)  OperatorEntry::computeDispatchTableEntry
contains that computation (which is exactly the same as it was before.)
By doing this, we are able to substantially simplify many runtime
components of dispatch.

The diff is fairly large, as there are also some refactors interspersed
with the substantive change:

- I deleted the DispatchTable abstraction, folding it directly into
  OperatorEntry.  It might make sense to have some sort of DispatchTable
  abstraction (if only to let you do operator[] on DispatchKey without
  having to cast it to integers first), but I killed DispatchTable to
  avoid having to design a new abstraction; the old abstraction wasn't
  appropriate for the new algorithm.

- I renamed OperatorEntry::KernelEntry to AnnotatedKernel, and use it
  to store backend fallbacks as well as regular kernel registrations
  (this improves error messages when you incorrectly register a backend
  fallback twice).

- I moved schema_ and debug_ into an AnnotatedSchema type, to make the
  invariant clearer that these are set together, or not at all.

- I moved catch-all kernels out of kernels_ into its own property
  (undoing a refactor I did before).  The main reason I did this was
  because our intended future state is to not have a single catch-all,
  but rather possibly multiple catch-alls which fill-in different
  portions of the dispatch table.  This may change some more in
  the future: if we allow registrations for multiple types of
  catch alls, we will need a NEW data type (representing bundles
  of dispatch keys) which can represent this case, or perhaps
  overload DispatchKey to also record these types.

The key changes for precomputation:

- OperatorEntry::updateDispatchTable_ is now updated to fill in the
  entry at a DispatchKey, considering both kernels (what it did
  before) as well as catch-all and backend fallback.  There is also
  OperatorEntry::updateDispatchTableFull_ which will update the
  entire dispatch table (which is necessary when someone sets a
  catch-all kernel).  OperatorEntry::computeDispatchTableEntry
  holds the canonical algorithm specifying how we decide what
  function will handle a dispatch key for the operator.

- Because dispatch table entry computation requires knowledge of
  what backend fallbacks are (which is recorded in Dispatcher,
  not OperatorEntry), several functions on OperatorEntry now
  take Dispatcher as an argument so they can query this information.

- I modified the manual boxing wrapper invariant: previously, kernels
  stored in kernels_ did NOT have manual boxing wrappers and this
  was maintained by DispatchTable.  Now, we just ALWAYS maintain
  manual boxing wrappers for all KernelFunctions we store.

- DispatchKeyExtractor is greatly simplified: we only need to maintain
  a single per-operator bitmask of what entries are fallthrough
  (we don't need the global bitmask anymore).

- Introduced a new debugging 'dumpComputedTable' method, which prints
  out the computed dispatch table, and how we computed it to be some way.
  This was helpful for debugging cases when the dispatch table and
  the canonical metadata were not in sync.

Things that I didn't do but would be worth doing at some point:

- I really wanted to get rid of the C10_UNLIKELY branch for
  whether or not the KernelFunction is valid, but it looks like
  I cannot easily do this while maintaining good error messages.
  In principle, I could always populate a KernelFunction which
  errors, but the KernelFunction needs to know what the dispatch
  key that is missing is (this is not passed in from the
  calling convention).  Actually, it might be possible to do
  something with functors, but I didn't do it here.

- If we are going to get serious about catchalls for subsets of
  operators, we will need to design a new API for them.  This diff
  is agnostic to this question; we don't change public API at all.

- Precomputation opens up the possibility of subsuming DispatchStub
  by querying CPU capability when filling in the dispatch table.
  This is not implemented yet. (There is also a mild blocker here,
  which is that DispatchStub is also used to share TensorIterator
  configuration, and this cannot be directly supported by the
  regular Dispatcher.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D22236352

Pulled By: ezyang

fbshipit-source-id: d6d90f267078451816b1899afc3f79737b4e128c
2020-06-26 09:03:39 -07:00
a4cabd1a3c Generalize Python dispatcher testing API; disallow overwriting fallback (#40469)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40469

- The old testing interface C._dispatch_import was based off the old
  c10::import variation, which meant the API lined up in a strange
  way with the actual torch/library.h.  This diff reduces the
  differences by letting you program the Library constructor directly.

- Using this newfound flexibility, we add a test for backend fallbacks
  from Python; specifically testing that we disallow registering a
  backend fallback twice.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D22236351

Pulled By: ezyang

fbshipit-source-id: f8365e3033e9410c7e6eaf9f78aa32e1f7d55833
2020-06-26 09:01:28 -07:00
44bf822084 Add C++ standard version check to top level headers (#40510)
Summary:
Remove `-std=c++14` flag from `utils.cmake`, since PyTorch C++ API can be invoked by any compiler compliant with C++14 standard or later
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40510

Differential Revision: D22253313

Pulled By: malfet

fbshipit-source-id: ff731525868b251c27928fc98b0724080ead9be2
2020-06-26 08:44:04 -07:00
dfc7e71d13 [Selective Build] Apply query-based on instrumentation_tests
Summary:
1. Modularize some bzl files to break circular buck load
2. Use query-based on instrumentation_tests

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: kwanmacher

Differential Revision: D22188728

fbshipit-source-id: affbabd333c51c8b1549af6602c6bb79fabb7236
2020-06-26 08:05:53 -07:00
f1406c43fc [papaya][aten] Fix compiler error: loop variable 'tensor' is always a copy because the range of type 'c10::List<at::Tensor>' does not return a reference. (#40599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40599

.

Test Plan: CI

Reviewed By: smessmer

Differential Revision: D22246106

fbshipit-source-id: a5d0535e627b9f493fca7234dcfc15c521b0ed7f
2020-06-26 02:43:25 -07:00
eebd492dcf [doc] fix autograd doc subsubsection display issue (#40582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40582

There's a misuse of "~~~~" under `requires_grad`: "~~~~" is not an official section marker, so change it to "^^^^" to denote subsubsections. Also fix the other places where we should use the subsection marker "-----" instead of the subsubsection marker "^^^^".

see https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#sections

Before:
<img width="712" alt="rst_before" src="https://user-images.githubusercontent.com/9443650/85789835-2226fa80-b6e4-11ea-97b6-2b19fdf324a4.png">
After:
<img width="922" alt="rst_after" src="https://user-images.githubusercontent.com/9443650/85789856-281cdb80-b6e4-11ea-925f-cb3f4ebaa2bf.png">

Test Plan: Imported from OSS

Differential Revision: D22245747

Pulled By: wanchaol

fbshipit-source-id: 11548ed42f627706863bb74d4269827d1b3450d4
2020-06-25 23:28:33 -07:00
3ab60ff696 Remove cpu vec256 for std::complex (#39830)
Summary:
std::complex is gone. We are now using c10::complex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39830

Differential Revision: D22252066

Pulled By: malfet

fbshipit-source-id: cdd5bb03ec66825d82177d609cbcf0738922dba0
2020-06-25 23:25:58 -07:00
fab412a8f3 Bump nightlies to 1.7.0 (#40519)
Summary:
edit: apparently we hardcode a lot more versions than I would've anticipated.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40519

Differential Revision: D22221280

Pulled By: seemethere

fbshipit-source-id: ba15a910a6755ec08c10f7783ed72b1e06e6b570
2020-06-25 22:36:33 -07:00
e3a97688cc [quant][graphmode][fix] dequantize propagation for {add/mul}_scalar (#40596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40596

Previously the fusion patterns for {add/mul}_scalar were inconsistent, since the op pattern
produces a non-quantized tensor while the op replacement graph produces a quantized tensor

Test Plan: Imported from OSS

Differential Revision: D22251072

fbshipit-source-id: e16eb92cf6611578cca1ed8ebde961f8d0610137
2020-06-25 22:17:08 -07:00
547ea787ff [ONNX] Add eliminate_unused_items pass (#38812)
Summary:
This PR:

- Adds eliminate_unused_items pass that removes unused inputs and initializers.
- Fixes run_embed_params function so it doesn't export unnecessary parameters.
- Removes  test_modifying_params in test_verify since it's no longer needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38812

Reviewed By: ezyang

Differential Revision: D22236416

Pulled By: houseroad

fbshipit-source-id: 30e1a6e8823a7e36b51ae1823cc90476a53cd5bb
2020-06-25 22:00:26 -07:00
5466231187 Fixes lint (#40606)
Summary:
'= ' => '='
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40606

Differential Revision: D22252511

Pulled By: mruberry

fbshipit-source-id: 5f90233891be58a742371e4416166a267aee4669
2020-06-25 21:53:00 -07:00
ac79c874ce [PyTorch Operator] [2/n] Adding python test
Summary: Adding a python test file with image files, with the input image being p.jpg. Tests for the quality difference between the raw image and the decoded image

Test Plan:
Parsing buck files: finished in 1.5 sec
Building: finished in 6.4 sec (100%) 10241/10241 jobs, 2 updated
  Total time: 8.0 sec
More details at https://www.internalfb.com/intern/buck/build/387cb1c1-2902-4f90-ae9f-83fb6d473487
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 93e6ef88-ec68-41cb-9de7-7868a14e6d65
Trace available for this run at /tmp/tpx-20200623-055836.283269/trace.log
Started reporting to test run: https://our.intern.facebook.com/intern/testinfra/testrun/4222124679431330
    ✓ ListingSuccess: caffe2/test:test_bundled_images - main (18.865)
    ✓ Pass: caffe2/test:test_bundled_images - test_single_tensors (test_bundled_images.TestBundledInputs) (18.060)
    ✓ Pass: caffe2/test:test_bundled_images - main (18.060)
Summary
  Pass: 2
  ListingSuccess: 1
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4222124679431330

Reviewed By: dreiss

Differential Revision: D22046611

fbshipit-source-id: fabc604269a5a4d8a37135ce776200da2794a252
2020-06-25 18:36:44 -07:00
c790476384 Back out "Revert D22072830: [wip] Upgrade msvc to 14.13" (#40594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40594

Original commit changeset: 901de185e607
ghstack-source-id: 106642590

Test Plan: oss ci

Differential Revision: D22247269

fbshipit-source-id: be0c64d1a579f8aa3999cb84a9d20488095a81bd
2020-06-25 17:19:33 -07:00
b05c34259b relax size check in flatten_for_scatter_gather (#40573)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40573

Per title, to work around the apex sbn bug.

Test Plan: Covered by existing tests

Reviewed By: blefaudeux

Differential Revision: D22236942

fbshipit-source-id: ddb164ee347a7d472a206087e4dbd16aa9d72387
2020-06-25 15:16:37 -07:00
e180ca652f Add __all__ to torch/_C/_VariableFunctions.pyi (#40499)
Summary:
Related to https://github.com/pytorch/pytorch/issues/40397

Inspired by ezyang's comment at https://github.com/pytorch/pytorch/issues/40397#issuecomment-648233001, this PR attempts to leverage using `__all__` to explicitly export private functions from `_VariableFunctions.pyi` in order to make `mypy` aware of them after:

```
if False:
    from torch._C._VariableFunctions import *
```

The generation of the `__all__` template variable excludes some items from `unsorted_function_hints`, as it seems that those without hints end up not being explicitly included in the `.pyi` file: I leaned on the side of caution and opted for having `__all__` consistent with the definitions inside the file. Additionally, added some pretty-printing to avoid having an extremely long line.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40499

Differential Revision: D22240716

Pulled By: ezyang

fbshipit-source-id: 77718752577a82b1e8715e666a8a2118a9d3a1cf
2020-06-25 14:10:07 -07:00
c6e0c67449 [PyTorch Error Logging][2/N] Adding Error Logging for Loading Model (#40537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40537

Adding error logging when loading a model; adding the event "MOBILE_MODULE_LOAD"
ghstack-source-id: 106615128

Test Plan: {F241028136}

Reviewed By: iseeyuan

Differential Revision: D22098818

fbshipit-source-id: 4de7df4432c7c6c297a9dc173e5cafa13fe2833c
2020-06-25 14:05:43 -07:00
e231405ef6 [jit] Fix type annotations in select assignments (#40528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40528

Previously, an assignment like `self.foo : List[int] = []` would ignore
the type hint.
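
A small repro sketch (hedged):

```python
from typing import List
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # The annotation is now honored; previously TorchScript inferred
        # List[Tensor] for the empty list, so appending ints failed to compile.
        self.foo: List[int] = []

    def forward(self) -> List[int]:
        self.foo.append(1)
        return self.foo

m = torch.jit.script(M())
```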

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D22222927

Pulled By: suo

fbshipit-source-id: b0af19b87c6fbe0670d06b55f2002a783d00549d
2020-06-25 13:08:03 -07:00
dfbf0164c9 Revert D22103662: [NCCL] Explicitly Abort NCCL Communicators on Process Group Destruction
Test Plan: revert-hammer

Differential Revision:
D22103662 (527ab13436)

Original commit changeset: 1f6f88b56bd7

fbshipit-source-id: d0944462c021ec73c7f883f98609fc4a3408efd9
2020-06-25 12:27:24 -07:00
4d40ec1480 [PyTorch Error Logging][1/N] Adding Error Logging for Run_Method (#40535)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40535

Adding error logging for run_method.
Adding CANCEL (the method cannot be found) and FAIL (an error occurred while running the method) statuses
ghstack-source-id: 106604786

Test Plan: {F240891059}

Reviewed By: xcheng16

Differential Revision: D22097857

fbshipit-source-id: 4bdc8e3993e40cb1ba51e4706be6637e3afd40b4
2020-06-25 12:25:34 -07:00
f41173b975 [PyPer][quant] Add quantized embedding operators to OSS. (#40076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40076

Pull Request resolved: https://github.com/pytorch/glow/pull/4606

[PyPer][quant] Add quantized embedding operators to OSS.

This is the first step in supporting Graph Mode Quantization for EmbeddingBag.

At a high level, the next steps would be
a) Implementation of Embedding prepack/unpack operators,
b) Implementation of torch.nn.quantized.dynamic.EmbeddingBag Module,
c) Implementation of torch.nn.quantized.EmbeddingBag Module,
d) Implementation (modification) of IR passes to support graph quantization of EmbeddingBag module.

More in-depth details regarding each step will be in the follow up diffs. Consider this as an initial diff that moves operators to respective places that's required for us to proceed.

Test Plan: ```buck test mode/no-gpu caffe2/test:quantization -- --stress-runs 100  test_embedding_bag```

Reviewed By: supriyar

Differential Revision: D21949828

fbshipit-source-id: cad5ed0a855db7583bddb1d93e2da398c128024a
2020-06-25 12:01:49 -07:00
461014d54b Unify libtorch_python_cuda_core_sources filelists between CMakeList, fbcode and bazel (#40554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40554

Get a sublist of `libtorch_python_cuda_sources` named `libtorch_python_cuda_core_sources`. Use it to replace the list which has the same content in `CMakeList.txt`.
This is a change to keep CMakeLists and bazel consistent.

Test Plan: CI

Reviewed By: malfet

Differential Revision: D22223207

fbshipit-source-id: 2bde3c42a0b2d60d689581561075df4ef52ab694
2020-06-25 11:02:33 -07:00
7369dc8d1f Use CPU Allocator for reading from zip container
Summary:
This code path is used to read tensor bodies, so we need it to respect
alignment and padding requirements.

Test Plan: Ran an internal test that was failing.

Reviewed By: zdevito

Differential Revision: D22225622

fbshipit-source-id: f2126727f96616366850642045ab9704f3885824
2020-06-25 10:51:49 -07:00
c362138f43 Disallow passing functions that don't return Tensors to vmap (#40518)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40518

I overlooked this in the initial vmap frontend api PR. Right now we
want to restrict vmap to taking in functions that only return Tensors.
A function that only return tensors can look like one of the following:
```
def fn1(x):
    ...
    return y

def fn2(x):
    ...
    return y, z
```
fn1 returns a Tensor, while fn2 returns a tuple of Tensors. So we add a
check that the output of the function passed to vmap returns either a
single tensor or a tuple of tensors.

NB: These checks allow passing a function that returns a tuple with a
single-element tensor from vmap. That seems OK to me.

Test Plan: - `python test/test_vmap.py -v`

Differential Revision: D22216166

Pulled By: zou3519

fbshipit-source-id: a92215e9c26f6138db6b10ba81ab0c2c2c030929
2020-06-25 08:54:05 -07:00
43757ea913 Add batching rule for Tensor.permute (#40517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40517

This is necessary for implementing the vmap frontend API's out_dims
functionality.

Test Plan:
- `./build/bin/vmap_test`. The vmap python API can't accept inputs that
aren't integers right now. There are workarounds around that (use a
lambda) but that doesn't look too nice. In the future we'll test all
batching rules in Python.

Differential Revision: D22216168

Pulled By: zou3519

fbshipit-source-id: b6ef552f116fddc433e242c1594059b9d2fe1ce4
2020-06-25 08:54:01 -07:00
7038579c03 Add batching rule for unsqueeze, squeeze, and transpose (#40455)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40455

These don't need to be implemented right now but are useful later down
the line. I thought I would use these in implementing vmap's `out_dims`
functionality, but it turns out they weren't necessary. Since the code
exists and is useful anyways, I am leaving this PR here.

Test Plan:
- `./build/bin/vmap_test`. We could test this using the vmap frontend API,
but there is the catch that vmap cannot directly take integers right
now (all inputs passed to vmap must be Tensors at the moment). It's
possible to hack around that by declaring lambdas that take in a single
tensor argument, but those don't look nice.

Differential Revision: D22216167

Pulled By: zou3519

fbshipit-source-id: 1a010f5d7784845cca19339d37d6467f5b987c32
2020-06-25 08:51:27 -07:00
88ea51c061 doc string fix for torch.cuda.set_rng_state_all (#40544)
Summary:
Fix https://github.com/pytorch/pytorch/issues/40239
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40544

Differential Revision: D22233989

Pulled By: ezyang

fbshipit-source-id: b5098357a3e0c50037f95ba0d701523d5dce2628
2020-06-25 08:37:14 -07:00
e440c370c5 [quant] Fix fuse linear pass (#40549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40549

Previously we didn't check whether %weight_t was produced by `aten::t`; this would fuse some `matmul`/`addmm` that are
not 2d into `aten::linear`, which is incorrect.

Test Plan: Imported from OSS

Differential Revision: D22225921

fbshipit-source-id: 9723e82fdbac6d8e1a7ade22f3a9791321ab12b6
2020-06-25 07:10:09 -07:00
eae1ed99a3 caffe2 | Fix building with -Wrange-loop-analysis on
Summary: `-Wrange-loop-analysis` is turned on by default for clang 10 (see https://reviews.llvm.org/D73834). This fixes a warning found with that flag.

Test Plan: Build with clang 10 and check there are no `range-loop-analysis` warnings.

Reviewed By: yinghai

Differential Revision: D22207072

fbshipit-source-id: 858ba8a36c653071eab961cb891ce945faf0fa87
2020-06-24 23:42:33 -07:00
cf8a9b50ca Allow ReflectionPad to accept 0-dim batch sizes. (#39231)
Summary:
Allows ReflectionPad 1D and 2D to accept 0-dim batch sizes.

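For illustration, the newly supported case looks like this (a sketch; the shapes are arbitrary):

```python
import torch

pad = torch.nn.ReflectionPad2d(1)
x = torch.empty(0, 3, 8, 8)  # batch dimension of size 0
y = pad(x)                   # previously errored; now returns shape (0, 3, 10, 10)
```
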
Related to issues:

* https://github.com/pytorch/pytorch/issues/38115
* https://github.com/pytorch/pytorch/issues/12013
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39231

Reviewed By: ezyang

Differential Revision: D22205717

Pulled By: mruberry

fbshipit-source-id: 6744661002fcbeb4aaafd8693fb550ed53f3e00f
2020-06-24 22:24:05 -07:00
82e9318a16 Adjust CUDA memory leak test (#40504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40504

Make the CUDA mem leak test not flaky

Test Plan: python test/test_profiler.py

Differential Revision: D22215527

Pulled By: ilia-cher

fbshipit-source-id: 5f1051896342ac50cd3a21ea86ce7487b5f82a19
2020-06-24 18:22:46 -07:00
85b87df5ba Revert D22208758: [pytorch][PR] Report error when ATEN_THREADING is OMP and USE_OPENMP is turned off.
Test Plan: revert-hammer

Differential Revision:
D22208758 (3ed96e465c)

Original commit changeset: 0866c9bb9b3b

fbshipit-source-id: 9e2b469469e274292b2559c02aa0256425fd355e
2020-06-24 18:20:28 -07:00
06debf6373 move __range_length and __derive_index to lite interpreter (#40533)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40533

These ops are required by the demucs denoiser model

Test Plan: build

Reviewed By: kaustubh-kp, linbinyu

Differential Revision: D22216217

fbshipit-source-id: f300ac246fe3a7a6566a70bb89858770af68a90c
2020-06-24 18:14:51 -07:00
adcd755e69 Fix backup solution (#40515)
Summary:
These were changes that had to be made in the `release/1.6` branch in order to get backups to work.

They should be brought to the master branch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40515

Differential Revision: D22221308

Pulled By: seemethere

fbshipit-source-id: 24e2a0196a8e775fe324a383c8f0c681118b741b
2020-06-24 17:21:38 -07:00
e12f73ee12 Add missing file to BUILD.bazel (#40536)
Summary:
Add `int8_gen_quant_params.cc` added by
https://github.com/pytorch/pytorch/pull/40494/ to bazel build rules
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40536

Reviewed By: mruberry

Differential Revision: D22219595

Pulled By: malfet

fbshipit-source-id: 2875a0b9c55bad2b052a898661b96eab490f6451
2020-06-24 17:16:26 -07:00
3dcc329746 Use tree-based sum for floats to avoid numerical instability (#39516)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/38716, fixes https://github.com/pytorch/pytorch/issues/37234

This algorithm does the summation along a single axis with multiple "levels" of accumulators, each of which is designed to hold the sum of an order of magnitude more values than the previous one.

e.g. if there are 2^16 elements, the first level will hold the sum of 2^4 elements, and so on in increasing powers of 2: 2^4, 2^8, 2^12 and finally 2^16.

This limits the differences in magnitude of the partial results being added together, and so we don't lose accuracy as the axis length increases.

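A minimal sketch of the idea in Python (the name `cascade_sum` and the leaf size are illustrative, not the actual ATen kernel):

```python
def cascade_sum(xs, leaf=16):
    # Sum at most `leaf` values directly; otherwise sum the partial
    # results of the sub-ranges. Each addition then combines values of
    # comparable magnitude, which bounds the rounding error as the
    # axis length grows.
    if len(xs) <= leaf:
        total = 0.0
        for x in xs:
            total += x
        return total
    partials = [cascade_sum(xs[i:i + leaf], leaf)
                for i in range(0, len(xs), leaf)]
    return cascade_sum(partials, leaf)
```
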
WIP to write a vectorized version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39516

Reviewed By: ezyang

Differential Revision: D22106251

Pulled By: ngimel

fbshipit-source-id: b56de4773292439dbda62b91f44ff37715850ae9
2020-06-24 17:06:38 -07:00
ea06db9466 Release GIL during DDP construction. (#40495)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40495

As part of debugging flaky ddp_under_dist_autograd tests, I realized
we were running into the following deadlock.

1) Rank 0 would go into DDP construction, hold GIL and wait for broadcast in
DDP construction.
2) Rank 3 is a little slower and performs an RRef fetch call before the DDP
construction.
3) The RRef fetch call is done on Rank 0 and tries to acquire GIL.
4) We now have a deadlock since Rank 0 is waiting for Rank 3 to enter the
collective and Rank 3 is waiting for Rank 0 to release GIL.
ghstack-source-id: 106534442

Test Plan:
1) Ran ddp_under_dist_autograd 500 times.
2) waitforbuildbot

Differential Revision: D22205180

fbshipit-source-id: 6afd55342e801b9edb9591ff25158a244a8ea66a
2020-06-24 16:58:42 -07:00
71edd7f175 Update FP16 to FP16:4dfe081cf6bcd15db339cf2680b9281b8451eeb3. (#40526)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40526

Differential Revision: D22215600

Pulled By: AshkanAliabadi

fbshipit-source-id: 6ff0c17d17f118b64ae34c0007b705c7127f07ef
2020-06-24 16:58:40 -07:00
16f276cef9 Add C++-only int dim overloads to std-related operations (#40451)
Summary:
Fixes gh-40287

The `int -> bool` conversion takes higher precedence than `int -> IntArrayRef`. So, calling `std(0)` in C++ would select the `std(unbiased=False)` overload instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40451

Differential Revision: D22217926

Pulled By: ezyang

fbshipit-source-id: 7520792fab5ab6665bddd03b6f57444c6c729af4
2020-06-24 16:56:55 -07:00
a208a272cb Update cpuinfo to cpuinfo:63b254577ed77a8004a9be6ac707f3dccc4e1fd9. (#40516)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40516

Differential Revision: D22215554

Pulled By: AshkanAliabadi

fbshipit-source-id: f779cf6e08cf344b87071c2ffc9b3f7cf4659085
2020-06-24 16:47:24 -07:00
c120fdc05b Unify torch/csrc/cuda/shared/cudnn.cpp include path (#40525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40525

Move `USE_CUDNN` define under `USE_CUDA` guard, add `cuda/shared/cudnn.cpp` to filelist if either USE_ROCM or USE_CUDNN is set.
This is a prep change for PyTorch CUDA src filelist unification change.

Test Plan: CI

Differential Revision: D22214899

fbshipit-source-id: b71b32fc603783b41cdef0e7fab2cc9cbe750a4e
2020-06-24 16:40:11 -07:00
cef35e339f Update FXdiv to FXdiv:b408327ac2a15ec3e43352421954f5b1967701d1. (#40520)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40520

Differential Revision: D22215614

Pulled By: AshkanAliabadi

fbshipit-source-id: 5e41a3a69522cbfe1cc4ac76a0d1f3e90a58528d
2020-06-24 16:31:25 -07:00
4a0ba62ded Update psimd to psimd:072586a71b55b7f8c584153d223e95687148a900. (#40522)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40522

Differential Revision: D22215685

Pulled By: AshkanAliabadi

fbshipit-source-id: 78c103c4f7ad21e78069dc86a8ee47aebc9aa73e
2020-06-24 16:21:25 -07:00
3e09268c0a [jit] allow dict to be mixed between tracing and scripting (#39601)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39601

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D22202689

Pulled By: wanchaol

fbshipit-source-id: 5271eb3d8fdcda3d730a085aa555b43c35d14876
2020-06-24 16:14:13 -07:00
787e1c4c7d [jit] fix dictConstruct order issue (#40424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40424

dictConstruct should preserve the inputs order

Test Plan: Imported from OSS

Differential Revision: D22202690

Pulled By: wanchaol

fbshipit-source-id: c313b531b7fa49e6f3486396d61bfc5d6400cd01
2020-06-24 16:12:32 -07:00
2e6e8d557c Update docs feature classifications (#39966)
Summary:
Update the following feature classifications in docs to align with the changes:
1. [High Level Autograd APIs](https://pytorch.org/docs/stable/autograd.html#functional-higher-level-api): Beta (was experimental)
2. [Eager Mode Quantization](https://pytorch.org/docs/stable/quantization.html): Beta (was experimental)
3. [Named Tensors](https://pytorch.org/docs/stable/named_tensor.html): Prototype (was experimental)
4. [TorchScript/RPC](https://pytorch.org/docs/stable/rpc.html#rpc): Prototype (was experimental)
5. [Channels Last Memory Layout](https://pytorch.org/docs/stable/tensor_attributes.html#torch-memory-format): Beta (was experimental)
6. [Custom C++ Classes](https://pytorch.org/docs/stable/cpp_index.html): Beta (was experimental)
7. [Torch.Sparse](https://pytorch.org/docs/stable/sparse.html): Beta (was experimental)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39966

Differential Revision: D22213217

Pulled By: jlin27

fbshipit-source-id: dc49337cbc7026ed8dcac506fc60029dc3add854
2020-06-24 15:35:59 -07:00
72f2c479e3 Migrate equal from the TH to Aten (CPU) (#33286)
Summary:
https://github.com/pytorch/pytorch/issues/24697
VitalyFedyunin
glaringlee

Test script:
```Python
import timeit

setup_ones = """
import torch
a = torch.ones(({n}, {n}), dtype={dtype})
b = torch.ones(({n}, {n}), dtype={dtype})
"""

for n, t in [(1000, 10000), (2000, 10000)]:
  for dtype in ('torch.bool', 'torch.int', 'torch.long', 'torch.bfloat16', 'torch.float', 'torch.double'):
  #for dtype in ('torch.bool', 'torch.int', 'torch.long', 'torch.float', 'torch.double'):
    print('torch.ones(({n}, {n})) equal for {t} times {dtype}'.format(n=n, t=t, dtype=dtype))
    print(timeit.timeit(stmt='torch.equal(a, b)', setup=setup_ones.format(n=n, dtype=dtype), number=t))

setup_rand = """
import torch
a = torch.rand(({n}, {n}), dtype={dtype})
b = a.clone()
"""
for n, t in [(1000, 10000), (2000, 10000)]:
  for dtype in ('torch.float', 'torch.double'):
    print('torch.rand(({n}, {n})) for {t} times {dtype}'.format(n=n, t=t, dtype=dtype))
    print(timeit.timeit(stmt='torch.equal(a, b)', setup=setup_rand.format(n=n, dtype=dtype), number=t))

setup_non_contiguous = """
import torch
a = torch.rand(({n}, {n}), dtype={dtype})
a2 = a[:, 500:]
a3 = a2.clone()
torch.equal(a2, a3)
"""
for n, t in [(1000, 10000), (2000, 10000)]:
  for dtype in ('torch.float', 'torch.double'):
    print('non_contiguous torch.rand(({n}, {n})) for {t} times {dtype}'.format(n=n, t=t, dtype=dtype))
    print(timeit.timeit(stmt='torch.equal(a2, a3)', setup=setup_non_contiguous.format(n=n, dtype=dtype), number=t))

setup_not_equal = """
import torch
a = torch.rand(({n}, {n}), dtype={dtype})
b = torch.rand(({n}, {n}), dtype={dtype})
torch.equal(a, b)
"""
for n, t in [(1000, 10000), (2000, 10000)]:
  for dtype in ('torch.float', 'torch.double'):
    print('not equal torch.rand(({n}, {n})) for {t} times {dtype}'.format(n=n, t=t, dtype=dtype))
    print(timeit.timeit(stmt='torch.equal(a, b)', setup=setup_not_equal.format(n=n, dtype=dtype), number=t))
```

TH
```
torch.ones((1000, 1000)) equal for 10000 times torch.bool
1.8391206220258027
torch.ones((1000, 1000)) equal for 10000 times torch.int
1.8877864250680432
torch.ones((1000, 1000)) equal for 10000 times torch.long
1.938108820002526
torch.ones((1000, 1000)) equal for 10000 times torch.bfloat16
3.184849138953723
torch.ones((1000, 1000)) equal for 10000 times torch.float
1.8825413499725983
torch.ones((1000, 1000)) equal for 10000 times torch.double
2.7266416549682617
torch.ones((2000, 2000)) equal for 10000 times torch.bool
7.227149627986364
torch.ones((2000, 2000)) equal for 10000 times torch.int
7.76215292501729
torch.ones((2000, 2000)) equal for 10000 times torch.long
9.631909006042406
torch.ones((2000, 2000)) equal for 10000 times torch.bfloat16
8.097328286035918
torch.ones((2000, 2000)) equal for 10000 times torch.float
5.5739822529722005
torch.ones((2000, 2000)) equal for 10000 times torch.double
8.444009944912978
torch.rand((1000, 1000)) for 10000 times torch.float
1.168096570065245
torch.rand((1000, 1000)) for 10000 times torch.double
1.6577326939441264
torch.rand((2000, 2000)) for 10000 times torch.float
5.49395391496364
torch.rand((2000, 2000)) for 10000 times torch.double
8.507486199960113
non_contiguous torch.rand((1000, 1000)) for 10000 times torch.float
6.074504268006422
non_contiguous torch.rand((1000, 1000)) for 10000 times torch.double
6.1426916810451075
non_contiguous torch.rand((2000, 2000)) for 10000 times torch.float
37.501055537955835
non_contiguous torch.rand((2000, 2000)) for 10000 times torch.double
44.6880351039581
not equal torch.rand((1000, 1000)) for 10000 times torch.float
0.029356416082009673
not equal torch.rand((1000, 1000)) for 10000 times torch.double
0.025421109050512314
not equal torch.rand((2000, 2000)) for 10000 times torch.float
0.026333761983551085
not equal torch.rand((2000, 2000)) for 10000 times torch.double
0.02748022007290274
```

ATen
```
torch.ones((1000, 1000)) equal for 10000 times torch.bool
0.7961567062884569
torch.ones((1000, 1000)) equal for 10000 times torch.int
0.49172434909269214
torch.ones((1000, 1000)) equal for 10000 times torch.long
0.9459248608909547
torch.ones((1000, 1000)) equal for 10000 times torch.bfloat16
2.0877483217045665
torch.ones((1000, 1000)) equal for 10000 times torch.float
0.606857153121382
torch.ones((1000, 1000)) equal for 10000 times torch.double
1.1388208279386163
torch.ones((2000, 2000)) equal for 10000 times torch.bool
2.0329296849668026
torch.ones((2000, 2000)) equal for 10000 times torch.int
3.534358019940555
torch.ones((2000, 2000)) equal for 10000 times torch.long
8.19841272290796
torch.ones((2000, 2000)) equal for 10000 times torch.bfloat16
6.595649406313896
torch.ones((2000, 2000)) equal for 10000 times torch.float
4.193911510054022
torch.ones((2000, 2000)) equal for 10000 times torch.double
7.931309659034014
torch.rand((1000, 1000)) for 10000 times torch.float
0.8877940969541669
torch.rand((1000, 1000)) for 10000 times torch.double
1.4142901846207678
torch.rand((2000, 2000)) for 10000 times torch.float
4.010025603231043
torch.rand((2000, 2000)) for 10000 times torch.double
8.126411964651197
non_contiguous torch.rand((1000, 1000)) for 10000 times torch.float
0.602473056409508
non_contiguous torch.rand((1000, 1000)) for 10000 times torch.double
0.6784545010887086
non_contiguous torch.rand((2000, 2000)) for 10000 times torch.float
3.0991827426478267
non_contiguous torch.rand((2000, 2000)) for 10000 times torch.double
5.719010795000941
not equal torch.rand((1000, 1000)) for 10000 times torch.float
0.046060710679739714
not equal torch.rand((1000, 1000)) for 10000 times torch.double
0.036034489050507545
not equal torch.rand((2000, 2000)) for 10000 times torch.float
0.03686975734308362
not equal torch.rand((2000, 2000)) for 10000 times torch.double
0.04189508780837059
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33286

Differential Revision: D22211962

Pulled By: glaringlee

fbshipit-source-id: a5c48f328432c1996f28e19bc75cb495fb689f6b
2020-06-24 15:08:06 -07:00
4d549077a2 Skip test_mem_leak on Windows (#40486)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/40485.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40486

Differential Revision: D22217493

Pulled By: malfet

fbshipit-source-id: 6654c3b53e8af063b508f91728e58262ffbab053
2020-06-24 14:49:14 -07:00
0c923eea0a Add finishAndThrow function to ProcessGroup::Work, and use with Gloo (#40405)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40405

This adds a finishAndThrow function that completes the work object,
sets an exception if one is provided by the user, and throws an exception (if
it is already set or passed by the caller). This is now done by grabbing the
lock just once and simplifies the wait functions in ProcessGroupGloo.
ghstack-source-id: 106516114

Test Plan: CI

Differential Revision: D22174890

fbshipit-source-id: ea74702216c4328187c8d193bf39e1fea43847f6
2020-06-24 14:46:25 -07:00
3e2d2fc856 [NCCL Docs] Adding Comments for Work-level Finish in ProcessGroup (#40404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40404

Adds docs to the finish function in ProcessGroup::Work. It's better to have some documentation around these functions since we have some PRs with API changes/optimizations for these work-level functions here and in the subclasses.
ghstack-source-id: 106381736

Test Plan: CI (Docs change only)

Differential Revision: D22174891

fbshipit-source-id: 7901ea3b35caf6f69f37178ca574104d3412de28
2020-06-24 14:44:18 -07:00
527ab13436 [NCCL] Explicitly Abort NCCL Communicators on Process Group Destruction (#40241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40241

We abort incomplete NCCL communicators in the ProcessGroupNCCL
destructor; otherwise pending NCCL communicators may block other CUDA ops.

Closes: https://github.com/pytorch/pytorch/issues/32231
ghstack-source-id: 106469423

Test Plan: CI/Sandcastle

Reviewed By: jiayisuse

Differential Revision: D22103662

fbshipit-source-id: 1f6f88b56bd7a5e9ca5a41698995a76e60e8ad9f
2020-06-24 14:34:00 -07:00
fe18dcd692 Use GLOG logging prefixes (#40491)
Summary:
PyTorch should stop polluting the global namespace with symbols such as `ERROR`, `WARNING` and `INFO`.
Since `logging_is_not_google_glog.h` is a C++ header, define the severity levels in a namespace and add a `GLOG_` prefix to match the unshortened glog severity levels.
Change the `LOG` and `LOG_IF` macros to use the prefixed + namespaced severity levels.

Closes https://github.com/pytorch/pytorch/issues/40083
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40491

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D22210925

Pulled By: malfet

fbshipit-source-id: 0ec1181a53baa8bca2f526f245e398582304aeab
2020-06-24 14:07:00 -07:00
fc4824aa4a enable mkldnn dilation conv (#40483)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40483

Reviewed By: ezyang

Differential Revision: D22213696

Pulled By: ngimel

fbshipit-source-id: 0321eee8fcaf144b20a5182aa76f98d505c65400
2020-06-24 13:28:05 -07:00
de7ac60cf4 Add out= variants for cuda.comm.broadcast/gather/scatter (#39681)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/38911
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39681

Differential Revision: D22161342

Pulled By: mrshenli

fbshipit-source-id: 60295077159b02087823e93bb6ebac9d70adea0a
2020-06-24 12:58:19 -07:00
e66445878d Adds dynamic versioning pattern (#40279)
Summary:
BC NOTE:

This change makes it so modules saved with torch.jit.save in PyTorch 1.6 can be loaded by previous versions of PyTorch unless they use torch.div or (soon) torch.full. It also lets tensors saved using torch.save be loaded by previous versions. So this is the opposite of BC-breaking, but I'm using that label to highlight this issue since we don't have a "BC-improving" label.

PR NOTE:
When an operator's semantics change in PyTorch we want to do two things:

1) Preserve the semantics of older serialized Torchscript programs that use the operator
2) Ensure the new semantics are respected

Historically, this meant writing a Versioned Symbol that would remap older versions of the operator into current PyTorch code (1), and bumping the produced file format version (2). Unfortunately, bumping the produced file format version is a nuclear option for ensuring semantics are respected, since it also prevents older versions of PyTorch from loading anything (even tensors!) from newer versions.

Dynamic versioning addresses the nuclear consequences of bumping the produced file format version by only bumping it when necessary. That is, when an operator with changed semantics is detected in the serialized Torchscript. This will prevent Torchscript programs that use the changed operator from loading on earlier versions of PyTorch, as desired, but will have no impact on programs that don't use the changed operator.

Note that this change is only applicable when using torch.jit.save and torch.jit.load. torch.save pickles the given object using pickle (by default), which saves a function's Python directly.

No new tests for this behavior are added since the existing tests for versioned division in test_save_load already validate that models with div are loaded correctly at version 4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40279

Reviewed By: dzhulgakov

Differential Revision: D22168291

Pulled By: mruberry

fbshipit-source-id: e71d6380e727e25123c7eedf6d80e5d7f1fe9f95
2020-06-24 12:52:50 -07:00
a2e1a948a4 Increase number of iterations in DDP SPMD tests (#40506)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40506

Test Plan: Imported from OSS

Differential Revision: D22208965

Pulled By: mrshenli

fbshipit-source-id: 7d27b60e2c09e641b4eeb1c89d9f9917c4e72e52
2020-06-24 12:48:04 -07:00
9a3e16c773 Add guard for non-default stream in DDP's autograd engine callback (#40115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40115

Closes https://github.com/pytorch/pytorch/issues/37790
Closes https://github.com/pytorch/pytorch/issues/37944

A user may wish to run DDP's forward + backward step under a non-default CUDA stream such as those created by `with torch.cuda.stream(stream)`. In this case, the user is responsible for synchronizing events on this stream with other streams used in the program (per the documentation at https://pytorch.org/docs/stable/notes/cuda.html#cuda-semantics), but currently DDP has a bug which causes it to fail under non-default streams.

If a user does the following:
```
model = DDP(...)
loss = model(input).sum()
loss.backward()
grad = model.module.weight.grad
average = dist.all_reduce(grad)
```

There is a chance that `average` and `grad` will not be equal. This is because the CUDA kernels corresponding to the  `all_reduce` call may run before `loss.backward()`'s kernels are finished. Specifically, in DDP we copy the allreduced gradients back to the model parameter gradients in an autograd engine callback, but this callback runs on the default stream. Note that this can also be fixed by the application synchronizing on the current stream, although this should not be expected, since the application is not using the current stream at all.

This PR fixes the issue by passing the current stream into DDP's callback.

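For reference, the documented user-side pattern for side streams looks roughly like this (a sketch, not part of this diff):

```python
s = torch.cuda.Stream()
with torch.cuda.stream(s):
    loss = model(input).sum()
    loss.backward()
# join the side stream before consuming the grads on the current stream
torch.cuda.current_stream().wait_stream(s)
```
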
Tested by adding a UT `test_DistributedDataParallel_non_default_stream` that fails without this PR
ghstack-source-id: 106481208

Differential Revision: D22073353

fbshipit-source-id: 70da9b44e5f546ff8b6d8c42022ecc846dff033e
2020-06-24 11:26:51 -07:00
597cb04b2f Use Int8QuantParamsBlob to pass the scale and zeropoint params (#40494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40494

Resubmit the diff because D22124313 (1ec4337b7d) was reverted due to CI test failures
Added the int8_gen_quant_params.cc to CMakeList.txt to fix the CI failures

Test Plan: buck test caffe2/caffe2/quantization/server:

Reviewed By: hx89

Differential Revision: D22204244

fbshipit-source-id: a2c8b668f199cc5b0c5894086f554f7c459b1ad7
2020-06-24 10:20:16 -07:00
3ed96e465c Report error when ATEN_THREADING is OMP and USE_OPENMP is turned off. (#40146)
Summary:
Currently, even if USE_OPENMP is turned off, ATEN_THREADING can still use OpenMP. This commit fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40146

Reviewed By: ezyang

Differential Revision: D22208758

Pulled By: pbelevich

fbshipit-source-id: 0866c9bb9b3b5b99d586aed176eb0fbe177efa4a
2020-06-24 09:55:10 -07:00
b4ccdef090 Allow torch.cuda.amp.GradScaler to support sparse gradients (#36786)
Summary:
Should close https://github.com/pytorch/pytorch/issues/35810.

I decided to keep sparse handling on the Python side for clarity, although it could be moved to the C++ side (into `_amp_non_finite_check_and_unscale_`) without much trouble.

For non-fp16 sparse grads the logic is simple (call `_amp_non_finite_check_and_unscale_` on `grad._values()` instead of `grad` itself). At least I hope it's that easy.

For fp16 sparse grads, it's trickier.  Sparse tensors can be uncoalesced.  From the [Note](https://pytorch.org/docs/master/sparse.html#torch.sparse.FloatTensor):
> Our sparse tensor format permits uncoalesced sparse tensors, where there may be duplicate coordinates in the indices; in this case, the interpretation is that the value at that index is the sum of all duplicate value entries.

An uncoalesced scaled fp16 grad may have values at duplicate coordinates that are all finite but large, such that adding them to make the coalesced version WOULD cause overflows.**  If I checked `_values()` on the uncoalesced version, it might not report overflows, but I think it should.

So, if the grad is sparse, fp16, and uncoalesced, I still call `_amp_non_finite_check_and_unscale_` to unscale `grad._values()` in-place, but I also double-check the coalesced version by calling a second `_amp_non_finite_check_and_unscale_` on `grad.coalesce()._values()`.  `coalesce()` is out-of-place, so this call doesn't redundantly affect `grad._values()`, but it does have the power to populate the same `found_inf` tensor.  The `is_coalesced()` check and `coalesce()` probably aren't great for performance, but if someone needs a giant embedding table in FP16, they're better than nothing and memorywise, they'll only create a copy of nnz gradient values+indices, which is still way better than changing the whole table to FP32.

An `unscale` variant with liberty to create unscaled grads out-of-place, and replace `param.grad` instead of writing through it, could get away with just one `_amp_non_finite_check_and_unscale_`.  It could say `coalesced = grad.coalesced()`, do only the stronger `_amp_non_finite_check_and_unscale_` on `coalesced._values()`, and set `param.grad = coalesced`.  I could even avoid replacing `param.grad` itself by going one level deeper and setting `param.grad`'s indices and values to `coalesced`'s, but that seems brittle and still isn't truly "in place".

** you could whiteboard an uncoalesced fp32 grad with the same property, but fp32's range is big enough that I don't think it's realistic.
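
Putting the above together, an outline of the unscale logic in Python (the wrapper name `_unscale_grad` is assumed for illustration; this is a sketch of the described approach, not the exact diff):

```python
import torch

def _unscale_grad(grad, found_inf, inv_scale):
    if grad.is_sparse:
        if grad.dtype is torch.float16 and not grad.is_coalesced():
            # Duplicate coordinates may overflow only once summed, so
            # also check the coalesced values. coalesce() is
            # out-of-place, but it can populate the same found_inf.
            torch._amp_non_finite_check_and_unscale_(
                grad.coalesce()._values(), found_inf, inv_scale)
        torch._amp_non_finite_check_and_unscale_(
            grad._values(), found_inf, inv_scale)
    else:
        torch._amp_non_finite_check_and_unscale_(grad, found_inf, inv_scale)
```
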
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36786

Reviewed By: ezyang

Differential Revision: D22202832

Pulled By: ngimel

fbshipit-source-id: b70961a4b6fc3a4c1882f65e7f34874066435735
2020-06-24 09:10:49 -07:00
d855528186 wconstab/38034-sliced-sequential (#40445)
Summary:
Partial support for slicing of Sequential containers.

- works around missing Sequential slice functionality by converting to tuple
- only supports iteration of the resulting tuple values, not direct call() on the sliced sequential
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40445

Differential Revision: D22192469

Pulled By: wconstab

fbshipit-source-id: 61c85deda2d58f6e3bea2f1fa1d5d5dde568b9b5
2020-06-24 09:05:51 -07:00
727463a727 Initial vmap frontend API (#40172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40172

This PR introduces the initial vmap frontend API. It has the following
limitations that we can resolve in the future:
- the inputs must be a flat list of tensors
- the outputs must be a flat list of tensors
- in_dims = 0 (so we always vmap over dim 0 of input tensors)
- out_dims = 0 (so the returned tensors have their vmap dim appear at
dim 0)
- Coverage limited to operations that have batching rules implemented
(torch.mul, torch.sum, torch.expand).

There are some other semantic limitations (like not being able to handle
mutation, aside from pytorch operations that perform mutation) that will
be documented in the future.

I wanted to introduce the API before adding a slow fallback for the
coverage so that we can test future batching rules (and coverage) via
the python API to avoid verbosity in C++-land.

The way vmap works is that `vmap(func)(inputs)` wraps all Tensor inputs
to be batched in BatchedTensors, sends those into func, and then unwraps
the output BatchedTensors. Operations on BatchedTensors perform the batched
operations that the user is asking for. When performing nested vmaps,
each nested vmap adds a batch dimension upon entry and removes a batch
dimension on exit.

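Under those restrictions, usage looks roughly like this (a sketch assuming the entry point is exposed as `torch.vmap`; `per_example` is an illustrative name):

```python
import torch

def per_example(x):
    return torch.mul(x, x)  # torch.mul has a batching rule implemented

xs = torch.randn(5, 3)            # 5 examples, each of shape (3,)
ys = torch.vmap(per_example)(xs)  # maps over dim 0; ys has shape (5, 3)
```
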
Coming up in the near future:
- Support for non-zero in_dims and out_dims
- docstring for vmap
- slow fallback for operators that do not have a batching rule
implemented.

Test Plan: - `pytest test/test_vmap.py -v`

Differential Revision: D22102076

Pulled By: zou3519

fbshipit-source-id: b119f0a8a3a3b1717c92dbbd180dfb1618295563
2020-06-24 08:14:24 -07:00
43ab9c677b Add invariants check to BatchedTensorImpl (#40171)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40171

It checks that all of the bdims in BatchedTensorImpl are sorted in
order of ascending `level`.

Test Plan: - Check that nothing breaks in `./build/bin/vmap_test`

Differential Revision: D22102077

Pulled By: zou3519

fbshipit-source-id: 094b7abc6c65208437f0f51a0d0083091912decc
2020-06-24 08:12:16 -07:00
e490352dc4 Simplify complex case for tanh backward (#39997)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39997

Differential Revision: D22195797

Pulled By: anjali411

fbshipit-source-id: 21eb91bcbd3bfc67acd322a1579fe737b0c02e6e
2020-06-24 07:51:34 -07:00
4975be80f8 fix typo "normal" -> "Cauchy" (#40334)
Summary:
just looks like a real simple typo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40334

Reviewed By: ezyang

Differential Revision: D22195107

Pulled By: zou3519

fbshipit-source-id: 6c43842d22cbc15db2307976381f6dc1536b5047
2020-06-24 07:45:35 -07:00
ecd9a64712 fix torch.jit.trace_module documentation (#40248)
Summary:
This should fix https://github.com/pytorch/pytorch/issues/39328

Before:

![image](https://user-images.githubusercontent.com/24580222/85076992-4720e800-b18f-11ea-9c6e-19bcf3f1cb7d.png)

After:

![image](https://user-images.githubusercontent.com/24580222/85077064-6ddf1e80-b18f-11ea-9274-e8cee6909baa.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40248

Reviewed By: ezyang

Differential Revision: D22195038

Pulled By: zou3519

fbshipit-source-id: c4bff6579a422a56ed28b644f5558b20d901c94e
2020-06-24 07:31:31 -07:00
a4dec0674c [doc] fix typo in formula of MarginRankingLoss (#40285)
Summary:
This is just a minor doc fix:

the `MarginRankingLoss` takes 2 input samples `x1` and `x2`, not just `x`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40285

Reviewed By: ezyang

Differential Revision: D22195069

Pulled By: zou3519

fbshipit-source-id: 909f491c94dca329a37216524f4088e9096e0bc6
2020-06-24 07:24:51 -07:00
e439cf738a Fix examples Adaptive avg pooling typo (#40217)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40217

Reviewed By: ezyang

Differential Revision: D22193711

Pulled By: zou3519

fbshipit-source-id: f96f71e025aa1c81b232e78b1d5b3a3bbd8f331f
2020-06-24 07:22:46 -07:00
72e8690b78 Fix typo. in error message (#39958)
Summary:
Changed sould to should
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39958

Reviewed By: ezyang

Differential Revision: D22193674

Pulled By: zou3519

fbshipit-source-id: ad7bc0aa3ee1f31f5e7965ae36c1903b28509095
2020-06-24 07:17:10 -07:00
b4eb82cd29 Temporary commit at 6/17/2020, 6:49:44 PM
Summary: [WIP] Logit Fake16 Op

Test Plan: [WIP] Tests will be enabled in test_op_nnpi_fp16.py file.

Reviewed By: hyuen

Differential Revision: D22109329

fbshipit-source-id: fd73850c3ec61375ff5bbf0ef5460868a874fbf3
2020-06-24 06:51:48 -07:00
0ecea2d64d [JIT x RPC] Consolidate Future type class and Future impl class (#40406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40406

Same motivation as https://github.com/pytorch/pytorch/issues/35110.

`Future` and `RRef` are two important types for the `rpc` module and should be easy for users to work with.

Reference, https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#directive-autoclass

Follow https://github.com/pytorch/pytorch/pull/35694.
ghstack-source-id: 106484664

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \
-r test_rref_local_value
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/tensorpipe:rpc_fork_tensorpipe
```

pyre -l caffe2/torch/fb/training_toolkit
pyre -l caffe2/torch/fb/distributed
pyre -l aiplatform

Differential Revision: D7722176

fbshipit-source-id: f3b9ccd7bccb233b2b33ad59dd65e178ba34d67f
2020-06-24 01:44:49 -07:00
f035f73d53 Fix the issue that run clang-tidy on the aten folder (#39713)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39713

Differential Revision: D22203850

Pulled By: mruberry

fbshipit-source-id: 43f690e748b7a3c123ad20f6d640d6dae25c641c
2020-06-24 01:27:54 -07:00
46b9e519aa Remove print (#40475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40475

As title
ghstack-source-id: 106474870

Test Plan: CI

Differential Revision: D22200640

fbshipit-source-id: 1f4c7bbf54be8c4187c9338fefdf14b501597d98
2020-06-24 00:42:25 -07:00
7b0f867c48 Perf improvement of Conv2d and Conv3d (#40324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40324

1) avoid the use of `item()`; 2) bypass im2col for 1x1 conv

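The im2col bypass works because a 1x1 conv with stride 1 and no padding is just a matrix multiply over the channel dimension; a sketch of the equivalence (illustrative, not the diff):

```python
import numpy as np

N, C, H, W, M = 1, 512, 4, 4, 512
X = np.random.randn(N, C, H, W).astype(np.float32)
K = np.random.randn(M, C).astype(np.float32)  # the 1x1 kernels, flattened

# conv2d with a 1x1 kernel reduces to a matmul over channels
Y = np.einsum('mc,nchw->nmhw', K, X)
assert Y.shape == (N, M, H, W)
```
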
Test Plan:
unit test and perf benchmark to show improvement
```
import numpy as np
import torch
from timeit import Timer

num = 50

N = 1
C = 512
H = 4
W = 4

M = 512
kernel_h = 1
kernel_w = 1
stride_h = 1
stride_w = 1
padding_h = 0
padding_w = 0

X_np = np.random.randn(N, C, H, W).astype(np.float32)
W_np = np.random.randn(M, C, kernel_h, kernel_w).astype(np.float32)
X = torch.from_numpy(X_np)

conv2d_pt = torch.nn.Conv2d(
    C, M, (kernel_h, kernel_w), stride=(stride_h, stride_w),
    padding=(padding_h, padding_w), groups=1, bias=True)

class ConvNet(torch.nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv2d = conv2d_pt

    def forward(self, x):
        return self.conv2d(x)

model = ConvNet()

def pt_forward():
    # with torch.autograd.profiler.profile(record_shapes=True) as prof:
    model(X)
    # print(prof.key_averages().table(sort_by="self_cpu_time_total"))

torch._C._set_mkldnn_enabled(False)

t = Timer("pt_forward()", "from __main__ import pt_forward, X")
print(t.timeit(num))  # run the benchmark
```
Before the optimization:
pt time = 5.841153813526034
After the optimization:
pt time = 4.513134760782123

Differential Revision: D22149067

fbshipit-source-id: 538d9eea5b729e6c3da79444bde1784bde828876
2020-06-23 23:39:05 -07:00
cb26661fe4 Throws runtime error when torch.full would infer a float dtype from a bool or integral fill value (#40364)
Summary:
BC-breaking NOTE:

In PyTorch 1.6, bool and integral fill values given to torch.full must set the dtype or out keyword arguments. In prior versions of PyTorch these fill values would return float tensors by default, but in PyTorch 1.7 they will return a bool or long tensor, respectively. The documentation for torch.full has been updated to reflect this.

PR NOTE:

This PR causes torch.full to throw a runtime error when it would have inferred a float dtype by being given a boolean or integer value. A versioned symbol for torch.full is added to preserve the behavior of already serialized Torchscript programs. Existing tests for this behavior being deprecated have been updated to reflect it now being unsupported, and a couple new tests have been added to validate the versioned symbol behavior. The documentation of torch.full has also been updated to reflect this change.
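
Concretely, the 1.6 behavior described above (a sketch; the exact error message may differ):

```python
import torch

torch.full((2, 2), 1.0)                  # float fill value: unchanged
torch.full((2, 2), 1, dtype=torch.long)  # explicit dtype: unchanged
try:
    torch.full((2, 2), 1)                # would have inferred a float dtype
except RuntimeError as e:
    print(e)                             # raises in 1.6; returns long in 1.7
```
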
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40364

Differential Revision: D22176640

Pulled By: mruberry

fbshipit-source-id: b20158ebbcb4f6bf269d05a688bcf4f6c853a965
2020-06-23 23:27:22 -07:00
a2d4d9eca6 Improve Dynamic Library for Windows (#40365)
Summary:
1. Use LoadLibraryEx if available
2. Print more info on error
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40365

Differential Revision: D22194974

Pulled By: malfet

fbshipit-source-id: e8309f39d78fd4681de5aa032288882910dff928
2020-06-23 20:29:48 -07:00
e2201e2ed8 Fixes caffe2 loading issues on Windows (#39513)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/27840#issuecomment-638715422.
Contains a bunch of fixes (https://github.com/pytorch/pytorch/pull/39376 + https://github.com/pytorch/pytorch/pull/39334 + https://github.com/pytorch/pytorch/pull/38302 + https://github.com/pytorch/pytorch/pull/35362)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39513

Differential Revision: D22190761

Pulled By: malfet

fbshipit-source-id: b2d52f6cb16c233d16071e9c0670dfff7da2710e
2020-06-23 20:11:24 -07:00
7c07c39845 [torch.distributed.rpc] Install method docstrings from PyRRef to RRef (#40461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40461

It turned out `:inherited-members:` (see [doc](https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#directive-autoclass)) is not really usable.

This is because pybind11 generates a docstring that writes `self` as the parent-class type, `rpc.PyRRef`.

As a workaround, I am pulling the docstrings on the parent class, `PyRRef`, into the subclass, `RRef`, and doing surgery on the docstring generated by pybind11.

{F241283111}

ghstack-source-id: 106472496

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed/rpc/:rpc_fork

buck build mode/dev-nosan //caffe2/test/distributed/rpc/:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par \
-r test_rref_str

buck build mode/dev-nosan //caffe2/test/distributed/rpc/:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par \
-r test_return_local_rrefs

buck test mode/dev-nosan //caffe2/torch/fb/distributed/model_parallel/tests:test_elastic_averaging -- 'test_elastic_averaging_center \(caffe2\.torch\.fb\.distributed\.model_parallel\.tests\.test_elastic_averaging\.TestElasticAveragingCenter\)'

P134031188

Differential Revision: D7933834

fbshipit-source-id: c03a8a4c9d98888b64492a8caba1591595bfe247
2020-06-23 19:58:36 -07:00
7c737eab59 Remove table of contents at the top of rpc.rst (#40205)
Summary:
mattip - Can we remove the table of contents created by the `.. contents:: :local: :depth: 2` since this page isn't one of the large documentation pages (https://github.com/pytorch/pytorch/issues/38010) and is simply a landing page for the Distributed RPC Framework?

Changes made in this original PR: f10fbcc820 (diff-250b9b23fd6f1a5c15aecdb72afb9d7d)

cc mrshenli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40205

Differential Revision: D22194943

Pulled By: jlin27

fbshipit-source-id: 4e42845daf2784a17ad81645fe3b838385656bba
2020-06-23 19:45:11 -07:00
b7e044f0e5 Re-apply PyTorch pthreadpool changes
Summary:
This re-applies D21232894 (b9d3869df3) and D22162524, plus updates jni_deps in a few places
to avoid breaking host JNI tests.

Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test`

Reviewed By: xcheng16

Differential Revision: D22199952

fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5
2020-06-23 19:26:21 -07:00
bdc00196d1 Enable XNNPACK ops on iOS and macOS.
Test Plan: buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/pytext/pytext_mobile_inference.json --platform ios --framework pytorch --remote --devices D221AP-12.0.1

Reviewed By: xta0

Differential Revision: D21886736

fbshipit-source-id: ac482619dc1b41a110a3c4c79cc0339e5555edeb
2020-06-23 18:50:36 -07:00
c314e0deb5 [quant] Quantized adaptive_avg_pool3d (#40271)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40271

Closes #40244

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22134318

Pulled By: z-a-f

fbshipit-source-id: 0489b6c083a3cbc21a1d81d8bfcc499372308088
2020-06-23 18:13:48 -07:00
6468bc4637 [JIT] script if tracing fix (#40468)
Summary:
Currently, torchvision annotates `batched_nms` with `torch.jit.script` so that the function gets compiled when it is traced and ONNX export will work. Unfortunately, this means we are eagerly compiling batched_nms, which fails if torchvision isn't built with `torchvision.ops.nms`. As a result, torchvision doesn't work on torch hub right now.

`_script_if_tracing` could solve our problem here, but right now it does not correctly interact with recursive compilation. This PR fixes that bug.
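
The intended pattern, sketched (the decorator is the one named in this PR; the `batched_nms` signature here is illustrative):

```python
import torch

@torch.jit._script_if_tracing
def batched_nms(boxes, scores, idxs, iou_threshold: float):
    # compiled lazily, only when called under tracing, rather than
    # eagerly at import time
    ...
```
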
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40468

Reviewed By: jamesr66a

Differential Revision: D22195771

Pulled By: eellison

fbshipit-source-id: 83022ca0bab6d389a48a478aec03052c9282d2b7
2020-06-23 17:14:28 -07:00
92d3182c11 Revert D21232894: Unify PyTorch mobile's threadpool usage.
Test Plan: revert-hammer

Differential Revision:
D21232894 (b9d3869df3)

Original commit changeset: 8b3de86247fb

fbshipit-source-id: e6517cfec08f7dd0f4f8877dab62acf1d65afacd
2020-06-23 17:09:14 -07:00
ddb8565b25 Revert D22162469: [pytorch][PR] Migrate var & std to ATen
Test Plan: revert-hammer

Differential Revision:
D22162469 (7a3c223bbb)

Original commit changeset: 8d901c779767

fbshipit-source-id: 9e0fa439732478349c0ac6c7baafba063edfac5d
2020-06-23 17:04:15 -07:00
7e32e6048d Fix linspace step computation for large integral types (#40132)
Summary:
Convert start and end to `step_t` before computing the difference
Should fix `torch.linspace(-2147483647, 2147483647, 10, dtype=torch.int32)`

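For reference, the arithmetic that went wrong (illustrative; the kernel does this in C++, where the values below wrap in int32):

```python
start, end, steps = -2147483647, 2147483647, 10
span = end - start         # 4294967294 > 2**31 - 1: wraps in int32
step = span / (steps - 1)  # hence the fix: cast start/end to the wider
                           # step type before taking the difference
```
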
Closes https://github.com/pytorch/pytorch/issues/40118
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40132

Differential Revision: D22190095

Pulled By: malfet

fbshipit-source-id: 01cb158a30c505191df663d021804d411b697871
2020-06-23 16:59:59 -07:00
883e4c44b2 Raise exception when trying to build PyTorch on 32-bit Windows system (#40321)
Summary:
Makes errors in cases described in https://github.com/pytorch/pytorch/issues/27815 more obvious
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40321

Differential Revision: D22198352

Pulled By: malfet

fbshipit-source-id: 327d81103c066048dcf5f900fd9083b09942af0e
2020-06-23 16:54:20 -07:00
a6a2dd14ea Fix typo in warning message (#39854)
Summary:
Fix typo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39854

Reviewed By: ezyang

Differential Revision: D22193544

Pulled By: zou3519

fbshipit-source-id: 04b9f59da7b6ba0649fc6d315adcf20685e10930
2020-06-23 16:47:35 -07:00
0e26a03ef9 [quant][graphmode] Enable inplace option for top level API (#40414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40414

Now that `_reconstruct` is supported in RecursiveScriptModule (https://github.com/pytorch/pytorch/pull/39979),
we can support the inplace option in the quantization API.

Test Plan: Imported from OSS

Differential Revision: D22178326

fbshipit-source-id: c78bc2bcf2c42b06280c12262bb31aebcadc6c32
2020-06-23 16:42:48 -07:00
2e6da36298 [android][ci] Fix CI packaging headers to aar (#40442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40442

Problem:
Nightly builds do not include libtorch headers, unlike local builds.
The reason is that the path on docker images differs from the local path when building with `scripts/build_pytorch_android.sh`.

Solution:
Introduce a gradle property to specify the path, and set it in the gradle build job and the snapshots publishing job, which run on the same docker image.

Test:
ci-all jobs check: https://github.com/pytorch/pytorch/pull/40443
Checked that the gradle build results in headers inside the aar.

Test Plan: Imported from OSS

Differential Revision: D22190955

Pulled By: IvanKobzarev

fbshipit-source-id: 9379458d8ab024ee991ca205a573c21d649e5f8a
2020-06-23 16:41:12 -07:00
b9d3869df3 Unify PyTorch mobile's threadpool usage. (#37243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37243

*** Why ***

As it stands, we have two thread pool solutions concurrently in use in PyTorch mobile: (1) the open source pthreadpool library under third_party, and (2) Caffe2's implementation of pthreadpool under caffe2/utils/threadpool.  Since the primary use-case of the latter has been to act as a drop-in replacement for the third party version so as to enable integration and usage from within NNPACK and QNNPACK, Caffe2's implementation is intentionally written to the exact same interface as the third party version.

The original argument in favor of C2's implementation has been improved performance as a result of using spin locks, as opposed to relinquishing the thread's time slot and putting it to sleep - a less expensive operation up to a point.  That seems to have given C2's implementation the upper hand in performance, hence justifying the added maintenance complexity, until the third party version improved in parallel surpassing the efficiency of C2's implementation as I have verified in benchmarks.  With that advantage gone, there is no reason to continue using C2's implementation in PyTorch mobile either from the perspective of performance or code hygiene.  As a matter of fact, there is considerable performance benefit to be had as a result of using the third party version as it currently stands.

This is a tricky change though. Out of an abundance of caution, and to avoid potential performance regressions in production C2 use cases (of which I have witnessed none), we have decided to continue using C2's internal implementation whenever building for Caffe2, even if doing so results in reduced performance as far as I can tell.

So to summarize, today, and as it currently stands, we are using C2's implementation for (1) NNPACK, (2) PyTorch QNNPACK, and (3) ATen parallel_for on mobile builds, while using the third party version of pthreadpool for XNNPACK as XNNPACK does not provide any build options to link against an external implementation unlike NNPACK and QNNPACK do.

The goal of this PR then, is to unify all usage on mobile to the third party implementation both for improved performance and better code hygiene.  This applies to PyTorch's use of NNPACK, QNNPACK, XNNPACK, and mobile's implementation of ATen parallel_for, all getting routed to the
exact same third party implementation in this PR.

Considering that NNPACK, QNNPACK, and XNNPACK are not mobile specific, these benefits carry over to non-mobile builds of PyTorch (but not Caffe2) as well.  The implementation of ATen parallel_for on non-mobile builds remains unchanged.

*** How ***

This is where things get tricky.

A good deal of the build system complexity in this PR arises from our desire to maintain C2's implementation intact for C2's use.

pthreadpool is a C library with no concept of namespaces, which means two copies of the library cannot exist in the same binary or symbol collisions will occur, violating the ODR.  This means that somehow, and based on some condition, we must decide on the choice of a pthreadpool implementation.  In practice, this has become more complicated as a result of all the possible combinations that USE_NNPACK, USE_QNNPACK, USE_PYTORCH_QNNPACK, USE_XNNPACK, USE_SYSTEM_XNNPACK, USE_SYSTEM_PTHREADPOOL and other variables can result in.  Having said that, I have done my best in this PR to surgically cut through this complexity in a way that minimizes the side effects, considering the significance of the performance we are leaving on the table; yet, as a result of the combinatorial explosion explained above, I cannot guarantee that every single combination will work as expected on the first try.  I am heavily relying on CI to find any issues, as local testing can only go so far.

Having said that, this PR provides a simple non mobile-specific C++ thread pool implementation on top of pthreadpool, namely caffe2::PThreadPool that automatically routes to C2's implementation or the third party version depending on the build configuration.  This simplifies the logic at the cost of pushing the complexity to the build scripts.  From there on, this thread pool is used in aten parallel_for, and NNPACK and family, again, routing all usage of threading to C2 or third party pthreadpool depending on the build configuration.

When it is all said and done, the layering will look like this:

a) aten::parallel_for, uses
b) caffe2::PThreadPool, which uses
c) pthreadpool C API, which delegates to
    c-1) third_party implementation of pthreadpool if that's what the build has requested, and the rabbit hole ends here.
    c-2) C2's implementation of pthreadpool if that's what the build has requested, which itself delegates to
    c-2-1) caffe2::ThreadPool, and the rabbit hole ends here.

NNPACK, and (PyTorch) QNNPACK directly hook into (c). They never go through (b).

Differential Revision: D21232894

Test Plan: Imported from OSS

Reviewed By: dreiss

Pulled By: AshkanAliabadi

fbshipit-source-id: 8b3de86247fbc3a327e811983e082f9d40081354
2020-06-23 16:34:51 -07:00
c7d79f35e3 Header rename complex_type.h -> complex.h (#39885)
Summary:
This file should have been renamed as `complex.h`, but unfortunately, it was named as `complex_type.h` due to a name clash with FBCode. Is this still the case and is it easy to resolve the name clash? Maybe related to the comment at https://github.com/pytorch/pytorch/pull/39834#issuecomment-642950012
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39885

Differential Revision: D22018575

Pulled By: ezyang

fbshipit-source-id: e237ccedbe2b30c31aca028a5b4c8c063087a30f
2020-06-23 16:27:09 -07:00
111b399c91 Delete requires_tensor (#40184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40184

Whenever requires_tensor is True, it is also the case that abstract
is true.  Thus, it is not necessary to specify requires_tensor.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D22187353

Pulled By: ezyang

fbshipit-source-id: d665bb69cffe491bd989495020e1ae32340aa9da
2020-06-23 16:18:28 -07:00
cc9075c5d4 Add some syntax sugar for when backends use the same function. (#40182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40182

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D22187354

Pulled By: ezyang

fbshipit-source-id: 875a6a7837981b60830bd7b1c35d2a3802ed7dd7
2020-06-23 16:16:42 -07:00
d8ec19bc03 Revert D22072830: [wip] Upgrade msvc to 14.13
Test Plan: revert-hammer

Differential Revision:
D22072830

Original commit changeset: 6fa03725f3fe

fbshipit-source-id: 901de185e607810cb3871c2e4d23816848c97f4b
2020-06-23 16:13:03 -07:00
581ad48806 Revert D21581908: Move TensorOptions ops to c10
Test Plan: revert-hammer

Differential Revision:
D21581908

Original commit changeset: 6d4a9f526fd7

fbshipit-source-id: fe1e6368a09120ea40dea405e8409983541e3cb5
2020-06-23 16:10:07 -07:00
cbd53bfee8 [jit] Remove unnecessary clone APIs for script::Module and RecursiveScriptModule (#40297)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40297

Test Plan: Imported from OSS

Differential Revision: D22191660

fbshipit-source-id: 4b338ca82caaca04784bffe01fdae3d180c192f4
2020-06-23 16:03:22 -07:00
8c20fb6481 [JIT] freeze doc (#40409)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40409

Reviewed By: ezyang

Differential Revision: D22192709

Pulled By: eellison

fbshipit-source-id: 68cdb2e5040d31957fbd64690fdc03c058d13f9a
2020-06-23 15:44:03 -07:00
09285070a7 Doc fix for complex views (#40450)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40450

Test Plan: Imported from OSS

Differential Revision: D22190911

Pulled By: anjali411

fbshipit-source-id: eb13559c7a2f62d63344601c750b5715686e95c3
2020-06-23 15:03:22 -07:00
5fce7137a9 [WIP][JIT] Add ScriptModule._reconstruct (#39979)
Summary:
**Summary**
This commit adds an instance method `_reconstruct` that permits users
to reconstruct a `ScriptModule` from a given C++ `Module` instance.

**Testing**
This commit adds a unit test for `_reconstruct`.

**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/33912.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39979

Differential Revision: D22172323

Pulled By: SplitInfinity

fbshipit-source-id: 9aa6551c422a5a324b822a09cd8d7c660f99ca5c
2020-06-23 14:42:27 -07:00
5ad885b823 [Caffe2][Pruning] Make the caffe2 Sum operator support long types (#40379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40379

The current Sum operator doesn't support Long types, hence this change to the code.

Test Plan: Write a test case

Reviewed By: jspark1105, yinghai

Differential Revision: D21917365

fbshipit-source-id: b37d2c100c70d17d2f89c309e40360ddfab584ee
2020-06-23 14:18:29 -07:00
b623bdeabb Move TensorOptions ops to c10 (#39492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39492

This PR adds use_c10_dispatcher: full to ops taking TensorOptions. To allow this, since the c10 operator library doesn't know about TensorOptions, we need to register the operator kernels as optional<ScalarType>, optional<Device>, optional<Layout>, optional<bool> instead, and also call them this way.

Changes:

* Add use_c10_dispatcher: full to those ops.
* Write hacky_wrapper_for_legacy_signatures, which takes an old-style kernel (i.e. one written to take TensorOptions) and creates a wrapper kernel for it that takes the scattered optional<ScalarType>, optional<Device>, optional<Layout>, optional<bool> instead.
* Change codegen so that all op registrations are wrapped into hacky_wrapper_for_legacy_signatures. This is added to all ops but is a no-op if the op doesn't take TensorOptions. This allows us in the future to just change a kernel signature from TensorOptions to the scattered version and have it work without having to touch codegen.
* Change codegen so that the frontend calls those operators with expanded arguments instead of with a TensorOptions object. This is required because the kernels are now written this way.

This PR does not remove TensorOptions special cases from codegen; instead it separates kernels from the codegen/frontend issues. After this, kernels can be worked on separately without having to touch codegen, and codegen can be worked on without having to touch kernels.

Codegen diff: P133121032

ghstack-source-id: 106426630

Test Plan: waitforsandcastle

Differential Revision: D21581908

fbshipit-source-id: 6d4a9f526fd70fae40581bf26f3ccf794ce6a89e
2020-06-23 14:13:34 -07:00
f6b9848c25 Use chain.from_iterable in optimizer.py (#40156)
Summary:
This is a faster and more idiomatic way of using `itertools.chain`. Instead of computing all the items in the iterable and storing them in memory, they are computed one-by-one and never stored as a huge list. This can save on both runtime and memory space.
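
For illustration (a sketch, not the actual optimizer.py change):

```python
from itertools import chain

groups = [[1, 2], [3, 4], [5]]
# chain(*groups) unpacks the outer list eagerly into arguments;
# chain.from_iterable(groups) walks it lazily, one sub-iterable at a time.
flat = list(chain.from_iterable(groups))  # [1, 2, 3, 4, 5]
```
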
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40156

Reviewed By: ezyang

Differential Revision: D22189038

Pulled By: vincentqb

fbshipit-source-id: 160b2c27f442686821a6ea541e1f48f4a846c186
2020-06-23 14:07:05 -07:00
0e074074f3 Disable inlining an opaque tensor into a constant (#40367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40367

If the tensor has no storage then do not inline it as a constant. This
situation arises when Mkldnn tensors are used.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D22158240

Pulled By: bzinodev

fbshipit-source-id: 8d2879044f2429004983a1242d837367b75a9f2a
2020-06-23 13:28:31 -07:00
f000b44d89 Fork/Join Inline Docs (relanding) (#40438)
Summary:
Added fork/wait to docs/source/jit.rst; hopefully that will fix the test error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40438

Differential Revision: D22188152

Pulled By: eellison

fbshipit-source-id: c19277284455fb6e7c0138b0c1423d90b147d18e
2020-06-23 13:25:51 -07:00
d21ee2de66 [wip] Upgrade msvc to 14.13 (#40109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40109

ghstack-source-id: 106426627

Test Plan: oss CI

Differential Revision: D22072830

fbshipit-source-id: 6fa03725f3fe272795553c9c4acf46130b8c6039
2020-06-23 13:05:36 -07:00
4632 changed files with 529495 additions and 129713 deletions


@ -31,7 +31,7 @@ Usage
1. Make changes to these scripts.
2. Run the `regenerate.sh` script in this directory and commit the script changes and the resulting change to `config.yml`.
You'll see a build failure on TravisCI if the scripts don't agree with the checked-in version.
You'll see a build failure on GitHub if the scripts don't agree with the checked-in version.
Motivation
@ -55,7 +55,7 @@ Future direction
See comment [here](https://github.com/pytorch/pytorch/pull/17323#pullrequestreview-206945747):
In contrast with a full recursive tree traversal of configuration dimensions,
> in the future future I think we actually want to decrease our matrix somewhat and have only a few mostly-orthogonal builds that taste as many different features as possible on PRs, plus a more complete suite on every PR and maybe an almost full suite nightly/weekly (we don't have this yet). Specifying PR jobs in the future might be easier to read with an explicit list when we come to this.
> in the future I think we actually want to decrease our matrix somewhat and have only a few mostly-orthogonal builds that taste as many different features as possible on PRs, plus a more complete suite on every PR and maybe an almost full suite nightly/weekly (we don't have this yet). Specifying PR jobs in the future might be easier to read with an explicit list when we come to this.
----------------
----------------
@ -90,7 +90,7 @@ The binaries are built in CircleCI. There are nightly binaries built every night
We have 3 types of binary packages
* pip packages - nightlies are stored on s3 (pip install -f <a s3 url>). releases are stored in a pip repo (pip install torch) (ask Soumith about this)
* pip packages - nightlies are stored on s3 (pip install -f \<a s3 url\>). releases are stored in a pip repo (pip install torch) (ask Soumith about this)
* conda packages - nightlies and releases are both stored in a conda repo. Nightly packages have a '_nightly' suffix
* libtorch packages - these are zips of all the c++ libraries, header files, and sometimes dependencies. These are c++ only
* shared with dependencies (the only supported option for Windows)
@ -104,16 +104,16 @@ All binaries are built in CircleCI workflows except Windows. There are checked-i
Some quick vocab:
* A\**workflow** is a CircleCI concept; it is a DAG of '**jobs**'. ctrl-f 'workflows' on\https://github.com/pytorch/pytorch/blob/master/.circleci/config.yml to see the workflows.
* A \**workflow** is a CircleCI concept; it is a DAG of '**jobs**'. ctrl-f 'workflows' on https://github.com/pytorch/pytorch/blob/master/.circleci/config.yml to see the workflows.
* **jobs** are a sequence of '**steps**'
* **steps** are usually just a bash script or a builtin CircleCI command.* All steps run in new environments, environment variables declared in one script DO NOT persist to following steps*
* **steps** are usually just a bash script or a builtin CircleCI command. *All steps run in new environments, environment variables declared in one script DO NOT persist to following steps*
* CircleCI has a **workspace**, which is essentially a cache between steps of the *same job* in which you can store artifacts between steps.
## How are the workflows structured?
The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build, test, and upload) per binary configuration
1. binarybuilds
1. binary_builds
1. every day midnight EST
2. linux: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
3. macos: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml
@ -144,7 +144,7 @@ The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build,
## How are the jobs structured?
The jobs are in https://github.com/pytorch/pytorch/tree/master/.circleci/verbatim-sources . Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/master/.circleci/scripts .
The jobs are in https://github.com/pytorch/pytorch/tree/master/.circleci/verbatim-sources. Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/master/.circleci/scripts .
* Linux jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
* binary_linux_build.sh
@ -178,8 +178,7 @@ CircleCI creates a final yaml file by inlining every <<* segment, so if we were
So, CircleCI has several executor types: macos, machine, and docker are the ones we use. The 'machine' executor gives you two cores on some linux vm. The 'docker' executor gives you considerably more cores (nproc was 32 instead of 2 back when I tried in February). Since the dockers are faster, we try to run everything that we can in dockers. Thus
* linux build jobs use the docker executor. Running them on the docker executor was at least 2x faster than running them on the machine executor
* linux test jobs use the machine executor and spin up their own docker. Why this nonsense? It's cause we run nvidia-docker for our GPU tests; any code that calls into the CUDA runtime needs to be run on nvidia-docker. To run a nvidia-docker you need to install some nvidia packages on the host machine and then call docker with the '—runtime nvidia' argument. CircleCI doesn't support this, so we have to do it ourself.
* This is not just a mere inconvenience. **This blocks all of our linux tests from using more than 2 cores.** But there is nothing that we can do about it, but wait for a fix on circleci's side. Right now, we only run some smoke tests (some simple imports) on the binaries, but this also affects non-binary test jobs.
* linux test jobs use the machine executor in order for them to properly interface with GPUs since docker executors cannot execute with attached GPUs
* linux upload jobs use the machine executor. The upload jobs are so short that it doesn't really matter what they use
* linux smoke test jobs use the machine executor for the same reason as the linux test jobs
@ -205,7 +204,7 @@ TODO: fill in stuff
## Overview
The code that runs the binaries lives in two places, in the normal [github.com/pytorch/pytorch](http://github.com/pytorch/pytorch), but also in [github.com/pytorch/builder](http://github.com/pytorch/builder) , which is a repo that defines how all the binaries are built. The relevant code is
The code that runs the binaries lives in two places, in the normal [github.com/pytorch/pytorch](http://github.com/pytorch/pytorch), but also in [github.com/pytorch/builder](http://github.com/pytorch/builder), which is a repo that defines how all the binaries are built. The relevant code is
```
@ -261,7 +260,7 @@ Linux, MacOS and Windows use the same code flow for the conda builds.
Conda packages are built with conda-build, see https://conda.io/projects/conda-build/en/latest/resources/commands/conda-build.html
Basically, you pass `conda build` a build folder (pytorch-nightly/ above) that contains a build script and a meta.yaml. The meta.yaml specifies what python environment to build the package in and what dependencies the resulting package should have, and the build script gets called in that env to build the thing.
tldr; on conda-build is
tl;dr on conda-build is
1. Creates a brand new conda environment, based off of deps in the meta.yaml
1. Note that environment variables do not get passed into this build env unless they are specified in the meta.yaml
@ -271,7 +270,7 @@ tldr; on conda-build is
4. Runs some simple import tests (if specified in the meta.yaml)
5. Saves the finished package as a tarball
The build.sh we use is essentially a wrapper around ```python setup.py build``` , but it also manually copies in some of our dependent libraries into the resulting tarball and messes with some rpaths.
The build.sh we use is essentially a wrapper around `python setup.py build`, but it also manually copies in some of our dependent libraries into the resulting tarball and messes with some rpaths.
The entrypoint file `builder/conda/build_conda.sh` is complicated because
@ -356,15 +355,15 @@ The Dockerfiles are available in pytorch/builder, but there is no circleci job o
# How to manually rebuild the binaries
tldr; make a PR that looks like https://github.com/pytorch/pytorch/pull/21159
tl;dr make a PR that looks like https://github.com/pytorch/pytorch/pull/21159
Sometimes we want to push a change to master and then rebuild all of today's binaries after that change. As of May 30, 2019 there isn't a way to manually run a workflow in the UI. You can manually re-run a workflow, but it will use the exact same git commits as the first run and will not include any changes. So we have to make a PR and then force circleci to run the binary workflow instead of the normal tests. The above PR is an example of how to do this; essentially you copy-paste the binarybuilds workflow steps into the default workflow steps. If you need to point the builder repo to a different commit then you'd need to change https://github.com/pytorch/pytorch/blob/master/.circleci/scripts/binary_checkout.sh#L42-L45 to checkout what you want.
## How to test changes to the binaries via .circleci
Writing PRs that test the binaries is annoying, since the default circleci jobs that run on PRs are not the jobs that you want to run. Likely, changes to the binaries will touch something under .circleci/ and require that .circleci/config.yml be regenerated (.circleci/config.yml controls all .circleci behavior, and is generated using ```.circleci/regenerate.sh``` in python 3.7). But you also need to manually hardcode the binary jobs that you want to test into the .circleci/config.yml workflow, so you should actually make at least two commits, one for your changes and one to temporarily hardcode jobs. See https://github.com/pytorch/pytorch/pull/22928 as an example of how to do this.
Writing PRs that test the binaries is annoying, since the default circleci jobs that run on PRs are not the jobs that you want to run. Likely, changes to the binaries will touch something under .circleci/ and require that .circleci/config.yml be regenerated (.circleci/config.yml controls all .circleci behavior, and is generated using `.circleci/regenerate.sh` in python 3.7). But you also need to manually hardcode the binary jobs that you want to test into the .circleci/config.yml workflow, so you should actually make at least two commits, one for your changes and one to temporarily hardcode jobs. See https://github.com/pytorch/pytorch/pull/22928 as an example of how to do this.
```
```sh
# Make your changes
touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml
@ -409,7 +408,7 @@ The advantage of this flow is that you can make new changes to the base commit a
You can easily build Linux binaries locally using docker.
```
```sh
# Run the docker
# Use the correct docker image, pytorch/conda-cuda used here as an example
#
@ -419,8 +418,6 @@ You can build Linux binaries locally easily using docker.
# in the docker container then you will see path/to/foo/baz on your local
# machine. You could also clone the pytorch and builder repos in the docker.
#
# If you're building a CUDA binary then use `nvidia-docker run` instead, see below.
#
# If you know how, add ccache as a volume too and speed up everything
docker run \
-v your/pytorch/repo:/pytorch \
@ -444,9 +441,7 @@ export DESIRED_CUDA=cpu
**Building CUDA binaries on docker**
To build a CUDA binary you need to use `nvidia-docker run` instead of just `docker run` (or you can manually pass `--runtime=nvidia`). This adds some needed libraries and things to build CUDA stuff.
You can build CUDA binaries on CPU only machines, but you can only run CUDA binaries on CUDA machines. This means that you can build a CUDA binary on a docker on your laptop if you so choose (though its gonna take a loong time).
You can build CUDA binaries on CPU only machines, but you can only run CUDA binaries on CUDA machines. This means that you can build a CUDA binary on a docker on your laptop if you so choose (though its gonna take a long time).
For Facebook employees, ask about beefy machines that have docker support and use those instead of your laptop; it will be 5x as fast.
@ -456,7 +451,7 @@ Theres no easy way to generate reproducible hermetic MacOS environments. If y
But if you want to try, then I'd recommend
```
```sh
# Create a new terminal
# Clear your LD_LIBRARY_PATH and trim as much out of your PATH as you
# know how to do


@ -25,15 +25,17 @@ DEPS_INCLUSION_DIMENSIONS = [
]
def get_processor_arch_name(cuda_version):
return "cpu" if not cuda_version else "cu" + cuda_version
def get_processor_arch_name(gpu_version):
return "cpu" if not gpu_version else (
"cu" + gpu_version.strip("cuda") if gpu_version.startswith("cuda") else gpu_version
)
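
A quick sanity check of the new mapping (a sketch; the labels match the GPU_VERSIONS list defined later in this diff):

```python
assert get_processor_arch_name(None) == "cpu"              # CPU-only build
assert get_processor_arch_name("cuda102") == "cu102"
assert get_processor_arch_name("rocm3.10") == "rocm3.10"   # ROCm labels pass through

# Note: str.strip removes *characters*, not a prefix; this works here only
# because the version digits after "cuda" are not in the set {c, u, d, a}.
```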
LINUX_PACKAGE_VARIANTS = OrderedDict(
manywheel=[
"3.6m",
"3.7m",
"3.8m",
"3.9m"
],
conda=dimensions.STANDARD_PYTHON_VERSIONS,
libtorch=[
@ -42,7 +44,7 @@ LINUX_PACKAGE_VARIANTS = OrderedDict(
)
CONFIG_TREE_DATA = OrderedDict(
linux=(dimensions.CUDA_VERSIONS, LINUX_PACKAGE_VARIANTS),
linux=(dimensions.GPU_VERSIONS, LINUX_PACKAGE_VARIANTS),
macos=([None], OrderedDict(
wheel=dimensions.STANDARD_PYTHON_VERSIONS,
conda=dimensions.STANDARD_PYTHON_VERSIONS,
@ -50,13 +52,25 @@ CONFIG_TREE_DATA = OrderedDict(
"3.7",
],
)),
windows=(dimensions.CUDA_VERSIONS, OrderedDict(
wheel=dimensions.STANDARD_PYTHON_VERSIONS,
conda=dimensions.STANDARD_PYTHON_VERSIONS,
libtorch=[
"3.7",
macos_arm64=([None], OrderedDict(
wheel=[
"3.8",
],
conda=[
"3.8",
],
)),
# Skip CUDA-9.2 builds on Windows
windows=(
[v for v in dimensions.GPU_VERSIONS if v not in ['cuda92'] + dimensions.ROCM_VERSION_LABELS],
OrderedDict(
wheel=dimensions.STANDARD_PYTHON_VERSIONS,
conda=dimensions.STANDARD_PYTHON_VERSIONS,
libtorch=[
"3.7",
],
)
),
)
# GCC config variants:
@ -93,12 +107,12 @@ class TopLevelNode(ConfigNode):
class OSConfigNode(ConfigNode):
def __init__(self, parent, os_name, cuda_versions, py_tree):
def __init__(self, parent, os_name, gpu_versions, py_tree):
super(OSConfigNode, self).__init__(parent, os_name)
self.py_tree = py_tree
self.props["os_name"] = os_name
self.props["cuda_versions"] = cuda_versions
self.props["gpu_versions"] = gpu_versions
def get_children(self):
return [PackageFormatConfigNode(self, k, v) for k, v in self.py_tree.items()]
@ -117,7 +131,7 @@ class PackageFormatConfigNode(ConfigNode):
elif self.find_prop("os_name") == "windows" and self.find_prop("package_format") == "libtorch":
return [WindowsLibtorchConfigNode(self, v) for v in WINDOWS_LIBTORCH_CONFIG_VARIANTS]
else:
return [ArchConfigNode(self, v) for v in self.find_prop("cuda_versions")]
return [ArchConfigNode(self, v) for v in self.find_prop("gpu_versions")]
class LinuxGccConfigNode(ConfigNode):
@ -127,14 +141,22 @@ class LinuxGccConfigNode(ConfigNode):
self.props["gcc_config_variant"] = gcc_config_variant
def get_children(self):
cuda_versions = self.find_prop("cuda_versions")
gpu_versions = self.find_prop("gpu_versions")
# XXX devtoolset7 on CUDA 9.0 is temporarily disabled
# see https://github.com/pytorch/pytorch/issues/20066
if self.find_prop("gcc_config_variant") == 'devtoolset7':
cuda_versions = filter(lambda x: x != "90", cuda_versions)
gpu_versions = filter(lambda x: x != "cuda_90", gpu_versions)
return [ArchConfigNode(self, v) for v in cuda_versions]
# XXX disabling conda rocm build since docker images are not there
if self.find_prop("package_format") == 'conda':
gpu_versions = filter(lambda x: x not in dimensions.ROCM_VERSION_LABELS, gpu_versions)
# XXX libtorch rocm build is temporarily disabled
if self.find_prop("package_format") == 'libtorch':
gpu_versions = filter(lambda x: x not in dimensions.ROCM_VERSION_LABELS, gpu_versions)
return [ArchConfigNode(self, v) for v in gpu_versions]
class WindowsLibtorchConfigNode(ConfigNode):
@ -144,14 +166,14 @@ class WindowsLibtorchConfigNode(ConfigNode):
self.props["libtorch_config_variant"] = libtorch_config_variant
def get_children(self):
return [ArchConfigNode(self, v) for v in self.find_prop("cuda_versions")]
return [ArchConfigNode(self, v) for v in self.find_prop("gpu_versions")]
class ArchConfigNode(ConfigNode):
def __init__(self, parent, cu):
super(ArchConfigNode, self).__init__(parent, get_processor_arch_name(cu))
def __init__(self, parent, gpu):
super(ArchConfigNode, self).__init__(parent, get_processor_arch_name(gpu))
self.props["cu"] = cu
self.props["gpu"] = gpu
def get_children(self):
return [PyVersionConfigNode(self, v) for v in self.find_prop("python_versions")]


@ -6,10 +6,10 @@ import cimodel.lib.conf_tree as conf_tree
import cimodel.lib.miniutils as miniutils
class Conf(object):
def __init__(self, os, cuda_version, pydistro, parms, smoke, libtorch_variant, gcc_config_variant, libtorch_config_variant):
def __init__(self, os, gpu_version, pydistro, parms, smoke, libtorch_variant, gcc_config_variant, libtorch_config_variant):
self.os = os
self.cuda_version = cuda_version
self.gpu_version = gpu_version
self.pydistro = pydistro
self.parms = parms
self.smoke = smoke
@ -18,7 +18,7 @@ class Conf(object):
self.libtorch_config_variant = libtorch_config_variant
def gen_build_env_parms(self):
elems = [self.pydistro] + self.parms + [binary_build_data.get_processor_arch_name(self.cuda_version)]
elems = [self.pydistro] + self.parms + [binary_build_data.get_processor_arch_name(self.gpu_version)]
if self.gcc_config_variant is not None:
elems.append(str(self.gcc_config_variant))
if self.libtorch_config_variant is not None:
@ -37,9 +37,12 @@ class Conf(object):
docker_distro_prefix = miniutils.override(self.pydistro, docker_word_substitution)
# The cpu nightlies are built on the pytorch/manylinux-cuda102 docker image
alt_docker_suffix = self.cuda_version or "102"
docker_distro_suffix = "" if self.pydistro == "conda" else alt_docker_suffix
return miniutils.quote("pytorch/" + docker_distro_prefix + "-cuda" + docker_distro_suffix)
# TODO cuda images should consolidate into tag-base images similar to rocm
alt_docker_suffix = "cuda102" if not self.gpu_version else (
"rocm:" + self.gpu_version.strip("rocm") if self.gpu_version.startswith("rocm") else self.gpu_version)
docker_distro_suffix = alt_docker_suffix if self.pydistro != "conda" else (
"cuda" if alt_docker_suffix.startswith("cuda") else "rocm")
return miniutils.quote("pytorch/" + docker_distro_prefix + "-" + docker_distro_suffix)
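
Traced by hand, the new suffix logic yields image names like these (a sketch: `docker_word_substitution` lies outside this hunk, so the manywheel-to-manylinux mapping is assumed, and the `miniutils.quote` wrapping is omitted):

```python
def sketch_docker_image(pydistro, gpu_version):
    # Hypothetical restatement of Conf.gen_docker_image above.
    prefix = {"manywheel": "manylinux"}.get(pydistro, pydistro)  # assumed substitution
    suffix = "cuda102" if not gpu_version else (
        "rocm:" + gpu_version.strip("rocm") if gpu_version.startswith("rocm")
        else gpu_version)
    if pydistro == "conda":
        suffix = "cuda" if suffix.startswith("cuda") else "rocm"
    return "pytorch/" + prefix + "-" + suffix

assert sketch_docker_image("manywheel", None) == "pytorch/manylinux-cuda102"
assert sketch_docker_image("manywheel", "rocm3.10") == "pytorch/manylinux-rocm:3.10"
assert sketch_docker_image("conda", "cuda102") == "pytorch/conda-cuda"
```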
def get_name_prefix(self):
return "smoke" if self.smoke else "binary"
@ -69,14 +72,10 @@ class Conf(object):
"update_s3_htmls",
]
job_def["filters"] = branch_filters.gen_filter_dict(
branches_list=["nightly"],
tags_list=[branch_filters.RC_PATTERN],
branches_list=["postnightly"],
)
else:
if phase in ["upload"]:
filter_branch = "nightly"
else:
filter_branch = r"/.*/"
filter_branch = r"/.*/"
job_def["filters"] = branch_filters.gen_filter_dict(
branches_list=[filter_branch],
tags_list=[branch_filters.RC_PATTERN],
@ -89,28 +88,61 @@ class Conf(object):
if not (self.smoke and self.os == "macos") and self.os != "windows":
job_def["docker_image"] = self.gen_docker_image()
if self.os != "windows" and self.cuda_version:
# fix this. only works on cuda not rocm
if self.os != "windows" and self.gpu_version:
job_def["use_cuda_docker_runtime"] = miniutils.quote("1")
else:
if self.os == "linux" and phase != "upload":
job_def["docker_image"] = self.gen_docker_image()
if phase == "test":
if self.cuda_version:
if self.gpu_version:
if self.os == "windows":
job_def["executor"] = "windows-with-nvidia-gpu"
else:
job_def["resource_class"] = "gpu.medium"
if phase == "upload":
job_def["context"] = "org-member"
job_def["requires"] = [
self.gen_build_name(upload_phase_dependency, nightly)
]
os_name = miniutils.override(self.os, {"macos": "mac"})
job_name = "_".join([self.get_name_prefix(), os_name, phase])
return {job_name : job_def}
def gen_upload_job(self, phase, requires_dependency):
"""Generate binary_upload job for configuration
Output looks similar to:
- binary_upload:
name: binary_linux_manywheel_3_7m_cu92_devtoolset7_nightly_upload
context: org-member
requires: binary_linux_manywheel_3_7m_cu92_devtoolset7_nightly_test
filters:
branches:
only:
- nightly
tags:
only: /v[0-9]+(\\.[0-9]+)*-rc[0-9]+/
package_type: manywheel
upload_subfolder: cu92
"""
return {
"binary_upload": OrderedDict({
"name": self.gen_build_name(phase, nightly=True),
"context": "org-member",
"requires": [self.gen_build_name(
requires_dependency,
nightly=True
)],
"filters": branch_filters.gen_filter_dict(
branches_list=["nightly"],
tags_list=[branch_filters.RC_PATTERN],
),
"package_type": self.pydistro,
"upload_subfolder": binary_build_data.get_processor_arch_name(
self.gpu_version,
),
})
}
def get_root(smoke, name):
return binary_build_data.TopLevelNode(
@ -129,10 +161,10 @@ def gen_build_env_list(smoke):
for c in config_list:
conf = Conf(
c.find_prop("os_name"),
c.find_prop("cu"),
c.find_prop("gpu"),
c.find_prop("package_format"),
[c.find_prop("pyver")],
c.find_prop("smoke"),
c.find_prop("smoke") and not (c.find_prop("os_name") == "macos_arm64"), # don't test arm64
c.find_prop("libtorch_variant"),
c.find_prop("gcc_config_variant"),
c.find_prop("libtorch_config_variant"),
@ -149,32 +181,19 @@ def get_nightly_uploads():
mylist = []
for conf in configs:
phase_dependency = "test" if predicate_exclude_macos(conf) else "build"
mylist.append(conf.gen_workflow_job("upload", phase_dependency, nightly=True))
mylist.append(conf.gen_upload_job("upload", phase_dependency))
return mylist
def get_post_upload_jobs():
"""Generate jobs to update HTML indices and report binary sizes"""
configs = gen_build_env_list(False)
common_job_def = {
"context": "org-member",
"filters": branch_filters.gen_filter_dict(
branches_list=["nightly"],
tags_list=[branch_filters.RC_PATTERN],
),
"requires": [],
}
for conf in configs:
upload_job_name = conf.gen_build_name(
build_or_test="upload",
nightly=True
)
common_job_def["requires"].append(upload_job_name)
return [
{
"update_s3_htmls": {
"name": "update_s3_htmls",
**common_job_def,
"context": "org-member",
"filters": branch_filters.gen_filter_dict(
branches_list=["postnightly"],
),
},
},
]
@ -197,7 +216,9 @@ def get_jobs(toplevel_key, smoke):
configs = gen_build_env_list(smoke)
phase = "build" if toplevel_key == "binarybuilds" else "test"
for build_config in configs:
jobs_list.append(build_config.gen_workflow_job(phase, nightly=True))
# don't test for macos_arm64 as it's cross compiled
if phase != "test" or build_config.os != "macos_arm64":
jobs_list.append(build_config.gen_workflow_job(phase, nightly=True))
return jobs_list


@ -1,91 +0,0 @@
from cimodel.lib.conf_tree import ConfigNode, XImportant
from cimodel.lib.conf_tree import Ver
CONFIG_TREE_DATA = [
(Ver("ubuntu", "16.04"), [
([Ver("clang", "7")], [XImportant("onnx_main_py3.6"),
XImportant("onnx_ort1_py3.6"),
XImportant("onnx_ort2_py3.6")]),
]),
]
class TreeConfigNode(ConfigNode):
def __init__(self, parent, node_name, subtree):
super(TreeConfigNode, self).__init__(parent, self.modify_label(node_name))
self.subtree = subtree
self.init2(node_name)
# noinspection PyMethodMayBeStatic
def modify_label(self, label):
return str(label)
def init2(self, node_name):
pass
def get_children(self):
return [self.child_constructor()(self, k, v) for (k, v) in self.subtree]
def is_build_only(self):
if str(self.find_prop("language_version")) == "onnx_main_py3.6" or \
str(self.find_prop("language_version")) == "onnx_ort1_py3.6" or \
str(self.find_prop("language_version")) == "onnx_ort2_py3.6":
return False
return set(str(c) for c in self.find_prop("compiler_version")).intersection({
"clang3.8",
"clang3.9",
"clang7",
"android",
}) or self.find_prop("distro_version").name == "macos"
def is_test_only(self):
if str(self.find_prop("language_version")) == "onnx_ort1_py3.6" or \
str(self.find_prop("language_version")) == "onnx_ort2_py3.6":
return True
return False
class TopLevelNode(TreeConfigNode):
def __init__(self, node_name, subtree):
super(TopLevelNode, self).__init__(None, node_name, subtree)
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return DistroConfigNode
class DistroConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["distro_version"] = node_name
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return CompilerConfigNode
class CompilerConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["compiler_version"] = node_name
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return LanguageConfigNode
class LanguageConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["language_version"] = node_name
self.props["build_only"] = self.is_build_only()
self.props["test_only"] = self.is_test_only()
def child_constructor(self):
return ImportantConfigNode
class ImportantConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["important"] = True
def get_children(self):
return []


@ -1,174 +0,0 @@
from collections import OrderedDict
import cimodel.data.dimensions as dimensions
import cimodel.lib.conf_tree as conf_tree
from cimodel.lib.conf_tree import Ver
import cimodel.lib.miniutils as miniutils
from cimodel.data.caffe2_build_data import CONFIG_TREE_DATA, TopLevelNode
from cimodel.data.simple.util.branch_filters import gen_filter_dict
from dataclasses import dataclass
DOCKER_IMAGE_PATH_BASE = "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/"
DOCKER_IMAGE_VERSION = "376"
@dataclass
class Conf:
language: str
distro: Ver
# There could be multiple compiler versions configured (e.g. nvcc
# for gpu files and host compiler (gcc/clang) for cpu files)
compilers: [Ver]
build_only: bool
test_only: bool
is_important: bool
@property
def compiler_names(self):
return [c.name for c in self.compilers]
# TODO: Eventually we can probably just remove the cudnn7 everywhere.
def get_cudnn_insertion(self):
omit = self.language == "onnx_main_py3.6" \
or self.language == "onnx_ort1_py3.6" \
or self.language == "onnx_ort2_py3.6" \
or set(self.compiler_names).intersection({"android", "mkl", "clang"}) \
or str(self.distro) in ["ubuntu14.04", "macos10.13"]
return [] if omit else ["cudnn7"]
def get_build_name_root_parts(self):
return [
"caffe2",
self.language,
] + self.get_build_name_middle_parts()
def get_build_name_middle_parts(self):
return [str(c) for c in self.compilers] + self.get_cudnn_insertion() + [str(self.distro)]
def construct_phase_name(self, phase):
root_parts = self.get_build_name_root_parts()
build_name_substitutions = {
"onnx_ort1_py3.6": "onnx_main_py3.6",
"onnx_ort2_py3.6": "onnx_main_py3.6",
}
if phase == "build":
root_parts = [miniutils.override(r, build_name_substitutions) for r in root_parts]
return "_".join(root_parts + [phase]).replace(".", "_")
def get_platform(self):
platform = self.distro.name
if self.distro.name != "macos":
platform = "linux"
return platform
def gen_docker_image(self):
lang_substitutions = {
"onnx_main_py3.6": "py3.6",
"onnx_ort1_py3.6": "py3.6",
"onnx_ort2_py3.6": "py3.6",
"cmake": "py3",
}
lang = miniutils.override(self.language, lang_substitutions)
parts = [lang] + self.get_build_name_middle_parts()
return miniutils.quote(DOCKER_IMAGE_PATH_BASE + "-".join(parts) + ":" + str(DOCKER_IMAGE_VERSION))
def gen_workflow_params(self, phase):
parameters = OrderedDict()
lang_substitutions = {
"onnx_py3": "onnx-py3",
"onnx_main_py3.6": "onnx-main-py3.6",
"onnx_ort1_py3.6": "onnx-ort1-py3.6",
"onnx_ort2_py3.6": "onnx-ort2-py3.6",
}
lang = miniutils.override(self.language, lang_substitutions)
parts = [
"caffe2",
lang,
] + self.get_build_name_middle_parts() + [phase]
build_env_name = "-".join(parts)
parameters["build_environment"] = miniutils.quote(build_env_name)
if "ios" in self.compiler_names:
parameters["build_ios"] = miniutils.quote("1")
if phase == "test":
# TODO cuda should not be considered a compiler
if "cuda" in self.compiler_names:
parameters["use_cuda_docker_runtime"] = miniutils.quote("1")
if self.distro.name != "macos":
parameters["docker_image"] = self.gen_docker_image()
if self.build_only:
parameters["build_only"] = miniutils.quote("1")
if phase == "test":
resource_class = "large" if "cuda" not in self.compiler_names else "gpu.medium"
parameters["resource_class"] = resource_class
return parameters
def gen_workflow_job(self, phase):
job_def = OrderedDict()
job_def["name"] = self.construct_phase_name(phase)
if phase == "test":
job_def["requires"] = [self.construct_phase_name("build")]
job_name = "caffe2_" + self.get_platform() + "_test"
else:
job_name = "caffe2_" + self.get_platform() + "_build"
if not self.is_important:
job_def["filters"] = gen_filter_dict()
job_def.update(self.gen_workflow_params(phase))
return {job_name : job_def}
def get_root():
return TopLevelNode("Caffe2 Builds", CONFIG_TREE_DATA)
def instantiate_configs():
config_list = []
root = get_root()
found_configs = conf_tree.dfs(root)
for fc in found_configs:
c = Conf(
language=fc.find_prop("language_version"),
distro=fc.find_prop("distro_version"),
compilers=fc.find_prop("compiler_version"),
build_only=fc.find_prop("build_only"),
test_only=fc.find_prop("test_only"),
is_important=fc.find_prop("important"),
)
config_list.append(c)
return config_list
def get_workflow_jobs():
configs = instantiate_configs()
x = []
for conf_options in configs:
phases = ["build"]
if not conf_options.build_only:
phases = dimensions.PHASES
if conf_options.test_only:
phases = ["test"]
for phase in phases:
x.append(conf_options.gen_workflow_job(phase))
return x


@ -1,14 +1,23 @@
PHASES = ["build", "test"]
CUDA_VERSIONS = [
None, # cpu build
"92",
"101",
"102",
"111",
]
ROCM_VERSIONS = [
"3.10",
"4.0.1",
]
ROCM_VERSION_LABELS = ["rocm" + v for v in ROCM_VERSIONS]
GPU_VERSIONS = [None] + ["cuda" + v for v in CUDA_VERSIONS] + ROCM_VERSION_LABELS
STANDARD_PYTHON_VERSIONS = [
"3.6",
"3.7",
"3.8"
"3.8",
"3.9"
]
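
Evaluated, the new dimensions expand to a single flat list (the None CPU entry now lives in GPU_VERSIONS rather than CUDA_VERSIONS, since "cuda" + None would fail in the comprehension):

```python
CUDA_VERSIONS = ["92", "101", "102", "111"]
ROCM_VERSIONS = ["3.10", "4.0.1"]
ROCM_VERSION_LABELS = ["rocm" + v for v in ROCM_VERSIONS]
GPU_VERSIONS = [None] + ["cuda" + v for v in CUDA_VERSIONS] + ROCM_VERSION_LABELS

assert GPU_VERSIONS == [
    None, "cuda92", "cuda101", "cuda102", "cuda111", "rocm3.10", "rocm4.0.1",
]
```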


@ -3,15 +3,13 @@ from cimodel.lib.conf_tree import ConfigNode, X, XImportant
CONFIG_TREE_DATA = [
("xenial", [
(None, [
X("nightly"),
]),
("gcc", [
("5.4", [ # All this subtree rebases to master and then build
XImportant("3.6"),
("3.6", [
("important", [X(True)]),
("parallel_tbb", [X(True)]),
("parallel_native", [X(True)]),
("pure_torch", [X(True)]),
]),
]),
# TODO: bring back libtorch test
@ -19,21 +17,54 @@ CONFIG_TREE_DATA = [
]),
("clang", [
("5", [
XImportant("3.6"), # This is actually the ASAN build
("3.6", [
("asan", [
(True, [
("shard_test", [XImportant(True)]),
]),
]),
]),
]),
("7", [
("3.6", [
("onnx", [XImportant(True)]),
]),
]),
]),
("cuda", [
("9.2", [
X("3.6"),
("3.6", [
("cuda_gcc_override", [X("gcc5.4")])
X(True),
("cuda_gcc_override", [
("gcc5.4", [
('build_only', [XImportant(True)]),
]),
]),
])
]),
("10.1", [X("3.6")]),
("10.2", [
XImportant("3.6"),
("10.1", [
("3.6", [
("libtorch", [XImportant(True)])
('build_only', [X(True)]),
]),
]),
("10.2", [
("3.6", [
("shard_test", [XImportant(True)]),
("libtorch", [
(True, [
('build_only', [X(True)]),
]),
]),
]),
]),
("11.1", [
("3.8", [
X(True),
("libtorch", [
(True, [
('build_only', [XImportant(True)]),
]),
]),
]),
]),
]),
@ -46,11 +77,27 @@ CONFIG_TREE_DATA = [
("9", [
("3.6", [
("xla", [XImportant(True)]),
("vulkan", [XImportant(True)]),
]),
]),
]),
("gcc", [
("9", [XImportant("3.8")]),
("9", [
("3.8", [
("coverage", [
(True, [
("shard_test", [XImportant(True)]),
]),
]),
]),
]),
]),
("rocm", [
("3.9", [
("3.6", [
('build_only', [XImportant(True)]),
]),
]),
]),
]),
]
@ -118,17 +165,34 @@ class ExperimentalFeatureConfigNode(TreeConfigNode):
experimental_feature = self.find_prop("experimental_feature")
next_nodes = {
"asan": AsanConfigNode,
"xla": XlaConfigNode,
"vulkan": VulkanConfigNode,
"parallel_tbb": ParallelTBBConfigNode,
"parallel_native": ParallelNativeConfigNode,
"onnx": ONNXConfigNode,
"libtorch": LibTorchConfigNode,
"important": ImportantConfigNode,
"build_only": BuildOnlyConfigNode,
"cuda_gcc_override": CudaGccOverrideConfigNode
"shard_test": ShardTestConfigNode,
"cuda_gcc_override": CudaGccOverrideConfigNode,
"coverage": CoverageConfigNode,
"pure_torch": PureTorchConfigNode,
}
return next_nodes[experimental_feature]
class PureTorchConfigNode(TreeConfigNode):
def modify_label(self, label):
return "PURE_TORCH=" + str(label)
def init2(self, node_name):
self.props["is_pure_torch"] = node_name
def child_constructor(self):
return ImportantConfigNode
class XlaConfigNode(TreeConfigNode):
def modify_label(self, label):
return "XLA=" + str(label)
@ -140,6 +204,39 @@ class XlaConfigNode(TreeConfigNode):
return ImportantConfigNode
class AsanConfigNode(TreeConfigNode):
def modify_label(self, label):
return "Asan=" + str(label)
def init2(self, node_name):
self.props["is_asan"] = node_name
def child_constructor(self):
return ExperimentalFeatureConfigNode
class ONNXConfigNode(TreeConfigNode):
def modify_label(self, label):
return "Onnx=" + str(label)
def init2(self, node_name):
self.props["is_onnx"] = node_name
def child_constructor(self):
return ImportantConfigNode
class VulkanConfigNode(TreeConfigNode):
def modify_label(self, label):
return "Vulkan=" + str(label)
def init2(self, node_name):
self.props["is_vulkan"] = node_name
def child_constructor(self):
return ImportantConfigNode
class ParallelTBBConfigNode(TreeConfigNode):
def modify_label(self, label):
return "PARALLELTBB=" + str(label)
@ -170,7 +267,7 @@ class LibTorchConfigNode(TreeConfigNode):
self.props["is_libtorch"] = node_name
def child_constructor(self):
return ImportantConfigNode
return ExperimentalFeatureConfigNode
class CudaGccOverrideConfigNode(TreeConfigNode):
@ -178,17 +275,33 @@ class CudaGccOverrideConfigNode(TreeConfigNode):
self.props["cuda_gcc_override"] = node_name
def child_constructor(self):
return ImportantConfigNode
return ExperimentalFeatureConfigNode
class BuildOnlyConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["build_only"] = node_name
def child_constructor(self):
return ExperimentalFeatureConfigNode
class ShardTestConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["shard_test"] = node_name
def child_constructor(self):
return ImportantConfigNode
class CoverageConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["is_coverage"] = node_name
def child_constructor(self):
return ExperimentalFeatureConfigNode
class ImportantConfigNode(TreeConfigNode):
def modify_label(self, label):
return "IMPORTANT=" + str(label)
@ -201,7 +314,6 @@ class ImportantConfigNode(TreeConfigNode):
class XenialCompilerConfigNode(TreeConfigNode):
def modify_label(self, label):
return label or "<unspecified>"
@ -215,7 +327,6 @@ class XenialCompilerConfigNode(TreeConfigNode):
class BionicCompilerConfigNode(TreeConfigNode):
def modify_label(self, label):
return label or "<unspecified>"


@ -1,14 +1,13 @@
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import List, Optional
from cimodel.data.pytorch_build_data import TopLevelNode, CONFIG_TREE_DATA
import cimodel.data.dimensions as dimensions
import cimodel.lib.conf_tree as conf_tree
import cimodel.lib.miniutils as miniutils
from cimodel.data.simple.util.branch_filters import gen_filter_dict
from cimodel.data.simple.util.docker_constants import gen_docker_image_path
from dataclasses import dataclass, field
from typing import List, Optional
from cimodel.data.pytorch_build_data import CONFIG_TREE_DATA, TopLevelNode
from cimodel.data.simple.util.branch_filters import gen_filter_dict, RC_PATTERN
from cimodel.data.simple.util.docker_constants import gen_docker_image
@dataclass
@ -18,19 +17,25 @@ class Conf:
parms_list_ignored_for_docker_image: Optional[List[str]] = None
pyver: Optional[str] = None
cuda_version: Optional[str] = None
rocm_version: Optional[str] = None
# TODO expand this to cover all the USE_* that we want to test for
# tensorrt, leveldb, lmdb, redis, opencv, mkldnn, ideep, etc.
# (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259453608)
is_xla: bool = False
vulkan: bool = False
is_vulkan: bool = False
is_pure_torch: bool = False
restrict_phases: Optional[List[str]] = None
gpu_resource: Optional[str] = None
dependent_tests: List = field(default_factory=list)
parent_build: Optional['Conf'] = None
parent_build: Optional["Conf"] = None
is_libtorch: bool = False
is_important: bool = False
parallel_backend: Optional[str] = None
@staticmethod
def is_test_phase(phase):
return "test" in phase
# TODO: Eliminate the special casing for docker paths
# In the short term, we *will* need to support special casing as docker images are merged for caffe2 and pytorch
def get_parms(self, for_docker):
@ -42,31 +47,47 @@ class Conf:
leading.append("pytorch")
if self.is_xla and not for_docker:
leading.append("xla")
if self.is_vulkan and not for_docker:
leading.append("vulkan")
if self.is_libtorch and not for_docker:
leading.append("libtorch")
if self.is_pure_torch and not for_docker:
leading.append("pure_torch")
if self.parallel_backend is not None and not for_docker:
leading.append(self.parallel_backend)
cuda_parms = []
if self.cuda_version:
cuda_parms.extend(["cuda" + self.cuda_version, "cudnn7"])
cudnn = "cudnn8" if self.cuda_version.startswith("11.") else "cudnn7"
cuda_parms.extend(["cuda" + self.cuda_version, cudnn])
if self.rocm_version:
cuda_parms.extend([f"rocm{self.rocm_version}"])
result = leading + ["linux", self.distro] + cuda_parms + self.parms
if not for_docker and self.parms_list_ignored_for_docker_image is not None:
result = result + self.parms_list_ignored_for_docker_image
return result
def gen_docker_image_path(self):
parms_source = self.parent_build or self
base_build_env_name = "-".join(parms_source.get_parms(True))
image_name, _ = gen_docker_image(base_build_env_name)
return miniutils.quote(image_name)
return miniutils.quote(gen_docker_image_path(base_build_env_name))
def gen_docker_image_requires(self):
parms_source = self.parent_build or self
base_build_env_name = "-".join(parms_source.get_parms(True))
_, requires = gen_docker_image(base_build_env_name)
return miniutils.quote(requires)
def get_build_job_name_pieces(self, build_or_test):
return self.get_parms(False) + [build_or_test]
def gen_build_name(self, build_or_test):
return ("_".join(map(str, self.get_build_job_name_pieces(build_or_test)))).replace(".", "_").replace("-", "_")
return (
("_".join(map(str, self.get_build_job_name_pieces(build_or_test))))
.replace(".", "_")
.replace("-", "_")
)
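
For example, the name munging above turns a hypothetical pieces list into the underscore-only job names seen throughout config.yml:

```python
pieces = ["pytorch", "linux", "xenial", "cuda10.2", "cudnn7", "py3", "build"]
name = "_".join(map(str, pieces)).replace(".", "_").replace("-", "_")
assert name == "pytorch_linux_xenial_cuda10_2_cudnn7_py3_build"
```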
def get_dependents(self):
return self.dependent_tests or []
@ -78,20 +99,26 @@ class Conf:
build_env_name = "-".join(map(str, build_job_name_pieces))
parameters["build_environment"] = miniutils.quote(build_env_name)
parameters["docker_image"] = self.gen_docker_image_path()
if phase == "test" and self.gpu_resource:
if Conf.is_test_phase(phase) and self.gpu_resource:
parameters["use_cuda_docker_runtime"] = miniutils.quote("1")
if phase == "test":
if Conf.is_test_phase(phase):
resource_class = "large"
if self.gpu_resource:
resource_class = "gpu." + self.gpu_resource
if self.rocm_version is not None:
resource_class = "pytorch/amd-gpu"
parameters["resource_class"] = resource_class
if phase == "build" and self.rocm_version is not None:
parameters["resource_class"] = "xlarge"
if hasattr(self, 'filters'):
parameters['filters'] = self.filters
return parameters
def gen_workflow_job(self, phase):
job_def = OrderedDict()
job_def["name"] = self.gen_build_name(phase)
if phase == "test":
if Conf.is_test_phase(phase):
# TODO When merging the caffe2 and pytorch jobs, it might be convenient for a while to make a
# caffe2 test job dependent on a pytorch build job. This way we could quickly dedup the repeated
@ -103,36 +130,59 @@ class Conf:
job_name = "pytorch_linux_test"
else:
job_name = "pytorch_linux_build"
job_def["requires"] = [self.gen_docker_image_requires()]
if not self.is_important:
job_def["filters"] = gen_filter_dict()
job_def.update(self.gen_workflow_params(phase))
return {job_name : job_def}
return {job_name: job_def}
# TODO This is a hack to special case some configs just for the workflow list
class HiddenConf(object):
def __init__(self, name, parent_build=None):
def __init__(self, name, parent_build=None, filters=None):
self.name = name
self.parent_build = parent_build
self.filters = filters
def gen_workflow_job(self, phase):
return {self.gen_build_name(phase): {"requires": [self.parent_build.gen_build_name("build")]}}
return {
self.gen_build_name(phase): {
"requires": [self.parent_build.gen_build_name("build")],
"filters": self.filters,
}
}
def gen_build_name(self, _):
return self.name
class DocPushConf(object):
def __init__(self, name, parent_build=None, branch="master"):
self.name = name
self.parent_build = parent_build
self.branch = branch
def gen_workflow_job(self, phase):
return {
"pytorch_doc_push": {
"name": self.name,
"branch": self.branch,
"requires": [self.parent_build],
"context": "org-member",
"filters": gen_filter_dict(branches_list=["nightly"],
tags_list=RC_PATTERN)
}
}
# TODO Convert these to graph nodes
def gen_dependent_configs(xenial_parent_config):
extra_parms = [
(["multigpu"], "large"),
(["NO_AVX2"], "medium"),
(["NO_AVX", "NO_AVX2"], "medium"),
(["nogpu", "NO_AVX2"], None),
(["nogpu", "NO_AVX"], None),
(["slow"], "medium"),
(["nogpu"], None),
]
configs = []
@ -141,12 +191,12 @@ def gen_dependent_configs(xenial_parent_config):
c = Conf(
xenial_parent_config.distro,
["py3"] + parms,
pyver="3.6",
pyver=xenial_parent_config.pyver,
cuda_version=xenial_parent_config.cuda_version,
restrict_phases=["test"],
gpu_resource=gpu,
parent_build=xenial_parent_config,
is_important=xenial_parent_config.is_important,
is_important=False,
)
configs.append(c)
@ -157,9 +207,44 @@ def gen_dependent_configs(xenial_parent_config):
def gen_docs_configs(xenial_parent_config):
configs = []
for x in ["pytorch_python_doc_push", "pytorch_cpp_doc_push", "pytorch_doc_test"]:
configs.append(HiddenConf(x, parent_build=xenial_parent_config))
configs.append(
HiddenConf(
"pytorch_python_doc_build",
parent_build=xenial_parent_config,
filters=gen_filter_dict(branches_list=r"/.*/",
tags_list=RC_PATTERN),
)
)
configs.append(
DocPushConf(
"pytorch_python_doc_push",
parent_build="pytorch_python_doc_build",
branch="site",
)
)
configs.append(
HiddenConf(
"pytorch_cpp_doc_build",
parent_build=xenial_parent_config,
filters=gen_filter_dict(branches_list=r"/.*/",
tags_list=RC_PATTERN),
)
)
configs.append(
DocPushConf(
"pytorch_cpp_doc_push",
parent_build="pytorch_cpp_doc_build",
branch="master",
)
)
configs.append(
HiddenConf(
"pytorch_doc_test",
parent_build=xenial_parent_config
)
)
return configs
@ -186,12 +271,13 @@ def instantiate_configs():
compiler_name = fc.find_prop("compiler_name")
compiler_version = fc.find_prop("compiler_version")
is_xla = fc.find_prop("is_xla") or False
is_asan = fc.find_prop("is_asan") or False
is_coverage = fc.find_prop("is_coverage") or False
is_onnx = fc.find_prop("is_onnx") or False
is_pure_torch = fc.find_prop("is_pure_torch") or False
is_vulkan = fc.find_prop("is_vulkan") or False
parms_list_ignored_for_docker_image = []
vulkan = fc.find_prop("vulkan") or False
if vulkan:
parms_list_ignored_for_docker_image.append("vulkan")
python_version = None
if compiler_name == "cuda" or compiler_name == "android":
python_version = fc.find_prop("pyver")
@ -200,9 +286,14 @@ def instantiate_configs():
parms_list = ["py" + fc.find_prop("pyver")]
cuda_version = None
rocm_version = None
if compiler_name == "cuda":
cuda_version = fc.find_prop("compiler_version")
elif compiler_name == "rocm":
rocm_version = fc.find_prop("compiler_version")
restrict_phases = ["build", "test1", "test2", "caffe2_test"]
elif compiler_name == "android":
android_ndk_version = fc.find_prop("compiler_version")
# TODO: do we need clang to compile host binaries like protoc?
@ -216,14 +307,22 @@ def instantiate_configs():
gcc_version = compiler_name + (fc.find_prop("compiler_version") or "")
parms_list.append(gcc_version)
# TODO: This is a nasty special case
if gcc_version == 'clang5' and not is_xla:
parms_list.append("asan")
python_version = fc.find_prop("pyver")
parms_list[0] = fc.find_prop("abbreviated_pyver")
if is_asan:
parms_list.append("asan")
python_version = fc.find_prop("pyver")
parms_list[0] = fc.find_prop("abbreviated_pyver")
if cuda_version in ["9.2", "10", "10.1", "10.2"]:
# TODO The gcc version is orthogonal to CUDA version?
if is_coverage:
parms_list_ignored_for_docker_image.append("coverage")
python_version = fc.find_prop("pyver")
if is_onnx:
parms_list.append("onnx")
python_version = fc.find_prop("pyver")
parms_list[0] = fc.find_prop("abbreviated_pyver")
restrict_phases = ["build", "ort_test1", "ort_test2"]
if cuda_version:
cuda_gcc_version = fc.find_prop("cuda_gcc_override") or "gcc7"
parms_list.append(cuda_gcc_version)
@ -231,7 +330,12 @@ def instantiate_configs():
is_important = fc.find_prop("is_important") or False
parallel_backend = fc.find_prop("parallel_backend") or None
build_only = fc.find_prop("build_only") or False
if build_only and restrict_phases is None:
shard_test = fc.find_prop("shard_test") or False
# TODO: fix pure_torch python test packaging issue.
if shard_test:
restrict_phases = ["build"] if restrict_phases is None else restrict_phases
restrict_phases.extend(["test1", "test2"])
if build_only or is_pure_torch:
restrict_phases = ["build"]
gpu_resource = None
@ -244,8 +348,10 @@ def instantiate_configs():
parms_list_ignored_for_docker_image,
python_version,
cuda_version,
rocm_version,
is_xla,
vulkan,
is_vulkan,
is_pure_torch,
restrict_phases,
gpu_resource,
is_libtorch=is_libtorch,
@ -255,20 +361,33 @@ def instantiate_configs():
# run docs builds on "pytorch-linux-xenial-py3.6-gcc5.4". Docs builds
# should run on a CPU-only build that runs on all PRs.
if distro_name == 'xenial' and fc.find_prop("pyver") == '3.6' \
and cuda_version is None \
and parallel_backend is None \
and compiler_name == 'gcc' \
and fc.find_prop('compiler_version') == '5.4':
# XXX should this be updated to a more modern build? Projects are
# beginning to drop python3.6
if (
distro_name == "xenial"
and fc.find_prop("pyver") == "3.6"
and cuda_version is None
and parallel_backend is None
and not is_vulkan
and not is_pure_torch
and compiler_name == "gcc"
and fc.find_prop("compiler_version") == "5.4"
):
c.filters = gen_filter_dict(branches_list=r"/.*/",
tags_list=RC_PATTERN)
c.dependent_tests = gen_docs_configs(c)
if cuda_version == "10.1" and python_version == "3.6" and not is_libtorch:
if cuda_version == "10.2" and python_version == "3.6" and not is_libtorch:
c.dependent_tests = gen_dependent_configs(c)
if (compiler_name == "gcc"
and compiler_version == "5.4"
and not is_libtorch
and parallel_backend is None):
if (
compiler_name == "gcc"
and compiler_version == "5.4"
and not is_libtorch
and not is_vulkan
and not is_pure_torch
and parallel_backend is None
):
bc_breaking_check = Conf(
"backward-compatibility-check",
[],
@ -297,7 +416,7 @@ def get_workflow_jobs():
for phase in phases:
# TODO why does this not have a test?
if phase == "test" and conf_options.cuda_version == "10":
if Conf.is_test_phase(phase) and conf_options.cuda_version == "10":
continue
x.append(conf_options.gen_workflow_job(phase))


@ -0,0 +1,28 @@
from collections import OrderedDict
from cimodel.data.simple.util.branch_filters import gen_filter_dict
from cimodel.lib.miniutils import quote
CHANNELS_TO_PRUNE = ["pytorch-nightly", "pytorch-test"]
PACKAGES_TO_PRUNE = "pytorch torchvision torchaudio torchtext ignite torchcsprng"
def gen_workflow_job(channel: str):
return OrderedDict(
{
"anaconda_prune": OrderedDict(
{
"name": f"anaconda-prune-{channel}",
"context": quote("org-member"),
"packages": quote(PACKAGES_TO_PRUNE),
"channel": channel,
"filters": gen_filter_dict(branches_list=["postnightly"]),
}
)
}
)
def get_workflow_jobs():
return [gen_workflow_job(channel) for channel in CHANNELS_TO_PRUNE]
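
For reference, the first generated entry looks roughly like this (a sketch: the `quote` wrapping and the exact `gen_filter_dict` shape are assumptions, since both helpers live in other modules):

```python
# Sketch of get_workflow_jobs()[0]; quoting and filters shape assumed.
{
    "anaconda_prune": {
        "name": "anaconda-prune-pytorch-nightly",
        "context": '"org-member"',
        "packages": '"pytorch torchvision torchaudio torchtext ignite torchcsprng"',
        "channel": "pytorch-nightly",
        "filters": {"branches": {"only": ["postnightly"]}},
    }
}
```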


@ -1,5 +1,7 @@
import cimodel.data.simple.util.branch_filters
from cimodel.data.simple.util.docker_constants import DOCKER_IMAGE_NDK
import cimodel.data.simple.util.branch_filters as branch_filters
from cimodel.data.simple.util.docker_constants import (
DOCKER_IMAGE_NDK, DOCKER_REQUIREMENT_NDK
)
class AndroidJob:
@ -34,10 +36,11 @@ class AndroidJob:
"name": full_job_name,
"build_environment": "\"{}\"".format(build_env_name),
"docker_image": "\"{}\"".format(DOCKER_IMAGE_NDK),
"requires": [DOCKER_REQUIREMENT_NDK]
}
if self.is_master_only:
props_dict["filters"] = cimodel.data.simple.util.branch_filters.gen_filter_dict()
props_dict["filters"] = branch_filters.gen_filter_dict(branch_filters.NON_PR_BRANCH_LIST)
return [{self.template_name: props_dict}]
@ -47,12 +50,14 @@ class AndroidGradleJob:
job_name,
template_name,
dependencies,
is_master_only=True):
is_master_only=True,
is_pr_only=False):
self.job_name = job_name
self.template_name = template_name
self.dependencies = dependencies
self.is_master_only = is_master_only
self.is_pr_only = is_pr_only
def gen_tree(self):
@ -62,7 +67,9 @@ class AndroidGradleJob:
}
if self.is_master_only:
props_dict["filters"] = cimodel.data.simple.util.branch_filters.gen_filter_dict()
props_dict["filters"] = branch_filters.gen_filter_dict(branch_filters.NON_PR_BRANCH_LIST)
elif self.is_pr_only:
props_dict["filters"] = branch_filters.gen_filter_dict(branch_filters.PR_BRANCH_LIST)
return [{self.template_name: props_dict}]
@ -72,12 +79,18 @@ WORKFLOW_DATA = [
AndroidJob(["x86_64"], "pytorch_linux_build"),
AndroidJob(["arm", "v7a"], "pytorch_linux_build"),
AndroidJob(["arm", "v8a"], "pytorch_linux_build"),
AndroidJob(["vulkan", "x86_32"], "pytorch_linux_build", is_master_only=False),
AndroidGradleJob(
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32",
"pytorch_android_gradle_build-x86_32",
["pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build"],
is_master_only=False),
is_master_only=False,
is_pr_only=True),
AndroidGradleJob(
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single",
"pytorch_android_gradle_custom_build_single",
[DOCKER_REQUIREMENT_NDK],
is_master_only=False,
is_pr_only=True),
AndroidGradleJob(
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build",
"pytorch_android_gradle_build",


@ -1,4 +1,7 @@
from cimodel.data.simple.util.docker_constants import DOCKER_IMAGE_GCC7
from cimodel.data.simple.util.docker_constants import (
DOCKER_IMAGE_GCC7,
DOCKER_REQUIREMENT_GCC7
)
def gen_job_name(phase):
@ -38,7 +41,10 @@ class BazelJob:
full_job_name = gen_job_name(self.phase)
build_env_name = "-".join(build_env_parts)
extra_requires = [gen_job_name("build")] if self.phase == "test" else []
extra_requires = (
[gen_job_name("build")] if self.phase == "test" else
[DOCKER_REQUIREMENT_GCC7]
)
props_dict = {
"build_environment": build_env_name,


@ -5,7 +5,7 @@ TODO: Refactor circleci/cimodel/data/binary_build_data.py to generate this file
NB: If you modify this file, you need to also modify
the binary_and_smoke_tests_on_pr variable in
pytorch-ci-hud to adjust the list of whitelisted builds
pytorch-ci-hud to adjust the allowed build list
at https://github.com/ezyang/pytorch-ci-hud/blob/master/src/BuildHistoryDisplay.js
Note:


@ -1,10 +1,13 @@
from collections import OrderedDict
from cimodel.lib.miniutils import quote
from cimodel.data.simple.util.branch_filters import gen_filter_dict, RC_PATTERN
# TODO: make this generated from a matrix rather than just a static list
IMAGE_NAMES = [
"pytorch-linux-bionic-cuda11.1-cudnn8-py3.6-gcc9",
"pytorch-linux-bionic-cuda11.1-cudnn8-py3.8-gcc9",
"pytorch-linux-bionic-cuda11.0-cudnn8-py3.6-gcc9",
"pytorch-linux-bionic-cuda11.0-cudnn8-py3.8-gcc9",
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.8-gcc9",
@ -15,30 +18,38 @@ IMAGE_NAMES = [
"pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda11.0-cudnn8-py3-gcc7",
"pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
"pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc5.4",
"pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
"pytorch-linux-xenial-py3-clang5-asan",
"pytorch-linux-xenial-py3-clang7-onnx",
"pytorch-linux-xenial-py3.8",
"pytorch-linux-xenial-py3.6-clang7",
"pytorch-linux-xenial-py3.6-gcc4.8",
"pytorch-linux-xenial-py3.6-gcc5.4",
"pytorch-linux-xenial-py3.6-gcc5.4", # this one is used in doc builds
"pytorch-linux-xenial-py3.6-gcc7.2",
"pytorch-linux-xenial-py3.6-gcc7",
"pytorch-linux-xenial-pynightly",
"pytorch-linux-xenial-rocm3.3-py3.6",
"pytorch-linux-bionic-rocm3.9-py3.6",
"pytorch-linux-bionic-rocm3.10-py3.6",
]
def get_workflow_jobs():
"""Generates a list of docker image build definitions"""
return [
OrderedDict(
ret = []
for image_name in IMAGE_NAMES:
parameters = OrderedDict({
"name": quote(f"docker-{image_name}"),
"image_name": quote(image_name),
})
if image_name == "pytorch-linux-xenial-py3.6-gcc5.4":
# pushing documentation on tags requires CircleCI to also
# build all the dependencies on tags, including this docker image
parameters['filters'] = gen_filter_dict(branches_list=r"/.*/",
tags_list=RC_PATTERN)
ret.append(OrderedDict(
{
"docker_build_job": OrderedDict(
{"name": quote(image_name), "image_name": quote(image_name)}
)
"docker_build_job": parameters
}
)
for image_name in IMAGE_NAMES
]
))
return ret
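
Concretely, each plain image yields a two-key entry, while the gcc5.4 image additionally carries the tag filters (a sketch, assuming `miniutils.quote` wraps its argument in double quotes):

```python
from collections import OrderedDict

def quote(s):
    return f'"{s}"'  # assumed behavior of cimodel.lib.miniutils.quote

image_name = "pytorch-linux-xenial-py3-clang5-asan"
job = OrderedDict({"docker_build_job": OrderedDict({
    "name": quote(f"docker-{image_name}"),
    "image_name": quote(image_name),
})})
assert job["docker_build_job"]["name"] == '"docker-pytorch-linux-xenial-py3-clang5-asan"'
```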


@ -61,41 +61,16 @@ WORKFLOW_DATA = [
MultiPartVersion([3, 6], "py"),
MultiPartVersion([5, 4], "gcc"),
None,
["ge_config_legacy", "test"],
["jit_legacy", "test"],
["pytorch_linux_xenial_py3_6_gcc5_4_build"]),
GeConfigTestJob(
MultiPartVersion([3, 6], "py"),
MultiPartVersion([5, 4], "gcc"),
None,
["ge_config_profiling", "test"],
["pytorch_linux_xenial_py3_6_gcc5_4_build"]),
GeConfigTestJob(
MultiPartVersion([3, 6], "py"),
MultiPartVersion([5, 4], "gcc"),
None,
["ge_config_simple", "test"],
["pytorch_linux_xenial_py3_6_gcc5_4_build"],
CudaVersion(10, 2),
["cudnn7", "py3", "jit_legacy", "test"],
["pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build"],
use_cuda_docker=True,
),
GeConfigTestJob(
None,
None,
CudaVersion(10, 2),
["cudnn7", "py3", "ge_config_legacy", "test"],
["pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build"],
use_cuda_docker=True,
# TODO Why does the build environment specify cuda10.1, while the
# job name is cuda10_2?
build_env_override="pytorch-linux-xenial-cuda10.1-cudnn7-ge_config_legacy-test"),
GeConfigTestJob(
None,
None,
CudaVersion(10, 2),
["cudnn7", "py3", "ge_config_profiling", "test"],
["pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build"],
use_cuda_docker=True,
# TODO Why does the build environment specify cuda10.1, while the
# job name is cuda10_2?
build_env_override="pytorch-linux-xenial-cuda10.1-cudnn7-ge_config_profiling-test"),
]


@ -1,16 +1,16 @@
from cimodel.data.simple.util.versions import MultiPartVersion
import cimodel.lib.miniutils as miniutils
IOS_VERSION = MultiPartVersion([11, 2, 1])
XCODE_VERSION = MultiPartVersion([12, 0, 0])
class ArchVariant:
def __init__(self, name, is_custom=False):
def __init__(self, name, custom_build_name=""):
self.name = name
self.is_custom = is_custom
self.custom_build_name = custom_build_name
def render(self):
extra_parts = ["custom"] if self.is_custom else []
extra_parts = [self.custom_build_name] if len(self.custom_build_name) > 0 else []
return "_".join([self.name] + extra_parts)
@ -19,15 +19,15 @@ def get_platform(arch_variant_name):
class IOSJob:
def __init__(self, ios_version, arch_variant, is_org_member_context=True, extra_props=None):
self.ios_version = ios_version
def __init__(self, xcode_version, arch_variant, is_org_member_context=True, extra_props=None):
self.xcode_version = xcode_version
self.arch_variant = arch_variant
self.is_org_member_context = is_org_member_context
self.extra_props = extra_props
def gen_name_parts(self, with_version_dots):
version_parts = self.ios_version.render_dots_or_parts(with_version_dots)
version_parts = self.xcode_version.render_dots_or_parts(with_version_dots)
build_variant_suffix = "_".join([self.arch_variant.render(), "build"])
return [
@ -61,9 +61,10 @@ class IOSJob:
WORKFLOW_DATA = [
IOSJob(IOS_VERSION, ArchVariant("x86_64"), is_org_member_context=False),
IOSJob(IOS_VERSION, ArchVariant("arm64")),
IOSJob(IOS_VERSION, ArchVariant("arm64", True), extra_props={"op_list": "mobilenetv2.yaml"}),
IOSJob(XCODE_VERSION, ArchVariant("x86_64"), is_org_member_context=False),
IOSJob(XCODE_VERSION, ArchVariant("arm64")),
IOSJob(XCODE_VERSION, ArchVariant("arm64", "metal"), extra_props={"use_metal": miniutils.quote(str(int(True)))}),
IOSJob(XCODE_VERSION, ArchVariant("arm64", "custom"), extra_props={"op_list": "mobilenetv2.yaml"}),
]


@ -4,12 +4,23 @@ PyTorch Mobile PR builds (use linux host toolchain + mobile build options)
import cimodel.lib.miniutils as miniutils
import cimodel.data.simple.util.branch_filters
from cimodel.data.simple.util.docker_constants import DOCKER_IMAGE_ASAN, DOCKER_IMAGE_NDK
from cimodel.data.simple.util.docker_constants import (
DOCKER_IMAGE_ASAN,
DOCKER_REQUIREMENT_ASAN,
DOCKER_IMAGE_NDK,
DOCKER_REQUIREMENT_NDK
)
class MobileJob:
def __init__(self, docker_image, variant_parts, is_master_only=False):
def __init__(
self,
docker_image,
docker_requires,
variant_parts,
is_master_only=False):
self.docker_image = docker_image
self.docker_requires = docker_requires
self.variant_parts = variant_parts
self.is_master_only = is_master_only
@ -30,6 +41,7 @@ class MobileJob:
"build_environment": build_env_name,
"build_only": miniutils.quote(str(int(True))),
"docker_image": self.docker_image,
"requires": self.docker_requires,
"name": full_job_name,
}
@ -40,15 +52,27 @@ class MobileJob:
WORKFLOW_DATA = [
MobileJob(DOCKER_IMAGE_ASAN, ["build"]),
MobileJob(DOCKER_IMAGE_ASAN, ["custom", "build", "static"]),
MobileJob(
DOCKER_IMAGE_ASAN,
[DOCKER_REQUIREMENT_ASAN],
["build"]
),
# Use LLVM-DEV toolchain in android-ndk-r19c docker image
MobileJob(DOCKER_IMAGE_NDK, ["custom", "build", "dynamic"]),
MobileJob(
DOCKER_IMAGE_NDK,
[DOCKER_REQUIREMENT_NDK],
["custom", "build", "dynamic"]
),
# Use LLVM-DEV toolchain in android-ndk-r19c docker image
# Most of this CI is already covered by "mobile-custom-build-dynamic" job
MobileJob(DOCKER_IMAGE_NDK, ["code", "analysis"], True),
MobileJob(
DOCKER_IMAGE_NDK,
[DOCKER_REQUIREMENT_NDK],
["code", "analysis"],
True
),
]
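The new docker_requires parameter surfaces as a "requires" edge in the rendered CircleCI job, so each mobile build is scheduled after the workflow job that produces its Docker image instead of assuming a pre-pushed tag. A sketch of the resulting props dict; the keys come from the diff above, while the concrete names are illustrative assumptions:

```python
# Approximate shape of the dict a MobileJob now renders. Keys are from the
# diff above; the string values here are illustrative, not verbatim output.
props_dict = {
    "build_environment": "pytorch-linux-xenial-py3-clang5-asan-mobile-build",  # assumed
    "build_only": '"1"',  # miniutils.quote(str(int(True)))
    "docker_image": "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-asan",
    "requires": ["docker-pytorch-linux-xenial-py3-clang5-asan"],  # from docker_requires
    "name": "pytorch_linux_xenial_py3_clang5_asan_mobile_build",  # assumed
}
print(props_dict["requires"])
```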

View File

@ -1,4 +1,7 @@
from cimodel.data.simple.util.docker_constants import DOCKER_IMAGE_NDK
from cimodel.data.simple.util.docker_constants import (
DOCKER_IMAGE_NDK,
DOCKER_REQUIREMENT_NDK
)
class AndroidNightlyJob:
@ -48,12 +51,13 @@ class AndroidNightlyJob:
return [{self.template_name: props_dict}]
BASE_REQUIRES = [DOCKER_REQUIREMENT_NDK]
WORKFLOW_DATA = [
AndroidNightlyJob(["x86_32"], "pytorch_linux_build"),
AndroidNightlyJob(["x86_64"], "pytorch_linux_build"),
AndroidNightlyJob(["arm", "v7a"], "pytorch_linux_build"),
AndroidNightlyJob(["arm", "v8a"], "pytorch_linux_build"),
AndroidNightlyJob(["x86_32"], "pytorch_linux_build", requires=BASE_REQUIRES),
AndroidNightlyJob(["x86_64"], "pytorch_linux_build", requires=BASE_REQUIRES),
AndroidNightlyJob(["arm", "v7a"], "pytorch_linux_build", requires=BASE_REQUIRES),
AndroidNightlyJob(["arm", "v8a"], "pytorch_linux_build", requires=BASE_REQUIRES),
AndroidNightlyJob(["android_gradle"], "pytorch_android_gradle_build",
with_docker=False,
requires=[

View File

@ -18,7 +18,7 @@ class IOSNightlyJob:
common_name_pieces = [
"ios",
] + ios_definitions.IOS_VERSION.render_dots_or_parts(with_version_dots) + [
] + ios_definitions.XCODE_VERSION.render_dots_or_parts(with_version_dots) + [
"nightly",
self.variant,
"build",

View File

@ -4,6 +4,11 @@ NON_PR_BRANCH_LIST = [
r"/release\/.*/",
]
PR_BRANCH_LIST = [
r"/gh\/.*\/head/",
r"/pull\/.*/",
]
RC_PATTERN = r"/v[0-9]+(\.[0-9]+)*-rc[0-9]+/"
def gen_filter_dict(

View File

@ -1,30 +1,33 @@
AWS_DOCKER_HOST = "308535385114.dkr.ecr.us-east-1.amazonaws.com"
# ARE YOU EDITING THIS NUMBER? MAKE SURE YOU READ THE GUIDANCE AT THE
# TOP OF .circleci/config.yml
DOCKER_IMAGE_TAG = "209062ef-ab58-422a-b295-36c4eed6e906"
def gen_docker_image(container_type):
return (
"/".join([AWS_DOCKER_HOST, "pytorch", container_type]),
f"docker-{container_type}",
)
def gen_docker_image_requires(image_name):
return [f"docker-{image_name}"]
def gen_docker_image_path(container_type):
return "/".join([
AWS_DOCKER_HOST,
"pytorch",
container_type + ":" + DOCKER_IMAGE_TAG,
])
DOCKER_IMAGE_BASIC, DOCKER_REQUIREMENT_BASE = gen_docker_image(
"pytorch-linux-xenial-py3.6-gcc5.4"
)
DOCKER_IMAGE_CUDA_10_2, DOCKER_REQUIREMENT_CUDA_10_2 = gen_docker_image(
"pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
)
DOCKER_IMAGE_GCC7, DOCKER_REQUIREMENT_GCC7 = gen_docker_image(
"pytorch-linux-xenial-py3.6-gcc7"
)
DOCKER_IMAGE_BASIC = gen_docker_image_path("pytorch-linux-xenial-py3.6-gcc5.4")
DOCKER_IMAGE_CUDA_10_2 = gen_docker_image_path("pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7")
DOCKER_IMAGE_GCC7 = gen_docker_image_path("pytorch-linux-xenial-py3.6-gcc7")
def gen_mobile_docker_name(specifier):
def gen_mobile_docker(specifier):
container_type = "pytorch-linux-xenial-py3-clang5-" + specifier
return gen_docker_image_path(container_type)
return gen_docker_image(container_type)
DOCKER_IMAGE_ASAN = gen_mobile_docker_name("asan")
DOCKER_IMAGE_ASAN, DOCKER_REQUIREMENT_ASAN = gen_mobile_docker("asan")
DOCKER_IMAGE_NDK = gen_mobile_docker_name("android-ndk-r19c")
DOCKER_IMAGE_NDK, DOCKER_REQUIREMENT_NDK = gen_mobile_docker("android-ndk-r19c")
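The old gen_docker_image_path baked the hard-coded DOCKER_IMAGE_TAG into the image string; the new gen_docker_image instead returns an untagged image path together with the name of the workflow job that builds it, so every call site gets both values in one unpacking. Restated from the diff:

```python
# Restatement of the new helper, runnable standalone.
AWS_DOCKER_HOST = "308535385114.dkr.ecr.us-east-1.amazonaws.com"

def gen_docker_image(container_type):
    # (image path without a tag, name of the job that builds this image)
    return (
        "/".join([AWS_DOCKER_HOST, "pytorch", container_type]),
        f"docker-{container_type}",
    )

image, requirement = gen_docker_image("pytorch-linux-xenial-py3-clang5-asan")
print(image)        # .../pytorch/pytorch-linux-xenial-py3-clang5-asan
print(requirement)  # docker-pytorch-linux-xenial-py3-clang5-asan
```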

View File

@ -9,7 +9,7 @@ class MultiPartVersion:
with the prefix string.
"""
if self.parts:
return [self.prefix + str(self.parts[0])] + list(map(str, self.parts[1:]))
return [self.prefix + str(self.parts[0])] + [str(part) for part in self.parts[1:]]
else:
return [self.prefix]
@ -29,3 +29,6 @@ class CudaVersion(MultiPartVersion):
self.minor = minor
super().__init__([self.major, self.minor], "cuda")
def __str__(self):
return f"{self.major}.{self.minor}"

View File

@ -43,8 +43,11 @@ class WindowsJob:
if base_phase == "test":
prerequisite_jobs.append("_".join(base_name_parts + ["build"]))
if self.cuda_version:
self.cudnn_version = 8 if self.cuda_version.major == 11 else 7
arch_env_elements = (
["cuda" + str(self.cuda_version.major), "cudnn7"]
["cuda" + str(self.cuda_version.major), "cudnn" + str(self.cudnn_version)]
if self.cuda_version
else ["cpu"]
)
@ -83,21 +86,25 @@ class WindowsJob:
props_dict["executor"] = "windows-with-nvidia-gpu"
props_dict["cuda_version"] = (
miniutils.quote(str(self.cuda_version.major))
miniutils.quote(str(self.cuda_version))
if self.cuda_version
else "cpu"
)
props_dict["name"] = "_".join(name_parts)
return [{key_name: props_dict}]
class VcSpec:
def __init__(self, year, version_elements=None):
def __init__(self, year, version_elements=None, hide_version=False):
self.year = year
self.version_elements = version_elements or []
self.hide_version = hide_version
def get_elements(self):
if self.hide_version:
return [self.prefixed_year()]
return [self.prefixed_year()] + self.version_elements
def get_product(self):
@ -110,7 +117,7 @@ class VcSpec:
return "vs" + str(self.year)
def render(self):
return "_".join(filter(None, [self.prefixed_year(), self.dotted_version()]))
return "_".join(self.get_elements())
def FalsePred(_):
return False
@ -118,23 +125,22 @@ def FalsePred(_):
def TruePred(_):
return True
_VC2019 = VcSpec(2019)
WORKFLOW_DATA = [
# VS2017 CUDA-10.1
WindowsJob(None, VcSpec(2017, ["14", "11"]), CudaVersion(10, 1), master_only_pred=FalsePred),
WindowsJob(1, VcSpec(2017, ["14", "11"]), CudaVersion(10, 1)),
# VS2017 no-CUDA (builds only)
WindowsJob(None, VcSpec(2017, ["14", "16"]), CudaVersion(10, 1)),
WindowsJob(None, VcSpec(2017, ["14", "16"]), None),
# VS2019 CUDA-10.1
WindowsJob(None, VcSpec(2019), CudaVersion(10, 1)),
WindowsJob(1, VcSpec(2019), CudaVersion(10, 1)),
WindowsJob(2, VcSpec(2019), CudaVersion(10, 1)),
WindowsJob(None, _VC2019, CudaVersion(10, 1)),
WindowsJob(1, _VC2019, CudaVersion(10, 1)),
WindowsJob(2, _VC2019, CudaVersion(10, 1)),
# VS2019 CUDA-11.1
WindowsJob(None, _VC2019, CudaVersion(11, 1)),
WindowsJob(1, _VC2019, CudaVersion(11, 1), master_only_pred=TruePred),
WindowsJob(2, _VC2019, CudaVersion(11, 1), master_only_pred=TruePred),
# VS2019 CPU-only
WindowsJob(None, VcSpec(2019), None),
WindowsJob(1, VcSpec(2019), None),
WindowsJob(2, VcSpec(2019), None, master_only_pred=TruePred),
WindowsJob(1, VcSpec(2019), CudaVersion(10, 1), force_on_cpu=True),
WindowsJob(2, VcSpec(2019), CudaVersion(10, 1), force_on_cpu=True, master_only_pred=TruePred),
WindowsJob(None, _VC2019, None),
WindowsJob(1, _VC2019, None, master_only_pred=TruePred),
WindowsJob(2, _VC2019, None, master_only_pred=TruePred),
WindowsJob(1, _VC2019, CudaVersion(10, 1), force_on_cpu=True, master_only_pred=TruePred),
]
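Two threads run through this hunk: the cuDNN major version is now derived from the CUDA major version (cuDNN 8 pairs with the CUDA 11 toolchains, older CUDA keeps cuDNN 7), and str(self.cuda_version) now lands in the job properties as a full "11.1"-style string. A small sketch of the derivation, restated from the diff:

```python
class CudaVersion:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

def arch_env_elements(cuda_version):
    # cuDNN 8 pairs with CUDA 11; everything older keeps cuDNN 7 (see diff above).
    if cuda_version:
        cudnn_version = 8 if cuda_version.major == 11 else 7
        return ["cuda" + str(cuda_version.major), "cudnn" + str(cudnn_version)]
    return ["cpu"]

print(arch_env_elements(CudaVersion(11, 1)))  # ['cuda11', 'cudnn8']
print(arch_env_elements(CudaVersion(10, 1)))  # ['cuda10', 'cudnn7']
print(arch_env_elements(None))                # ['cpu']
```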

File diff suppressed because it is too large

View File

@ -10,18 +10,37 @@ if [ -z "${image}" ]; then
exit 1
fi
# TODO: Generalize
OS="ubuntu"
DOCKERFILE="${OS}/Dockerfile"
if [[ "$image" == *-cuda* ]]; then
DOCKERFILE="${OS}-cuda/Dockerfile"
elif [[ "$image" == *-rocm* ]]; then
DOCKERFILE="${OS}-rocm/Dockerfile"
fi
function extract_version_from_image_name() {
eval export $2=$(echo "${image}" | perl -n -e"/$1(\d+(\.\d+)?(\.\d+)?)/ && print \$1")
if [ "x${!2}" = x ]; then
echo "variable '$2' not correctly parsed from image='$image'"
exit 1
fi
}
if [[ "$image" == *-trusty* ]]; then
UBUNTU_VERSION=14.04
elif [[ "$image" == *-xenial* ]]; then
function extract_all_from_image_name() {
# parse $image into an array, splitting on '-'
keep_IFS="$IFS"
IFS="-"
declare -a parts=($image)
IFS="$keep_IFS"
unset keep_IFS
for part in "${parts[@]}"; do
name=$(echo "${part}" | perl -n -e"/([a-zA-Z]+)\d+(\.\d+)?(\.\d+)?/ && print \$1")
vername="${name^^}_VERSION"
# "py" is the odd one out, needs this special case
if [ "x${name}" = xpy ]; then
vername=ANACONDA_PYTHON_VERSION
fi
# skip non-conforming fields such as "pytorch", "linux" or "xenial" without version string
if [ -n "${name}" ]; then
extract_version_from_image_name "${name}" "${vername}"
fi
done
}
if [[ "$image" == *-xenial* ]]; then
UBUNTU_VERSION=16.04
elif [[ "$image" == *-artful* ]]; then
UBUNTU_VERSION=17.10
@ -29,6 +48,26 @@ elif [[ "$image" == *-bionic* ]]; then
UBUNTU_VERSION=18.04
elif [[ "$image" == *-focal* ]]; then
UBUNTU_VERSION=20.04
elif [[ "$image" == *ubuntu* ]]; then
extract_version_from_image_name ubuntu UBUNTU_VERSION
elif [[ "$image" == *centos* ]]; then
extract_version_from_image_name centos CENTOS_VERSION
fi
if [ -n "${UBUNTU_VERSION}" ]; then
OS="ubuntu"
elif [ -n "${CENTOS_VERSION}" ]; then
OS="centos"
else
echo "Unable to derive operating system base..."
exit 1
fi
DOCKERFILE="${OS}/Dockerfile"
if [[ "$image" == *cuda* ]]; then
DOCKERFILE="${OS}-cuda/Dockerfile"
elif [[ "$image" == *rocm* ]]; then
DOCKERFILE="${OS}-rocm/Dockerfile"
fi
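The build script no longer hardcodes every image: extract_version_from_image_name pulls up to three dot-separated numeric groups out of the image name with a perl one-liner. A Python transcription of that regex, for illustration only (the script itself stays in bash):

```python
import re

def extract_version_from_image_name(image, component):
    # Mirrors the perl pattern above: the component name followed by up to
    # three dot-separated numeric groups.
    m = re.search(re.escape(component) + r"(\d+(\.\d+)?(\.\d+)?)", image)
    if not m:
        raise SystemExit(f"'{component}' not correctly parsed from image='{image}'")
    return m.group(1)

image = "pytorch-linux-bionic-cuda11.1-cudnn8-py3.8-gcc9"
print(extract_version_from_image_name(image, "cuda"))   # 11.1
print(extract_version_from_image_name(image, "cudnn"))  # 8
print(extract_version_from_image_name(image, "py"))     # 3.8
print(extract_version_from_image_name(image, "gcc"))    # 9
```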
TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/ubuntu/14.04/x86_64"
@ -38,19 +77,10 @@ TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/u
# from scratch
case "$image" in
pytorch-linux-xenial-py3.8)
# TODO: This is a hack, get rid of this as soon as you get rid of the travis downloads
TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/ubuntu/16.04/x86_64"
TRAVIS_PYTHON_VERSION=3.8
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.6-gcc4.8)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=4.8
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3.6-gcc5.4)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=5
@ -71,13 +101,6 @@ case "$image" in
DB=yes
VISION=yes
;;
pytorch-linux-xenial-pynightly)
TRAVIS_PYTHON_VERSION=nightly
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc5.4)
CUDA_VERSION=9.2
CUDNN_VERSION=7
@ -126,7 +149,6 @@ case "$image" in
KATEX=yes
;;
pytorch-linux-xenial-cuda11.0-cudnn8-py3-gcc7)
UBUNTU_VERSION=16.04-rc
CUDA_VERSION=11.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
@ -136,6 +158,16 @@ case "$image" in
VISION=yes
KATEX=yes
;;
pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7)
CUDA_VERSION=11.1
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-xenial-py3-clang5-asan)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=5.0
@ -143,6 +175,13 @@ case "$image" in
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3-clang7-onnx)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3-clang5-android-ndk-r19c)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=5.0
@ -167,6 +206,8 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
VULKAN_SDK_VERSION=1.2.148.0
SWIFTSHADER=yes
;;
pytorch-linux-bionic-py3.8-gcc9)
ANACONDA_PYTHON_VERSION=3.8
@ -194,7 +235,6 @@ case "$image" in
VISION=yes
;;
pytorch-linux-bionic-cuda11.0-cudnn8-py3.6-gcc9)
UBUNTU_VERSION=18.04-rc
CUDA_VERSION=11.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
@ -205,7 +245,6 @@ case "$image" in
KATEX=yes
;;
pytorch-linux-bionic-cuda11.0-cudnn8-py3.8-gcc9)
UBUNTU_VERSION=18.04-rc
CUDA_VERSION=11.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.8
@ -215,22 +254,72 @@ case "$image" in
VISION=yes
KATEX=yes
;;
pytorch-linux-xenial-rocm3.3-py3.6)
pytorch-linux-bionic-cuda11.1-cudnn8-py3.6-gcc9)
CUDA_VERSION=11.1
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-bionic-cuda11.1-cudnn8-py3.8-gcc9)
CUDA_VERSION=11.1
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-bionic-rocm3.9-py3.6)
ANACONDA_PYTHON_VERSION=3.6
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=3.3
# newer cmake version required
CMAKE_VERSION=3.6.3
ROCM_VERSION=3.9
;;
pytorch-linux-bionic-rocm3.3-py3.6)
pytorch-linux-bionic-rocm3.10-py3.6)
ANACONDA_PYTHON_VERSION=3.6
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=3.3
ROCM_VERSION=3.10
;;
*)
# Catch-all for builds that are not hardcoded.
PROTOBUF=yes
DB=yes
VISION=yes
echo "image '$image' did not match an existing build configuration"
if [[ "$image" == *py* ]]; then
extract_version_from_image_name py ANACONDA_PYTHON_VERSION
fi
if [[ "$image" == *cuda* ]]; then
extract_version_from_image_name cuda CUDA_VERSION
extract_version_from_image_name cudnn CUDNN_VERSION
fi
if [[ "$image" == *rocm* ]]; then
extract_version_from_image_name rocm ROCM_VERSION
fi
if [[ "$image" == *gcc* ]]; then
extract_version_from_image_name gcc GCC_VERSION
fi
if [[ "$image" == *clang* ]]; then
extract_version_from_image_name clang CLANG_VERSION
fi
if [[ "$image" == *devtoolset* ]]; then
extract_version_from_image_name devtoolset DEVTOOLSET_VERSION
fi
if [[ "$image" == *glibc* ]]; then
extract_version_from_image_name glibc GLIBC_VERSION
fi
if [[ "$image" == *cmake* ]]; then
extract_version_from_image_name cmake CMAKE_VERSION
fi
;;
esac
# Set Jenkins UID and GID if running Jenkins
@ -259,15 +348,19 @@ docker build \
--build-arg "JENKINS_UID=${JENKINS_UID:-}" \
--build-arg "JENKINS_GID=${JENKINS_GID:-}" \
--build-arg "UBUNTU_VERSION=${UBUNTU_VERSION}" \
--build-arg "CENTOS_VERSION=${CENTOS_VERSION}" \
--build-arg "DEVTOOLSET_VERSION=${DEVTOOLSET_VERSION}" \
--build-arg "GLIBC_VERSION=${GLIBC_VERSION}" \
--build-arg "CLANG_VERSION=${CLANG_VERSION}" \
--build-arg "ANACONDA_PYTHON_VERSION=${ANACONDA_PYTHON_VERSION}" \
--build-arg "TRAVIS_PYTHON_VERSION=${TRAVIS_PYTHON_VERSION}" \
--build-arg "GCC_VERSION=${GCC_VERSION}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \
--build-arg "CUDNN_VERSION=${CUDNN_VERSION}" \
--build-arg "ANDROID=${ANDROID}" \
--build-arg "ANDROID_NDK=${ANDROID_NDK_VERSION}" \
--build-arg "GRADLE_VERSION=${GRADLE_VERSION}" \
--build-arg "VULKAN_SDK_VERSION=${VULKAN_SDK_VERSION}" \
--build-arg "SWIFTSHADER=${SWIFTSHADER}" \
--build-arg "CMAKE_VERSION=${CMAKE_VERSION:-}" \
--build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \
--build-arg "KATEX=${KATEX:-}" \
@ -277,6 +370,14 @@ docker build \
"$@" \
.
# NVIDIA dockers for RC releases use tag names like `11.0-cudnn8-devel-ubuntu18.04-rc`,
# for this case we will set UBUNTU_VERSION to `18.04-rc` so that the Dockerfile could
# find the correct image. As a result, here we have to replace the
# "$UBUNTU_VERSION" == "18.04-rc"
# with
# "$UBUNTU_VERSION" == "18.04"
UBUNTU_VERSION=$(echo ${UBUNTU_VERSION} | sed 's/-rc$//')
function drun() {
docker run --rm "$tmp_tag" $*
}
@ -294,19 +395,6 @@ if [[ "$OS" == "ubuntu" ]]; then
fi
fi
if [ -n "$TRAVIS_PYTHON_VERSION" ]; then
if [[ "$TRAVIS_PYTHON_VERSION" != nightly ]]; then
if !(drun python --version 2>&1 | grep -qF "Python $TRAVIS_PYTHON_VERSION"); then
echo "TRAVIS_PYTHON_VERSION=$TRAVIS_PYTHON_VERSION, but:"
drun python --version
exit 1
fi
else
echo "Please manually check nightly is OK:"
drun python --version
fi
fi
if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
if !(drun python --version 2>&1 | grep -qF "Python $ANACONDA_PYTHON_VERSION"); then
echo "ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION, but:"

View File

@ -13,7 +13,7 @@ retry () {
#until we find a way to reliably reuse previous build, this last_tag is not in use
# last_tag="$(( CIRCLE_BUILD_NUM - 1 ))"
tag="${CIRCLE_WORKFLOW_ID}"
tag="${DOCKER_TAG}"
registry="308535385114.dkr.ecr.us-east-1.amazonaws.com"
@ -45,9 +45,5 @@ trap "docker logout ${registry}" EXIT
docker push "${image}:${tag}"
# TODO: Get rid of duplicate tagging once ${DOCKER_TAG} becomes the default
docker tag "${image}:${tag}" "${image}:${DOCKER_TAG}"
docker push "${image}:${DOCKER_TAG}"
docker save -o "${IMAGE_NAME}:${tag}.tar" "${image}:${tag}"
aws s3 cp "${IMAGE_NAME}:${tag}.tar" "s3://ossci-linux-build/pytorch/base/${IMAGE_NAME}:${tag}.tar" --acl public-read

View File

@ -0,0 +1,92 @@
ARG CENTOS_VERSION
FROM centos:${CENTOS_VERSION}
ARG CENTOS_VERSION
# Install required packages to build Caffe2
# Install common dependencies (so that this step can be cached separately)
ARG EC2
ADD ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install devtoolset
ARG DEVTOOLSET_VERSION
ADD ./common/install_devtoolset.sh install_devtoolset.sh
RUN bash ./install_devtoolset.sh && rm install_devtoolset.sh
ENV BASH_ENV "/etc/profile"
# (optional) Install non-default glibc version
ARG GLIBC_VERSION
ADD ./common/install_glibc.sh install_glibc.sh
RUN if [ -n "${GLIBC_VERSION}" ]; then bash ./install_glibc.sh; fi
RUN rm install_glibc.sh
# Install user
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, coverage, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
ADD ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
ADD ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
ADD ./common/install_vision.sh install_vision.sh
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh
ENV INSTALLED_VISION ${VISION}
# Install rocm
ARG ROCM_VERSION
ADD ./common/install_rocm.sh install_rocm.sh
RUN bash ./install_rocm.sh
RUN rm install_rocm.sh
ENV PATH /opt/rocm/bin:$PATH
ENV PATH /opt/rocm/hcc/bin:$PATH
ENV PATH /opt/rocm/hip/bin:$PATH
ENV PATH /opt/rocm/opencl/bin:$PATH
ENV PATH /opt/rocm/llvm/bin:$PATH
ENV LANG en_US.utf8
ENV LC_ALL en_US.utf8
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
ADD ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version
ARG NINJA_VERSION
ADD ./common/install_ninja.sh install_ninja.sh
RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
RUN rm install_ninja.sh
# Install ccache/sccache (do this last, so we get priority in PATH)
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
RUN bash ./install_cache.sh && rm install_cache.sh
# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
USER jenkins
CMD ["bash"]

View File

@ -4,13 +4,15 @@ set -ex
[ -n "${ANDROID_NDK}" ]
_https_amazon_aws=https://ossci-android.s3.amazonaws.com
apt-get update
apt-get install -y --no-install-recommends autotools-dev autoconf unzip
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
pushd /tmp
curl -Os --retry 3 https://dl.google.com/android/repository/android-ndk-${ANDROID_NDK}-linux-x86_64.zip
curl -Os --retry 3 $_https_amazon_aws/android-ndk-${ANDROID_NDK}-linux-x86_64.zip
popd
_ndk_dir=/opt/ndk
mkdir -p "$_ndk_dir"
@ -45,43 +47,22 @@ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
# Installing android sdk
# https://github.com/circleci/circleci-images/blob/staging/android/Dockerfile.m4
_sdk_version=sdk-tools-linux-3859397.zip
_tmp_sdk_zip=/tmp/android-sdk-linux.zip
_android_home=/opt/android/sdk
rm -rf $_android_home
sudo mkdir -p $_android_home
curl --silent --show-error --location --fail --retry 3 --output /tmp/$_sdk_version https://dl.google.com/android/repository/$_sdk_version
sudo unzip -q /tmp/$_sdk_version -d $_android_home
rm /tmp/$_sdk_version
curl --silent --show-error --location --fail --retry 3 --output /tmp/android-sdk-linux.zip $_https_amazon_aws/android-sdk-linux-tools3859397-build-tools2803-2902-platforms28-29.zip
sudo unzip -q $_tmp_sdk_zip -d $_android_home
rm $_tmp_sdk_zip
sudo chmod -R 777 $_android_home
export ANDROID_HOME=$_android_home
export ADB_INSTALL_TIMEOUT=120
export PATH="${ANDROID_HOME}/emulator:${ANDROID_HOME}/tools:${ANDROID_HOME}/tools/bin:${ANDROID_HOME}/platform-tools:${PATH}"
export PATH="${ANDROID_HOME}/tools:${ANDROID_HOME}/tools/bin:${ANDROID_HOME}/platform-tools:${PATH}"
echo "PATH:${PATH}"
alias sdkmanager="$ANDROID_HOME/tools/bin/sdkmanager"
sudo mkdir ~/.android && sudo echo '### User Sources for Android SDK Manager' > ~/.android/repositories.cfg
sudo chmod -R 777 ~/.android
yes | sdkmanager --licenses
yes | sdkmanager --update
sdkmanager \
"tools" \
"platform-tools" \
"emulator"
sdkmanager \
"build-tools;28.0.3" \
"build-tools;29.0.2"
sdkmanager \
"platforms;android-28" \
"platforms;android-29"
sdkmanager --list
# Installing Gradle
echo "GRADLE_VERSION:${GRADLE_VERSION}"
@ -89,8 +70,7 @@ _gradle_home=/opt/gradle
sudo rm -rf $_gradle_home
sudo mkdir -p $_gradle_home
wget --no-verbose --output-document=/tmp/gradle.zip \
"https://services.gradle.org/distributions/gradle-${GRADLE_VERSION}-bin.zip"
curl --silent --output /tmp/gradle.zip --retry 3 $_https_amazon_aws/gradle-${GRADLE_VERSION}-bin.zip
sudo unzip -q /tmp/gradle.zip -d $_gradle_home
rm /tmp/gradle.zip

View File

@ -2,55 +2,112 @@
set -ex
# NVIDIA dockers for RC releases use tag names like `11.0-cudnn8-devel-ubuntu18.04-rc`,
# for this case we will set UBUNTU_VERSION to `18.04-rc` so that the Dockerfile could
# find the correct image. As a result, here we have to check for
# "$UBUNTU_VERSION" == "18.04"*
# instead of
# "$UBUNTU_VERSION" == "18.04"
if [[ "$UBUNTU_VERSION" == "18.04"* ]]; then
cmake3="cmake=3.10*"
else
cmake3="cmake=3.5*"
fi
install_ubuntu() {
# NVIDIA dockers for RC releases use tag names like `11.0-cudnn8-devel-ubuntu18.04-rc`,
# for this case we will set UBUNTU_VERSION to `18.04-rc` so that the Dockerfile could
# find the correct image. As a result, here we have to check for
# "$UBUNTU_VERSION" == "18.04"*
# instead of
# "$UBUNTU_VERSION" == "18.04"
if [[ "$UBUNTU_VERSION" == "18.04"* ]]; then
cmake3="cmake=3.10*"
else
cmake3="cmake=3.5*"
fi
# Install common dependencies
apt-get update
# TODO: Some of these may not be necessary
# TODO: libiomp also gets installed by conda, aka there's a conflict
ccache_deps="asciidoc docbook-xml docbook-xsl xsltproc"
numpy_deps="gfortran"
apt-get install -y --no-install-recommends \
$ccache_deps \
$numpy_deps \
${cmake3} \
apt-transport-https \
autoconf \
automake \
build-essential \
ca-certificates \
curl \
git \
libatlas-base-dev \
libc6-dbg \
libiomp-dev \
libyaml-dev \
libz-dev \
libjpeg-dev \
libasound2-dev \
libsndfile-dev \
python \
python-dev \
python-setuptools \
python-wheel \
software-properties-common \
sudo \
wget \
vim
# Install common dependencies
apt-get update
# TODO: Some of these may not be necessary
ccache_deps="asciidoc docbook-xml docbook-xsl xsltproc"
numpy_deps="gfortran"
apt-get install -y --no-install-recommends \
$ccache_deps \
$numpy_deps \
${cmake3} \
apt-transport-https \
autoconf \
automake \
build-essential \
ca-certificates \
curl \
git \
libatlas-base-dev \
libc6-dbg \
libiomp-dev \
libyaml-dev \
libz-dev \
libjpeg-dev \
libasound2-dev \
libsndfile-dev \
software-properties-common \
sudo \
wget \
vim
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
install_centos() {
# Need EPEL for many packages we depend on.
# See http://fedoraproject.org/wiki/EPEL
yum --enablerepo=extras install -y epel-release
ccache_deps="asciidoc docbook-dtds docbook-style-xsl libxslt"
numpy_deps="gcc-gfortran"
# Note: protobuf-c-{compiler,devel} on CentOS are too old to be used
# for Caffe2. That said, we still install them to make sure the build
# system opts to build/use protoc and libprotobuf from third-party.
yum install -y \
$ccache_deps \
$numpy_deps \
autoconf \
automake \
bzip2 \
cmake \
cmake3 \
curl \
gcc \
gcc-c++ \
gflags-devel \
git \
glibc-devel \
glibc-headers \
glog-devel \
hiredis-devel \
libstdc++-devel \
make \
opencv-devel \
sudo \
wget \
vim
# Cleanup
yum clean all
rm -rf /var/cache/yum
rm -rf /var/lib/yum/yumdb
rm -rf /var/lib/yum/history
}
# Install base packages depending on the base OS
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
centos)
install_centos
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac
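This install_ubuntu/install_centos split, dispatched on the ID field of /etc/os-release, is the pattern the other install scripts below adopt as well. A Python sketch of the detection for illustration (the scripts do it with grep -oP and tr):

```python
# Illustrative Python version of: ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
def os_release_id(path="/etc/os-release"):
    with open(path) as f:
        for line in f:
            if line.startswith("ID="):
                return line.partition("=")[2].strip().strip('"')
    return None

installers = {"ubuntu": "install_ubuntu", "centos": "install_centos"}
os_id = os_release_id()
if os_id not in installers:
    raise SystemExit("Unable to determine OS...")
print(f"would dispatch to {installers[os_id]}()")
```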
# Install Valgrind separately since the apt-get version is too old.
mkdir valgrind_build && cd valgrind_build
VALGRIND_VERSION=3.15.0
VALGRIND_VERSION=3.16.1
if ! wget http://valgrind.org/downloads/valgrind-${VALGRIND_VERSION}.tar.bz2
then
wget https://sourceware.org/ftp/valgrind/valgrind-${VALGRIND_VERSION}.tar.bz2
@ -63,13 +120,3 @@ sudo make install
cd ../../
rm -rf valgrind_build
alias valgrind="/usr/local/bin/valgrind"
# TODO: THIS IS A HACK!!!
# distributed nccl(2) tests are a bit busted, see https://github.com/pytorch/pytorch/issues/5877
if dpkg -s libnccl-dev; then
apt-get remove -y libnccl-dev libnccl2 --allow-change-held-packages
fi
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

View File

@ -2,17 +2,51 @@
set -ex
install_ubuntu() {
echo "Preparing to build sccache from source"
apt-get update
apt-get install -y cargo pkg-config libssl-dev
echo "Checking out sccache repo"
git clone https://github.com/pytorch/sccache
cd sccache
echo "Building sccache"
cargo build --release
cp target/release/sccache /opt/cache/bin
echo "Cleaning up"
cd ..
rm -rf sccache
apt-get remove -y cargo rustc
apt-get autoclean && apt-get clean
}
install_binary() {
echo "Downloading sccache binary from S3 repo"
curl --retry 3 https://s3.amazonaws.com/ossci-linux/sccache -o /opt/cache/bin/sccache
}
mkdir -p /opt/cache/bin
mkdir -p /opt/cache/lib
sed -e 's|PATH="\(.*\)"|PATH="/opt/cache/bin:\1"|g' -i /etc/environment
export PATH="/opt/cache/bin:$PATH"
# Setup compiler cache
curl --retry 3 https://s3.amazonaws.com/ossci-linux/sccache -o /opt/cache/bin/sccache
if [ -n "$ROCM_VERSION" ]; then
curl --retry 3 http://repo.radeon.com/misc/.sccache_amd/sccache -o /opt/cache/bin/sccache
else
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
*)
install_binary
;;
esac
fi
chmod a+x /opt/cache/bin/sccache
function write_sccache_stub() {
printf "#!/bin/sh\nexec sccache $(which $1) \$*" > "/opt/cache/bin/$1"
printf "#!/bin/sh\nif [ \$(ps -p \$PPID -o comm=) != sccache ]; then\n exec sccache $(which $1) \"\$@\"\nelse\n exec $(which $1) \"\$@\"\nfi" > "/opt/cache/bin/$1"
chmod a+x "/opt/cache/bin/$1"
}
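The old stub re-entered sccache unconditionally; the new one inspects the parent process name, so when sccache itself execs the wrapped compiler the stub falls through to the real binary instead of recursing. A Python transcription of the generator, to spell out the guard (the script writes the same wrapper with printf):

```python
import os, shutil, stat

def write_sccache_stub(compiler, stub_dir="/opt/cache/bin"):
    # Wrap `compiler`: normal invocations go through sccache, but the nested
    # invocation made by sccache itself (parent comm == "sccache") does not,
    # which would otherwise recurse forever.
    real = shutil.which(compiler)
    stub = os.path.join(stub_dir, compiler)
    with open(stub, "w") as f:
        f.write('#!/bin/sh\n'
                'if [ $(ps -p $PPID -o comm=) != sccache ]; then\n'
                f'  exec sccache {real} "$@"\n'
                'else\n'
                f'  exec {real} "$@"\n'
                'fi\n')
    os.chmod(stub, os.stat(stub).st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```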
@ -20,8 +54,12 @@ write_sccache_stub cc
write_sccache_stub c++
write_sccache_stub gcc
write_sccache_stub g++
write_sccache_stub clang
write_sccache_stub clang++
# NOTE: See specific ROCM_VERSION case below.
if [ "x$ROCM_VERSION" = x ]; then
write_sccache_stub clang
write_sccache_stub clang++
fi
if [ -n "$CUDA_VERSION" ]; then
# TODO: This is a workaround for the fact that PyTorch's FindCUDA
@ -30,6 +68,50 @@ if [ -n "$CUDA_VERSION" ]; then
# where CUDA is installed. Instead, we install an nvcc symlink outside
# of the PATH, and set CUDA_NVCC_EXECUTABLE so that we make use of it.
printf "#!/bin/sh\nexec sccache $(which nvcc) \"\$@\"" > /opt/cache/lib/nvcc
chmod a+x /opt/cache/lib/nvcc
write_sccache_stub nvcc
mv /opt/cache/bin/nvcc /opt/cache/lib/
fi
if [ -n "$ROCM_VERSION" ]; then
# ROCm compiler is hcc or clang. However, it is commonly invoked via hipcc wrapper.
# hipcc will call either hcc or clang using an absolute path starting with /opt/rocm,
# causing the /opt/cache/bin to be skipped. We must create the sccache wrappers
# directly under /opt/rocm while also preserving the original compiler names.
# Note symlinks will chain as follows: [hcc or clang++] -> clang -> clang-??
# Final link in symlink chain must point back to original directory.
# Original compiler is moved one directory deeper. Wrapper replaces it.
function write_sccache_stub_rocm() {
OLDCOMP=$1
COMPNAME=$(basename $OLDCOMP)
TOPDIR=$(dirname $OLDCOMP)
WRAPPED="$TOPDIR/original/$COMPNAME"
mv "$OLDCOMP" "$WRAPPED"
printf "#!/bin/sh\nexec sccache $WRAPPED \"\$@\"" > "$OLDCOMP"
chmod a+x "$OLDCOMP"
}
if [[ -e "/opt/rocm/hcc/bin/hcc" ]]; then
# ROCm 3.3 or earlier.
mkdir /opt/rocm/hcc/bin/original
write_sccache_stub_rocm /opt/rocm/hcc/bin/hcc
write_sccache_stub_rocm /opt/rocm/hcc/bin/clang
write_sccache_stub_rocm /opt/rocm/hcc/bin/clang++
# Fix last link in symlink chain, clang points to versioned clang in prior dir
pushd /opt/rocm/hcc/bin/original
ln -s ../$(readlink clang)
popd
elif [[ -e "/opt/rocm/llvm/bin/clang" ]]; then
# ROCm 3.5 and beyond.
mkdir /opt/rocm/llvm/bin/original
write_sccache_stub_rocm /opt/rocm/llvm/bin/clang
write_sccache_stub_rocm /opt/rocm/llvm/bin/clang++
# Fix last link in symlink chain, clang points to versioned clang in prior dir
pushd /opt/rocm/llvm/bin/original
ln -s ../$(readlink clang)
popd
else
echo "Cannot find ROCm compiler."
exit 1
fi
fi

View File

@ -24,13 +24,20 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
mkdir /opt/conda
chown jenkins:jenkins /opt/conda
# Work around bug where devtoolset replaces sudo and breaks it.
if [ -n "$DEVTOOLSET_VERSION" ]; then
SUDO=/bin/sudo
else
SUDO=sudo
fi
as_jenkins() {
# NB: unsetting the environment variables works around a conda bug
# https://github.com/conda/conda/issues/6576
# NB: Pass on PATH and LD_LIBRARY_PATH to sudo invocation
# NB: This must be run from a directory that jenkins has access to,
# works around https://github.com/conda/conda-package-handling/pull/34
sudo -H -u jenkins env -u SUDO_UID -u SUDO_GID -u SUDO_COMMAND -u SUDO_USER env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" $*
$SUDO -H -u jenkins env -u SUDO_UID -u SUDO_GID -u SUDO_COMMAND -u SUDO_USER env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" $*
}
pushd /tmp
@ -49,10 +56,10 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
pushd /opt/conda
# Track latest conda update
as_jenkins conda update -n base conda
as_jenkins conda update -y -n base conda
# Install correct Python version
as_jenkins conda install python="$ANACONDA_PYTHON_VERSION"
as_jenkins conda install -y python="$ANACONDA_PYTHON_VERSION"
conda_install() {
# Ensure that the install command doesn't upgrade/downgrade Python
@ -65,11 +72,13 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
# DO NOT install cmake here as it would install a version newer than 3.5, but
# we want to pin to version 3.5.
if [ "$ANACONDA_PYTHON_VERSION" = "3.8" ]; then
# DO NOT install typing if installing python-3.8, since it's part of python-3.8 core packages
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
conda_install numpy pyyaml mkl mkl-include setuptools cffi future six llvmdev=8.0.0
conda_install numpy=1.18.5 pyyaml mkl mkl-include setuptools cffi future six llvmdev=8.0.0
elif [ "$ANACONDA_PYTHON_VERSION" = "3.7" ]; then
# DO NOT install dataclasses if installing python-3.7, since it's part of python-3.7 core packages
conda_install numpy=1.18.5 pyyaml mkl mkl-include setuptools cffi future six typing_extensions
else
conda_install numpy pyyaml mkl mkl-include setuptools cffi typing future six
conda_install numpy=1.18.5 pyyaml mkl mkl-include setuptools cffi future six dataclasses typing_extensions
fi
if [[ "$CUDA_VERSION" == 9.2* ]]; then
conda_install magma-cuda92 -c pytorch
@ -79,18 +88,42 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
conda_install magma-cuda101 -c pytorch
elif [[ "$CUDA_VERSION" == 10.2* ]]; then
conda_install magma-cuda102 -c pytorch
elif [[ "$CUDA_VERSION" == 11.0* ]]; then
conda_install magma-cuda110 -c pytorch
elif [[ "$CUDA_VERSION" == 11.1* ]]; then
conda_install magma-cuda111 -c pytorch
elif [[ "$CUDA_VERSION" == 11.2* ]]; then
conda_install magma-cuda112 -c pytorch
fi
# TODO: This isn't working atm
conda_install nnpack -c killeent
# Install some other packages
# Install some other packages, including those needed for Python test reporting
# TODO: Why is scipy pinned
# numba & llvmlite is pinned because of https://github.com/numba/numba/issues/4368
# scikit-learn is pinned because of
# https://github.com/scikit-learn/scikit-learn/issues/14485 (affects gcc 5.5
# only)
as_jenkins pip install --progress-bar off pytest scipy==1.1.0 scikit-learn==0.20.3 scikit-image librosa>=0.6.2 psutil numba==0.46.0 llvmlite==0.30.0
# Pin MyPy version because new errors are likely to appear with each release
# Pin hypothesis to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
as_jenkins pip install --progress-bar off pytest \
scipy==1.1.0 \
scikit-image \
librosa>=0.6.2 \
psutil \
numba \
llvmlite \
unittest-xml-reporting \
boto3==1.16.34 \
coverage \
hypothesis==4.53.2 \
mypy==0.770 \
tb-nightly
# Update scikit-learn to a python-3.8 compatible version
if [[ $(python -c "import sys; print(int(sys.version_info >= (3, 8)))") == "1" ]]; then
as_jenkins pip install --progress-bar off -U scikit-learn
else
# Pinned scikit-learn due to https://github.com/scikit-learn/scikit-learn/issues/14485 (affects gcc 5.5 only)
as_jenkins pip install --progress-bar off scikit-learn==0.20.3
fi
popd
fi
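The growing elif chain above maps each CUDA minor series to a matching prebuilt magma package on the pytorch conda channel; the rule is simply the major.minor digits with the dot dropped. A hedged one-function restatement (this helper does not exist in the script, it just summarizes the chain):

```python
def magma_package(cuda_version):
    # "10.2" -> "magma-cuda102", "11.1" -> "magma-cuda111"; None for CPU-only
    # images. Summarizes the if/elif chain in install_conda.sh above.
    if not cuda_version:
        return None
    major, minor = cuda_version.split(".")[:2]
    return f"magma-cuda{major}{minor}"

assert magma_package("9.2") == "magma-cuda92"
assert magma_package("10.2") == "magma-cuda102"
assert magma_package("11.2") == "magma-cuda112"
# then: conda_install(magma_package(v), "-c", "pytorch")
```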

View File

@ -51,11 +51,16 @@ install_centos() {
}
# Install base packages depending on the base OS
if [ -f /etc/lsb-release ]; then
install_ubuntu
elif [ -f /etc/os-release ]; then
install_centos
else
echo "Unable to determine OS..."
exit 1
fi
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
centos)
install_centos
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac

View File

@ -0,0 +1,10 @@
#!/bin/bash
set -ex
[ -n "$DEVTOOLSET_VERSION" ]
yum install -y centos-release-scl
yum install -y devtoolset-$DEVTOOLSET_VERSION
echo "source scl_source enable devtoolset-$DEVTOOLSET_VERSION" > "/etc/profile.d/devtoolset-$DEVTOOLSET_VERSION.sh"

View File

@ -15,6 +15,7 @@ if [ -n "$GCC_VERSION" ]; then
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-"$GCC_VERSION" 50
# Cleanup package manager
apt-get autoclean && apt-get clean

View File

@ -0,0 +1,34 @@
#!/bin/bash
set -ex
[ -n "$GLIBC_VERSION" ]
if [[ -n "$CENTOS_VERSION" ]]; then
[ -n "$DEVTOOLSET_VERSION" ]
fi
yum install -y wget sed
mkdir -p /packages && cd /packages
wget -q http://ftp.gnu.org/gnu/glibc/glibc-$GLIBC_VERSION.tar.gz
tar xzf glibc-$GLIBC_VERSION.tar.gz
if [[ "$GLIBC_VERSION" == "2.26" ]]; then
cd glibc-$GLIBC_VERSION
sed -i 's/$name ne "nss_test1"/$name ne "nss_test1" \&\& $name ne "nss_test2"/' scripts/test-installation.pl
cd ..
fi
mkdir -p glibc-$GLIBC_VERSION-build && cd glibc-$GLIBC_VERSION-build
if [[ -n "$CENTOS_VERSION" ]]; then
export PATH=/opt/rh/devtoolset-$DEVTOOLSET_VERSION/root/usr/bin:$PATH
fi
../glibc-$GLIBC_VERSION/configure --prefix=/usr CFLAGS='-Wno-stringop-truncation -Wno-format-overflow -Wno-restrict -Wno-format-truncation -g -O2'
make -j$(nproc)
make install
# Cleanup
rm -rf /packages
rm -rf /var/cache/yum/*
rm -rf /var/lib/rpm/__db.*
yum clean all

View File

@ -0,0 +1,8 @@
#!/bin/bash
set -ex
git clone --branch v1.15 https://github.com/linux-test-project/lcov.git
pushd lcov
sudo make install # will be installed in /usr/local/bin/lcov
popd

View File

@ -1,30 +0,0 @@
#!/bin/bash
set -ex
llvm_url="https://github.com/llvm/llvm-project/releases/download/llvmorg-9.0.1/llvm-9.0.1.src.tar.xz"
mkdir /opt/llvm
pushd /tmp
wget --no-verbose --output-document=llvm.tar.xz "$llvm_url"
mkdir llvm
tar -xf llvm.tar.xz -C llvm --strip-components 1
rm -f llvm.tar.xz
cd llvm
mkdir build
cd build
cmake -G "Unix Makefiles" \
-DCMAKE_BUILD_TYPE=MinSizeRel \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DCMAKE_INSTALL_PREFIX=/opt/llvm \
-DLLVM_TARGETS_TO_BUILD="host" \
-DLLVM_BUILD_TOOLS=OFF \
-DLLVM_BUILD_UTILS=OFF \
-DLLVM_TEMPORARILY_ALLOW_OLD_TOOLCHAIN=ON \
../
make -j4
sudo make install
popd

View File

@ -0,0 +1,4 @@
#!/bin/bash
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.5.6-1+cuda10.1 libnccl2=2.5.6-1+cuda10.1

View File

@ -0,0 +1,4 @@
#!/bin/bash
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev

View File

@ -46,11 +46,16 @@ install_centos() {
}
# Install base packages depending on the base OS
if [ -f /etc/lsb-release ]; then
install_ubuntu
elif [ -f /etc/os-release ]; then
install_centos
else
echo "Unable to determine OS..."
exit 1
fi
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
centos)
install_centos
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac

View File

@ -2,47 +2,68 @@
set -ex
install_magma() {
# "install" hipMAGMA into /opt/rocm/magma by copying after build
git clone https://bitbucket.org/icl/magma.git -b hipMAGMA
pushd magma
cp make.inc-examples/make.inc.hip-mkl-gcc make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc
echo 'DEVCCFLAGS += --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908' >> make.inc
export PATH="${PATH}:/opt/rocm/bin"
make -f make.gen.hipMAGMA -j $(nproc)
make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda
popd
mv magma /opt/rocm
}
install_ubuntu() {
apt-get update
if [[ $UBUNTU_VERSION == 18.04 ]]; then
# gpg-agent is not available by default on 18.04
apt-get install -y --no-install-recommends gpg-agent
fi
apt-get install -y kmod
apt-get install -y wget
apt-get install -y libopenblas-dev
# Need the libc++1 and libc++abi1 libraries to allow torch._C to load at runtime
apt-get install -y libc++1
apt-get install -y libc++abi1
DEB_ROCM_REPO=http://repo.radeon.com/rocm/apt/${ROCM_VERSION}
# Add rocm repository
wget -qO - $DEB_ROCM_REPO/rocm.gpg.key | apt-key add -
echo "deb [arch=amd64] $DEB_ROCM_REPO xenial main" > /etc/apt/sources.list.d/rocm.list
wget -qO - http://repo.radeon.com/rocm/rocm.gpg.key | apt-key add -
echo "deb [arch=amd64] http://repo.radeon.com/rocm/apt/${ROCM_VERSION} xenial main" > /etc/apt/sources.list.d/rocm.list
apt-get update --allow-insecure-repositories
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated \
rocm-dev \
rocm-utils \
rocfft \
miopen-hip \
rocblas \
hipsparse \
rocrand \
hipcub \
rocthrust \
rocm-libs \
rccl \
rocprofiler-dev \
roctracer-dev
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# precompiled miopen kernels added in ROCm 3.5; search for all unversioned packages
# if the search fails it would abort this script (set -e); use '|| true' to avoid that
MIOPENKERNELS=$(apt-cache search --names-only miopenkernels | awk '{print $1}' | grep -F -v . || true)
if [[ "x${MIOPENKERNELS}" = x ]]; then
echo "miopenkernels package not available"
else
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated ${MIOPENKERNELS}
fi
install_magma
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
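The miopenkernels lookup above keeps only package names without a literal dot (grep -F -v .), i.e. the unversioned metapackages that appeared alongside the ROCm 3.5 precompiled kernels. Illustrated in Python with made-up package names:

```python
# Hypothetical apt-cache results; only the dot-free (unversioned) names survive,
# matching `... | awk '{print $1}' | grep -F -v . || true` in the script above.
search_results = [
    "miopenkernels-gfx900-56kdb",       # illustrative unversioned name
    "miopenkernels-gfx906-60kdb",       # illustrative unversioned name
    "miopenkernels-gfx906-60kdb1.2.3",  # illustrative versioned name, dropped
]
unversioned = [name for name in search_results if "." not in name]
print(unversioned if unversioned else "miopenkernels package not available")
```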
install_centos() {
yum update -y
yum install -y kmod
yum install -y wget
yum install -y openblas-devel
@ -51,7 +72,7 @@ install_centos() {
echo "[ROCm]" > /etc/yum.repos.d/rocm.repo
echo "name=ROCm" >> /etc/yum.repos.d/rocm.repo
echo "baseurl=http://repo.radeon.com/rocm/yum/rpm/" >> /etc/yum.repos.d/rocm.repo
echo "baseurl=http://repo.radeon.com/rocm/yum/${ROCM_VERSION}" >> /etc/yum.repos.d/rocm.repo
echo "enabled=1" >> /etc/yum.repos.d/rocm.repo
echo "gpgcheck=0" >> /etc/yum.repos.d/rocm.repo
@ -60,17 +81,13 @@ install_centos() {
yum install -y \
rocm-dev \
rocm-utils \
rocfft \
miopen-hip \
rocblas \
hipsparse \
rocrand \
rocm-libs \
rccl \
hipcub \
rocthrust \
rocprofiler-dev \
roctracer-dev
install_magma
# Cleanup
yum clean all
rm -rf /var/cache/yum
@ -79,11 +96,16 @@ install_centos() {
}
# Install Python packages depending on the base OS
if [ -f /etc/lsb-release ]; then
install_ubuntu
elif [ -f /etc/os-release ]; then
install_centos
else
echo "Unable to determine OS..."
exit 1
fi
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
centos)
install_centos
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac

View File

@ -0,0 +1,24 @@
#!/bin/bash
set -ex
[ -n "${SWIFTSHADER}" ]
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
_https_amazon_aws=https://ossci-android.s3.amazonaws.com
# SwiftShader
_swiftshader_dir=/var/lib/jenkins/swiftshader
_swiftshader_file_targz=swiftshader-abe07b943-prebuilt.tar.gz
mkdir -p $_swiftshader_dir
_tmp_swiftshader_targz="/tmp/${_swiftshader_file_targz}"
curl --silent --show-error --location --fail --retry 3 \
--output "${_tmp_swiftshader_targz}" "$_https_amazon_aws/${_swiftshader_file_targz}"
tar -C "${_swiftshader_dir}" -xzf "${_tmp_swiftshader_targz}"
export VK_ICD_FILENAMES="${_swiftshader_dir}/build/Linux/vk_swiftshader_icd.json"
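The retry helper above is a fixed exponential backoff: one attempt, then retries after 1, 2, 4 and 8 seconds. A Python equivalent for illustration:

```python
import subprocess, time

def retry(cmd, delays=(1, 2, 4, 8)):
    # One initial attempt plus len(delays) retries with growing sleeps,
    # mirroring `$* || (sleep 1 && $*) || ...` in the script above.
    for attempt, delay in enumerate((0,) + tuple(delays)):
        time.sleep(delay)
        try:
            subprocess.run(cmd, check=True)
            return
        except subprocess.CalledProcessError:
            if attempt == len(delays):
                raise

retry(["curl", "--silent", "--show-error", "--fail", "-O",
       "https://ossci-android.s3.amazonaws.com/swiftshader-abe07b943-prebuilt.tar.gz"])
```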

View File

@ -1,97 +0,0 @@
#!/bin/bash
set -ex
as_jenkins() {
# NB: Preserve PATH and LD_LIBRARY_PATH changes
sudo -H -u jenkins env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" $*
}
if [ -n "$TRAVIS_PYTHON_VERSION" ]; then
mkdir -p /opt/python
chown jenkins:jenkins /opt/python
# Download Python binary from Travis
pushd tmp
as_jenkins wget --quiet ${TRAVIS_DL_URL_PREFIX}/python-$TRAVIS_PYTHON_VERSION.tar.bz2
# NB: The tarball also comes with /home/travis virtualenv that we
# don't care about. (Maybe we should, but we've worked around the
# "how do I install to python" issue by making this entire directory
# user-writable "lol")
# NB: Relative ordering of opt/python and flags matters
as_jenkins tar xjf python-$TRAVIS_PYTHON_VERSION.tar.bz2 --strip-components=2 --directory /opt/python opt/python
popd
echo "/opt/python/$TRAVIS_PYTHON_VERSION/lib" > /etc/ld.so.conf.d/travis-python.conf
ldconfig
sed -e 's|PATH="\(.*\)"|PATH="/opt/python/'"$TRAVIS_PYTHON_VERSION"'/bin:\1"|g' -i /etc/environment
export PATH="/opt/python/$TRAVIS_PYTHON_VERSION/bin:$PATH"
python --version
pip --version
# Install pip from source.
# The python-pip package on Ubuntu Trusty is old
# and upon install numpy doesn't use the binary
# distribution, and fails to compile it from source.
pushd tmp
as_jenkins curl -L -O https://pypi.python.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz
as_jenkins tar zxf pip-9.0.1.tar.gz
pushd pip-9.0.1
as_jenkins python setup.py install
popd
rm -rf pip-9.0.1*
popd
# Install pip packages
as_jenkins pip install --upgrade pip
pip --version
if [[ "$TRAVIS_PYTHON_VERSION" == nightly ]]; then
# These two packages have broken Cythonizations uploaded
# to PyPi, see:
#
# - https://github.com/numpy/numpy/issues/10500
# - https://github.com/yaml/pyyaml/issues/117
#
# Furthermore, the released version of Cython does not
# have these issues fixed.
#
# While we are waiting on fixes for these, we build
# from Git for now. Feel free to delete this conditional
# branch if things start working again (you may need
# to do this if these packages regress on Git HEAD.)
as_jenkins pip install git+https://github.com/cython/cython.git
as_jenkins pip install git+https://github.com/numpy/numpy.git
as_jenkins pip install git+https://github.com/yaml/pyyaml.git
else
as_jenkins pip install numpy pyyaml
fi
as_jenkins pip install \
future \
hypothesis \
protobuf \
pytest \
pillow \
typing
as_jenkins pip install mkl mkl-devel
# SciPy does not support Python 3.7 or Python 2.7.9
if [[ "$TRAVIS_PYTHON_VERSION" != nightly ]] && [[ "$TRAVIS_PYTHON_VERSION" != "2.7.9" ]]; then
as_jenkins pip install scipy==1.1.0 scikit-image librosa>=0.6.2
fi
# Install psutil for dataloader tests
as_jenkins pip install psutil
# Install dill for serialization tests
as_jenkins pip install "dill>=0.3.1"
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
fi

View File

@ -47,11 +47,16 @@ install_centos() {
}
# Install base packages depending on the base OS
if [ -f /etc/lsb-release ]; then
install_ubuntu
elif [ -f /etc/os-release ]; then
install_centos
else
echo "Unable to determine OS..."
exit 1
fi
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
centos)
install_centos
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac

View File

@ -0,0 +1,23 @@
#!/bin/bash
set -ex
[ -n "${VULKAN_SDK_VERSION}" ]
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
_https_amazon_aws=https://ossci-android.s3.amazonaws.com
_vulkansdk_dir=/var/lib/jenkins/vulkansdk
mkdir -p $_vulkansdk_dir
_tmp_vulkansdk_targz=/tmp/vulkansdk.tar.gz
curl --silent --show-error --location --fail --retry 3 \
--output "$_tmp_vulkansdk_targz" "$_https_amazon_aws/vulkansdk-linux-x86_64-${VULKAN_SDK_VERSION}.tar.gz"
tar -C "$_vulkansdk_dir" -xzf "$_tmp_vulkansdk_targz" --strip-components 1
export VULKAN_SDK="$_vulkansdk_dir/"
rm "$_tmp_vulkansdk_targz"

View File

@ -24,7 +24,7 @@ ARG KATEX
ADD ./common/install_katex.sh install_katex.sh
RUN bash ./install_katex.sh && rm install_katex.sh
# Install conda
# Install conda and other packages (e.g., numpy, coverage, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh
@ -40,12 +40,6 @@ ARG CLANG_VERSION
ADD ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh
# Install non-standard Python versions (via Travis binaries)
ARG TRAVIS_PYTHON_VERSION
ENV PATH /opt/python/$TRAVIS_PYTHON_VERSION/bin:$PATH
ADD ./common/install_travis_python.sh install_travis_python.sh
RUN bash ./install_travis_python.sh && rm install_travis_python.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
ADD ./common/install_protobuf.sh install_protobuf.sh
@ -78,6 +72,16 @@ ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
# Install NCCL for when CUDA is version 10.1
ADD ./common/install_nccl.sh install_nccl.sh
RUN if [ "${CUDA_VERSION}" = 10.1 ]; then bash ./install_nccl.sh; fi
RUN rm install_nccl.sh
# Install Open MPI for CUDA
ADD ./common/install_openmpi.sh install_openmpi.sh
RUN if [ -n "${CUDA_VERSION}" ]; then bash install_openmpi.sh; fi
RUN rm install_openmpi.sh
# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
@ -86,9 +90,8 @@ ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
ENV TORCH_CUDA_ARCH_LIST Maxwell
ENV TORCH_NVCC_FLAGS "-Xfatbin -compress-all"
# Install LLVM dev version
ADD ./common/install_llvm.sh install_llvm.sh
RUN bash ./install_llvm.sh
# Install LLVM dev version (Defined in the pytorch/builder github repository)
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
USER jenkins
CMD ["bash"]

View File

@ -21,7 +21,7 @@ RUN bash ./install_clang.sh && rm install_clang.sh
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda
# Install conda and other packages (e.g., numpy, coverage, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh
@ -57,7 +57,8 @@ ENV PATH /opt/rocm/bin:$PATH
ENV PATH /opt/rocm/hcc/bin:$PATH
ENV PATH /opt/rocm/hip/bin:$PATH
ENV PATH /opt/rocm/opencl/bin:$PATH
ENV HIP_PLATFORM hcc
ENV PATH /opt/rocm/llvm/bin:$PATH
ENV MAGMA_HOME /opt/rocm/magma
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8

View File

@ -33,7 +33,7 @@ ARG KATEX
ADD ./common/install_katex.sh install_katex.sh
RUN bash ./install_katex.sh && rm install_katex.sh
# Install conda
# Install conda and other packages (e.g., numpy, coverage, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh
@ -44,12 +44,9 @@ ARG GCC_VERSION
ADD ./common/install_gcc.sh install_gcc.sh
RUN bash ./install_gcc.sh && rm install_gcc.sh
# Install non-standard Python versions (via Travis binaries)
ARG TRAVIS_PYTHON_VERSION
ARG TRAVIS_DL_URL_PREFIX
ENV PATH /opt/python/$TRAVIS_PYTHON_VERSION/bin:$PATH
ADD ./common/install_travis_python.sh install_travis_python.sh
RUN bash ./install_travis_python.sh && rm install_travis_python.sh
# Install lcov for C++ code coverage
ADD ./common/install_lcov.sh install_lcov.sh
RUN bash ./install_lcov.sh && rm install_lcov.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
@ -85,6 +82,18 @@ RUN rm AndroidManifest.xml
RUN rm build.gradle
ENV INSTALLED_ANDROID ${ANDROID}
# (optional) Install Vulkan SDK
ARG VULKAN_SDK_VERSION
ADD ./common/install_vulkan_sdk.sh install_vulkan_sdk.sh
RUN if [ -n "${VULKAN_SDK_VERSION}" ]; then bash ./install_vulkan_sdk.sh; fi
RUN rm install_vulkan_sdk.sh
# (optional) Install swiftshader
ARG SWIFTSHADER
ADD ./common/install_swiftshader.sh install_swiftshader.sh
RUN if [ -n "${SWIFTSHADER}" ]; then bash ./install_swiftshader.sh; fi
RUN rm install_swiftshader.sh
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
ADD ./common/install_cmake.sh install_cmake.sh
@ -111,9 +120,8 @@ RUN bash ./install_jni.sh && rm install_jni.sh
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
# Install LLVM dev version
ADD ./common/install_llvm.sh install_llvm.sh
RUN bash ./install_llvm.sh
# Install LLVM dev version (Defined in the pytorch/builder github repository)
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
USER jenkins
CMD ["bash"]

View File

@ -88,6 +88,9 @@ parser = argparse.ArgumentParser(description="Delete old Docker tags from regist
parser.add_argument(
"--dry-run", action="store_true", help="Dry run; print tags that would be deleted"
)
parser.add_argument(
"--debug", action="store_true", help="Debug, print ignored / saved tags"
)
parser.add_argument(
"--keep-stable-days",
type=int,
@ -164,51 +167,48 @@ for repo in repos(client):
# Keep list of image digests to delete for this repository
digest_to_delete = []
print(repositoryName)
for image in images(client, repo):
tags = image.get("imageTags")
if not isinstance(tags, (list,)) or len(tags) == 0:
continue
tag = tags[0]
created = image["imagePushedAt"]
age = now - created
if any([
looks_like_git_sha(tag),
tag.isdigit(),
tag.count("-") == 4, # TODO: Remove, this no longer applies as tags are now built using a SHA1
tag in ignore_tags]):
window = stable_window
if tag in ignore_tags:
stable_window_tags.append((repositoryName, tag, "", age, created))
elif age < window:
stable_window_tags.append((repositoryName, tag, window, age, created))
else:
window = unstable_window
for tag in tags:
if any([
looks_like_git_sha(tag),
tag.isdigit(),
tag.count("-") == 4, # TODO: Remove, this no longer applies as tags are now built using a SHA1
tag in ignore_tags]):
window = stable_window
if tag in ignore_tags:
stable_window_tags.append((repositoryName, tag, "", age, created))
elif age < window:
stable_window_tags.append((repositoryName, tag, window, age, created))
else:
window = unstable_window
if tag in ignore_tags:
print("Ignoring tag {}:{} (age: {})".format(repositoryName, tag, age))
continue
if age < window:
print("Not deleting manifest for tag {}:{} (age: {})".format(repositoryName, tag, age))
continue
if args.dry_run:
print("(dry run) Deleting manifest for tag {}:{} (age: {})".format(repositoryName, tag, age))
if tag in ignore_tags or age < window:
if args.debug:
print("Ignoring {}:{} (age: {})".format(repositoryName, tag, age))
break
else:
print("Deleting manifest for tag{}:{} (age: {})".format(repositoryName, tag, age))
for tag in tags:
print("{}Deleting {}:{} (age: {})".format("(dry run) " if args.dry_run else "", repositoryName, tag, age))
digest_to_delete.append(image["imageDigest"])
if args.dry_run:
if args.debug:
print("Skipping actual deletion, moving on...")
else:
# Issue batch delete for all images to delete for this repository
# Note that as of 2018-07-25, the maximum number of images you can
# delete in a single batch is 100, so chunk our list into batches of
# 100
for c in chunks(digest_to_delete, 100):
client.batch_delete_image(
registryId="308535385114",
repositoryName=repositoryName,
imageIds=[{"imageDigest": digest} for digest in c],
)
# Issue batch delete for all images to delete for this repository
# Note that as of 2018-07-25, the maximum number of images you can
# delete in a single batch is 100, so chunk our list into batches of
# 100
for c in chunks(digest_to_delete, 100):
client.batch_delete_image(
registryId="308535385114",
repositoryName=repositoryName,
imageIds=[{"imageDigest": digest} for digest in c],
)
save_to_s3(args.filter_prefix, stable_window_tags)
save_to_s3(args.filter_prefix, stable_window_tags)
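The chunks helper referenced above exists because ECR's BatchDeleteImage accepts at most 100 image IDs per call (the comment in the script dates that limit to 2018-07-25). A standard implementation, assuming the script's own helper looks roughly like this:

```python
def chunks(lst, n):
    # Yield successive n-sized slices of lst.
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

digests = [f"sha256:{i:064x}" for i in range(250)]      # illustrative digests
print([len(batch) for batch in chunks(digests, 100)])   # [100, 100, 50]
```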

View File

@ -8,10 +8,9 @@ Please see README.md in this directory for details.
import os
import shutil
import sys
from collections import OrderedDict, namedtuple
from collections import namedtuple
import cimodel.data.binary_build_definitions as binary_build_definitions
import cimodel.data.caffe2_build_definitions as caffe2_build_definitions
import cimodel.data.pytorch_build_definitions as pytorch_build_definitions
import cimodel.data.simple.android_definitions
import cimodel.data.simple.bazel_definitions
@ -23,6 +22,7 @@ import cimodel.data.simple.macos_definitions
import cimodel.data.simple.mobile_definitions
import cimodel.data.simple.nightly_android
import cimodel.data.simple.nightly_ios
import cimodel.data.simple.anaconda_prune_defintions
import cimodel.data.windows_build_definitions as windows_build_definitions
import cimodel.lib.miniutils as miniutils
import cimodel.lib.miniyaml as miniyaml
@ -83,6 +83,7 @@ class Header(object):
def gen_build_workflows_tree():
build_workflows_functions = [
cimodel.data.simple.docker_definitions.get_workflow_jobs,
pytorch_build_definitions.get_workflow_jobs,
cimodel.data.simple.macos_definitions.get_workflow_jobs,
cimodel.data.simple.android_definitions.get_workflow_jobs,
@ -90,23 +91,19 @@ def gen_build_workflows_tree():
cimodel.data.simple.mobile_definitions.get_workflow_jobs,
cimodel.data.simple.ge_config_tests.get_workflow_jobs,
cimodel.data.simple.bazel_definitions.get_workflow_jobs,
caffe2_build_definitions.get_workflow_jobs,
cimodel.data.simple.binary_smoketest.get_workflow_jobs,
cimodel.data.simple.nightly_ios.get_workflow_jobs,
cimodel.data.simple.nightly_android.get_workflow_jobs,
cimodel.data.simple.anaconda_prune_defintions.get_workflow_jobs,
windows_build_definitions.get_windows_workflows,
binary_build_definitions.get_post_upload_jobs,
binary_build_definitions.get_binary_smoke_test_jobs,
]
binary_build_functions = [
binary_build_definitions.get_binary_build_jobs,
binary_build_definitions.get_nightly_tests,
binary_build_definitions.get_nightly_uploads,
binary_build_definitions.get_post_upload_jobs,
binary_build_definitions.get_binary_smoke_test_jobs,
]
docker_builder_functions = [
cimodel.data.simple.docker_definitions.get_workflow_jobs
]
return {
@ -115,20 +112,10 @@ def gen_build_workflows_tree():
"when": r"<< pipeline.parameters.run_binary_tests >>",
"jobs": [f() for f in binary_build_functions],
},
"docker_build": OrderedDict(
{
"triggers": [
{
"schedule": {
"cron": miniutils.quote("0 15 * * 0"),
"filters": {"branches": {"only": ["master"]}},
}
}
],
"jobs": [f() for f in docker_builder_functions],
}
),
"build": {"jobs": [f() for f in build_workflows_functions]},
"build": {
"when": r"<< pipeline.parameters.run_build >>",
"jobs": [f() for f in build_workflows_functions]
},
}
}
@ -140,12 +127,10 @@ YAML_SOURCES = [
File("nightly-binary-build-defaults.yml"),
Header("Build parameters"),
File("build-parameters/pytorch-build-params.yml"),
File("build-parameters/caffe2-build-params.yml"),
File("build-parameters/binary-build-params.yml"),
File("build-parameters/promote-build-params.yml"),
Header("Job specs"),
File("job-specs/pytorch-job-specs.yml"),
File("job-specs/caffe2-job-specs.yml"),
File("job-specs/binary-job-specs.yml"),
File("job-specs/job-specs-custom.yml"),
File("job-specs/job-specs-promote.yml"),


@@ -33,6 +33,11 @@ else
export BUILDER_ROOT="$workdir/builder"
fi
# Try to extract PR number from branch if not already set
if [[ -z "${CIRCLE_PR_NUMBER:-}" ]]; then
CIRCLE_PR_NUMBER="$(echo ${CIRCLE_BRANCH} | sed -E -n 's/pull\/([0-9]*).*/\1/p')"
fi
# Clone the Pytorch branch
retry git clone https://github.com/pytorch/pytorch.git "$PYTORCH_ROOT"
pushd "$PYTORCH_ROOT"


@@ -15,7 +15,8 @@ export PATH="~/anaconda/bin:${PATH}"
source ~/anaconda/bin/activate
# Install dependencies
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing requests --yes
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests --yes
conda install -c conda-forge valgrind --yes
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
# sync submodules


@@ -13,7 +13,7 @@ base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_cert
# install the provisioning profile
PROFILE=TestApp_CI.mobileprovision
PROFILE=PyTorch_CI_2021.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
@@ -25,5 +25,5 @@ if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
exit 1
fi
PROFILE=TestApp_CI
PROFILE=PyTorch_CI_2021
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}


@@ -14,7 +14,7 @@ mkdir -p ${ZIP_DIR}/src
cp -R ${ARTIFACTS_DIR}/arm64/include ${ZIP_DIR}/install/
# build a FAT binary
cd ${ZIP_DIR}/install/lib
target_libs=(libc10.a libclog.a libcpuinfo.a libeigen_blas.a libpytorch_qnnpack.a libtorch_cpu.a libtorch.a libXNNPACK.a)
target_libs=(libc10.a libclog.a libcpuinfo.a libeigen_blas.a libpthreadpool.a libpytorch_qnnpack.a libtorch_cpu.a libtorch.a libXNNPACK.a)
for lib in ${target_libs[*]}
do
if [ -f "${ARTIFACTS_DIR}/x86_64/lib/${lib}" ] && [ -f "${ARTIFACTS_DIR}/arm64/lib/${lib}" ]; then
@@ -34,7 +34,13 @@ touch version.txt
echo $(date +%s) > version.txt
zip -r ${ZIPFILE} install src version.txt LICENSE
# upload to aws
brew install awscli
# Install conda then 'conda install' awscli
curl --retry 3 -o ~/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x ~/conda.sh
/bin/bash ~/conda.sh -b -p ~/anaconda
export PATH="~/anaconda/bin:${PATH}"
source ~/anaconda/bin/activate
conda install -c conda-forge awscli --yes
set +x
export AWS_ACCESS_KEY_ID=${AWS_S3_ACCESS_KEY_FOR_PYTORCH_BINARY_UPLOAD}
export AWS_SECRET_ACCESS_KEY=${AWS_S3_ACCESS_SECRET_FOR_PYTORCH_BINARY_UPLOAD}


@@ -5,26 +5,22 @@ set -eux -o pipefail
source /env
# Defaults here so they can be changed in one place
export MAX_JOBS=12
export MAX_JOBS=${MAX_JOBS:-$(( $(nproc) - 2 ))}
if [[ "${DESIRED_CUDA}" == "cu111" ]]; then
export BUILD_SPLIT_CUDA="ON"
fi
# Parse the parameters
if [[ "$PACKAGE_TYPE" == 'conda' ]]; then
build_script='conda/build_pytorch.sh'
elif [[ "$DESIRED_CUDA" == cpu ]]; then
build_script='manywheel/build_cpu.sh'
elif [[ "$DESIRED_CUDA" == *"rocm"* ]]; then
build_script='manywheel/build_rocm.sh'
else
build_script='manywheel/build.sh'
fi
# We want to call unbuffer, which calls tclsh which finds the expect
# package. The expect was installed by yum into /usr/bin so we want to
# find /usr/bin/tclsh, but this is shadowed by /opt/conda/bin/tclsh in
# the conda docker images, so we prepend it to the path here.
if [[ "$PACKAGE_TYPE" == 'conda' ]]; then
mkdir /just_tclsh_bin
ln -s /usr/bin/tclsh /just_tclsh_bin/tclsh
export PATH=/just_tclsh_bin:$PATH
fi
# Build the package
SKIP_ALL_TESTS=1 unbuffer "/builder/$build_script" | ts
SKIP_ALL_TESTS=1 "/builder/$build_script"


@@ -5,12 +5,17 @@ cat >/home/circleci/project/ci_test_script.sh <<EOL
# =================== The following code will be executed inside Docker container ===================
set -eux -o pipefail
python_nodot="\$(echo $DESIRED_PYTHON | tr -d m.u)"
# Set up Python
if [[ "$PACKAGE_TYPE" == conda ]]; then
# There was a bug that was introduced in conda-package-handling >= 1.6.1 that makes archives
# above a certain size fail out when attempting to extract
# see: https://github.com/conda/conda-package-handling/issues/71
conda install -y conda-package-handling=1.6.0
retry conda create -qyn testenv python="$DESIRED_PYTHON"
source activate testenv >/dev/null
elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
python_nodot="\$(echo $DESIRED_PYTHON | tr -d m.u)"
python_path="/opt/python/cp\$python_nodot-cp\${python_nodot}"
# Prior to Python 3.8 paths were suffixed with an 'm'
if [[ -d "\${python_path}/bin" ]]; then
@@ -20,6 +25,19 @@ elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
fi
fi
EXTRA_CONDA_FLAGS=""
NUMPY_PIN=""
if [[ "\$python_nodot" = *39* ]]; then
EXTRA_CONDA_FLAGS="-c=conda-forge"
# There's an issue with conda channel priority where it'll randomly pick 1.19 over 1.20
# we set a lower boundary here just to be safe
NUMPY_PIN=">=1.20"
fi
if [[ "$DESIRED_CUDA" == "cu112" ]]; then
EXTRA_CONDA_FLAGS="-c=conda-forge"
fi
# Install the package
# These network calls should not have 'retry's because they are installing
# locally and aren't actually network calls
@@ -28,23 +46,37 @@ fi
# conda build scripts themselves. These should really be consolidated
pkg="/final_pkgs/\$(ls /final_pkgs)"
if [[ "$PACKAGE_TYPE" == conda ]]; then
conda install -y "\$pkg" --offline
if [[ "$DESIRED_CUDA" == 'cpu' ]]; then
retry conda install -y cpuonly -c pytorch
fi
retry conda install -yq future numpy protobuf six
if [[ "$DESIRED_CUDA" != 'cpu' ]]; then
# DESIRED_CUDA is in format cu90 or cu102
if [[ "${#DESIRED_CUDA}" == 4 ]]; then
cu_ver="${DESIRED_CUDA:2:1}.${DESIRED_CUDA:3}"
(
# For some reason conda likes to re-activate the conda environment when attempting this install
# which means that a deactivate is run and some variables might not exist when that happens,
# namely CONDA_MKL_INTERFACE_LAYER_BACKUP from libblas so let's just ignore unbound variables when
# it comes to the conda installation commands
set +u
retry conda install \${EXTRA_CONDA_FLAGS} -yq \
"numpy\${NUMPY_PIN}" \
future \
mkl>=2018 \
ninja \
dataclasses \
typing-extensions \
defaults::protobuf \
six
if [[ "$DESIRED_CUDA" == 'cpu' ]]; then
retry conda install -c pytorch -y cpuonly
else
cu_ver="${DESIRED_CUDA:2:2}.${DESIRED_CUDA:4}"
# DESIRED_CUDA is in format cu90 or cu102
if [[ "${#DESIRED_CUDA}" == 4 ]]; then
cu_ver="${DESIRED_CUDA:2:1}.${DESIRED_CUDA:3}"
else
cu_ver="${DESIRED_CUDA:2:2}.${DESIRED_CUDA:4}"
fi
retry conda install \${EXTRA_CONDA_FLAGS} -yq -c nvidia -c pytorch "cudatoolkit=\${cu_ver}"
fi
retry conda install -yq -c pytorch "cudatoolkit=\${cu_ver}"
fi
conda install \${EXTRA_CONDA_FLAGS} -y "\$pkg" --offline
)
elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
pip install "\$pkg"
retry pip install -q future numpy protobuf six
retry pip install -q future numpy protobuf typing-extensions six
fi
if [[ "$PACKAGE_TYPE" == libtorch ]]; then
pkg="\$(ls /final_pkgs/*-latest.zip)"


@@ -1,49 +0,0 @@
#!/bin/bash
# Do NOT set -x
source /home/circleci/project/env
set -eu -o pipefail
set +x
declare -x "AWS_ACCESS_KEY_ID=${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}"
declare -x "AWS_SECRET_ACCESS_KEY=${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}"
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
# DO NOT TURN -x ON BEFORE THIS LINE
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
set -eux -o pipefail
export PATH="$MINICONDA_ROOT/bin:$PATH"
# This gets set in binary_populate_env.sh, but lets have a sane default just in case
PIP_UPLOAD_FOLDER=${PIP_UPLOAD_FOLDER:-nightly}
# TODO: Combine CONDA_UPLOAD_CHANNEL and PIP_UPLOAD_FOLDER into one variable
# The only difference is the trailing slash
# Strip trailing slashes if there
CONDA_UPLOAD_CHANNEL=$(echo "${PIP_UPLOAD_FOLDER}" | sed 's:/*$::')
BACKUP_BUCKET="s3://pytorch-backup"
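The `sed 's:/*$::'` normalization above strips any run of trailing slashes, which is the only difference between the pip folder and the conda channel name. A quick illustration:

```bash
echo "nightly/" | sed 's:/*$::'   # -> nightly
echo "nightly"  | sed 's:/*$::'   # -> nightly (already clean, unchanged)
```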
# Upload the package to the final location
pushd /home/circleci/project/final_pkgs
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda install -yq anaconda-client
retry anaconda -t "${CONDA_PYTORCHBOT_TOKEN}" upload "$(ls)" -u "pytorch-${CONDA_UPLOAD_CHANNEL}" --label main --no-progress --force
# Fetch platform (eg. win-64, linux-64, etc.) from index file
# Because there's no actual conda command to read this
subdir=$(tar -xOf ./*.bz2 info/index.json | grep subdir | cut -d ':' -f2 | sed -e 's/[[:space:]]//' -e 's/"//g' -e 's/,//')
BACKUP_DIR="conda/${subdir}"
elif [[ "$PACKAGE_TYPE" == libtorch ]]; then
retry pip install -q awscli
s3_dir="s3://pytorch/libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
for pkg in $(ls); do
retry aws s3 cp "$pkg" "$s3_dir" --acl public-read
done
BACKUP_DIR="libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
else
retry pip install -q awscli
s3_dir="s3://pytorch/whl/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
retry aws s3 cp "$(ls)" "$s3_dir" --acl public-read
BACKUP_DIR="whl/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
fi
if [[ -n "${CIRCLE_TAG:-}" ]]; then
s3_dir="${BACKUP_BUCKET}/${CIRCLE_TAG}/${BACKUP_DIR}"
retry aws s3 cp . "$s3_dir"
fi
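A note on the `subdir` lookup in this (now-removed) upload script: conda archives record their target platform in `info/index.json`, and `tar -xOf` streams that single member to stdout for the grep/cut/sed chain to pick apart. A hedged sketch (the archive name is illustrative):

```bash
# Prints e.g. "linux-64" for a Linux conda package.
tar -xOf pytorch-1.8.0-py3.8_cuda10.2_cudnn7.6.5_0.tar.bz2 info/index.json \
  | grep subdir \
  | cut -d ':' -f2 \
  | sed -e 's/[[:space:]]//' -e 's/"//g' -e 's/,//'
```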


@@ -20,9 +20,9 @@ if [[ "$PACKAGE_TYPE" == libtorch ]]; then
unzip "$pkg" -d /tmp
cd /tmp/libtorch
elif [[ "$PACKAGE_TYPE" == conda ]]; then
conda install -y "$pkg" --offline
conda install -y "$pkg"
else
pip install "$pkg" --no-index --no-dependencies -v
pip install "$pkg" -v
fi
# Test


@@ -1,49 +0,0 @@
#!/bin/bash
# Do NOT set -x
set -eu -o pipefail
set +x
export AWS_ACCESS_KEY_ID="${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}"
export AWS_SECRET_ACCESS_KEY="${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}"
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
# DO NOT TURN -x ON BEFORE THIS LINE
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
set -eux -o pipefail
source "/Users/distiller/project/env"
export "PATH=$workdir/miniconda/bin:$PATH"
# This gets set in binary_populate_env.sh, but lets have a sane default just in case
PIP_UPLOAD_FOLDER=${PIP_UPLOAD_FOLDER:-nightly}
# TODO: Combine CONDA_UPLOAD_CHANNEL and PIP_UPLOAD_FOLDER into one variable
# The only difference is the trailing slash
# Strip trailing slashes if there
CONDA_UPLOAD_CHANNEL=$(echo "${PIP_UPLOAD_FOLDER}" | sed 's:/*$::')
BACKUP_BUCKET="s3://pytorch-backup"
pushd "$workdir/final_pkgs"
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda install -yq anaconda-client
retry anaconda -t "${CONDA_PYTORCHBOT_TOKEN}" upload "$(ls)" -u "pytorch-${CONDA_UPLOAD_CHANNEL}" --label main --no-progress --force
# Fetch platform (eg. win-64, linux-64, etc.) from index file
# Because there's no actual conda command to read this
subdir=$(tar -xOf ./*.bz2 info/index.json | grep subdir | cut -d ':' -f2 | sed -e 's/[[:space:]]//' -e 's/"//g' -e 's/,//')
BACKUP_DIR="conda/${subdir}"
elif [[ "$PACKAGE_TYPE" == libtorch ]]; then
retry pip install -q awscli
s3_dir="s3://pytorch/libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
for pkg in $(ls); do
retry aws s3 cp "$pkg" "$s3_dir" --acl public-read
done
BACKUP_DIR="libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
else
retry pip install -q awscli
s3_dir="s3://pytorch/whl/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
retry aws s3 cp "$(ls)" "$s3_dir" --acl public-read
BACKUP_DIR="whl/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
fi
if [[ -n "${CIRCLE_TAG:-}" ]]; then
s3_dir="${BACKUP_BUCKET}/${CIRCLE_TAG}/${BACKUP_DIR}"
retry aws s3 cp . "$s3_dir"
fi


@@ -73,7 +73,7 @@ PIP_UPLOAD_FOLDER='nightly/'
# We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
export DATE="$(date -u +%Y%m%d)"
#TODO: We should be pulling semver version from the base version.txt
BASE_BUILD_VERSION="1.6.0.dev$DATE"
BASE_BUILD_VERSION="1.8.0.dev$DATE"
# Change BASE_BUILD_VERSION to git tag when on a git tag
# Use 'git -C' to make doubly sure we're in the correct directory for checking
# the git tag
@@ -85,7 +85,7 @@ if tagged_version >/dev/null; then
# Turns tag v1.6.0-rc1 -> v1.6.0
BASE_BUILD_VERSION="$(tagged_version | sed -e 's/^v//' -e 's/-.*$//')"
fi
if [[ "$(uname)" == 'Darwin' ]] || [[ "$DESIRED_CUDA" == "cu102" ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
if [[ "$(uname)" == 'Darwin' ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}"
else
export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}+$DESIRED_CUDA"
@@ -100,8 +100,14 @@ if [[ "$PACKAGE_TYPE" == libtorch ]]; then
POSSIBLE_JAVA_HOMES+=(/usr/local)
POSSIBLE_JAVA_HOMES+=(/usr/lib/jvm/java-8-openjdk-amd64)
POSSIBLE_JAVA_HOMES+=(/Library/Java/JavaVirtualMachines/*.jdk/Contents/Home)
# Add the Windows-specific JNI path
POSSIBLE_JAVA_HOMES+=("$PWD/.circleci/windows-jni/")
for JH in "${POSSIBLE_JAVA_HOMES[@]}" ; do
if [[ -e "$JH/include/jni.h" ]] ; then
# Skip if we're not on Windows but haven't found a JAVA_HOME
if [[ "$JH" == "$PWD/.circleci/windows-jni/" && "$OSTYPE" != "msys" ]] ; then
break
fi
echo "Found jni.h under $JH"
JAVA_HOME="$JH"
BUILD_JNI=ON
@@ -130,7 +136,7 @@ if [[ "${BUILD_FOR_SYSTEM:-}" == "windows" ]]; then
fi
export DATE="$DATE"
export NIGHTLIES_DATE_PREAMBLE=1.6.0.dev
export NIGHTLIES_DATE_PREAMBLE=1.8.0.dev
export PYTORCH_BUILD_VERSION="$PYTORCH_BUILD_VERSION"
export PYTORCH_BUILD_NUMBER="$PYTORCH_BUILD_NUMBER"
export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"
@@ -161,6 +167,7 @@ export CIRCLE_TAG="${CIRCLE_TAG:-}"
export CIRCLE_SHA1="$CIRCLE_SHA1"
export CIRCLE_PR_NUMBER="${CIRCLE_PR_NUMBER:-}"
export CIRCLE_BRANCH="$CIRCLE_BRANCH"
export CIRCLE_WORKFLOW_ID="$CIRCLE_WORKFLOW_ID"
# =================== The above code will be executed inside Docker container ===================
EOL


@@ -19,7 +19,7 @@ chmod +x /home/circleci/project/ci_test_script.sh
VOLUME_MOUNTS="-v /home/circleci/project/:/circleci_stuff -v /home/circleci/project/final_pkgs:/final_pkgs -v ${PYTORCH_ROOT}:/pytorch -v ${BUILDER_ROOT}:/builder"
# Run the docker
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --runtime=nvidia ${VOLUME_MOUNTS} -t -d "${DOCKER_IMAGE}")
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --gpus all ${VOLUME_MOUNTS} -t -d "${DOCKER_IMAGE}")
else
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined ${VOLUME_MOUNTS} -t -d "${DOCKER_IMAGE}")
fi
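The switch from `--runtime=nvidia` to `--gpus all` tracks the native GPU support added in Docker 19.03, which needs only `nvidia-container-toolkit` on the host rather than the full `nvidia-docker2` runtime. A minimal sketch (the image name is illustrative):

```bash
# Should print the host GPU table if the NVIDIA container toolkit is installed.
docker run --rm --gpus all nvidia/cuda:11.1-base nvidia-smi
```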


@@ -0,0 +1,98 @@
#!/usr/bin/env bash
set -euo pipefail
PACKAGE_TYPE=${PACKAGE_TYPE:-conda}
PKG_DIR=${PKG_DIR:-/tmp/workspace/final_pkgs}
# Designates whether to submit as a release candidate or a nightly build
# Value should be `test` when uploading release candidates
# currently set within `designate_upload_channel`
UPLOAD_CHANNEL=${UPLOAD_CHANNEL:-nightly}
# Designates what subfolder to put packages into
UPLOAD_SUBFOLDER=${UPLOAD_SUBFOLDER:-cpu}
UPLOAD_BUCKET="s3://pytorch"
BACKUP_BUCKET="s3://pytorch-backup"
DRY_RUN=${DRY_RUN:-enabled}
# Don't actually do work unless explicit
ANACONDA="true anaconda"
AWS_S3_CP="aws s3 cp --dryrun"
if [[ "${DRY_RUN}" = "disabled" ]]; then
ANACONDA="anaconda"
AWS_S3_CP="aws s3 cp"
fi
do_backup() {
local backup_dir
backup_dir=$1
(
pushd /tmp/workspace
set -x
${AWS_S3_CP} --recursive . "${BACKUP_BUCKET}/${CIRCLE_TAG}/${backup_dir}/"
)
}
conda_upload() {
(
set -x
${ANACONDA} \
upload \
${PKG_DIR}/*.tar.bz2 \
-u "pytorch-${UPLOAD_CHANNEL}" \
--label main \
--no-progress \
--force
)
}
s3_upload() {
local extension
local pkg_type
extension="$1"
pkg_type="$2"
s3_dir="${UPLOAD_BUCKET}/${pkg_type}/${UPLOAD_CHANNEL}/${UPLOAD_SUBFOLDER}/"
(
for pkg in ${PKG_DIR}/*.${extension}; do
(
set -x
${AWS_S3_CP} --no-progress --acl public-read "${pkg}" "${s3_dir}"
)
done
)
}
case "${PACKAGE_TYPE}" in
conda)
conda_upload
# Fetch platform (eg. win-64, linux-64, etc.) from index file
# Because there's no actual conda command to read this
subdir=$(\
tar -xOf ${PKG_DIR}/*.bz2 info/index.json \
| grep subdir \
| cut -d ':' -f2 \
| sed -e 's/[[:space:]]//' -e 's/"//g' -e 's/,//' \
)
BACKUP_DIR="conda/${subdir}"
;;
libtorch)
s3_upload "zip" "libtorch"
BACKUP_DIR="libtorch/${UPLOAD_CHANNEL}/${UPLOAD_SUBFOLDER}"
;;
# wheel can either refer to wheel/manywheel
*wheel)
s3_upload "whl" "whl"
BACKUP_DIR="whl/${UPLOAD_CHANNEL}/${UPLOAD_SUBFOLDER}"
;;
*)
echo "ERROR: unknown package type: ${PACKAGE_TYPE}"
exit 1
;;
esac
# CIRCLE_TAG is defined by upstream circleci,
# this can be changed to recognize tagged versions
if [[ -n "${CIRCLE_TAG:-}" ]]; then
do_backup "${BACKUP_DIR}"
fi
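One detail of the new upload script worth spelling out: under the default `DRY_RUN=enabled`, `ANACONDA` expands to `true anaconda`, so `${ANACONDA} upload ...` runs `true` with a pile of ignored arguments, stubbing the command out without touching any call sites. A minimal sketch (the `UPLOADER` name and package path are illustrative):

```bash
#!/bin/bash
DRY_RUN=${DRY_RUN:-enabled}
UPLOADER="true anaconda"            # `true` swallows every argument: a no-op
if [[ "${DRY_RUN}" = "disabled" ]]; then
  UPLOADER="anaconda"               # the real client
fi
${UPLOADER} upload ./pkg.tar.bz2    # safe to run in either mode
```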


@@ -15,6 +15,10 @@ else
export VC_YEAR=2019
fi
if [[ "${DESIRED_CUDA}" == "cu111" ]]; then
export BUILD_SPLIT_CUDA="ON"
fi
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}


@@ -1,48 +0,0 @@
#!/bin/bash
set -eu -o pipefail
set +x
declare -x "AWS_ACCESS_KEY_ID=${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}"
declare -x "AWS_SECRET_ACCESS_KEY=${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}"
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
# DO NOT TURN -x ON BEFORE THIS LINE
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
set -eux -o pipefail
source "/env"
# This gets set in binary_populate_env.sh, but lets have a sane default just in case
PIP_UPLOAD_FOLDER=${PIP_UPLOAD_FOLDER:-nightly/}
# TODO: Combine CONDA_UPLOAD_CHANNEL and PIP_UPLOAD_FOLDER into one variable
# The only difference is the trailing slash
# Strip trailing slashes if there
CONDA_UPLOAD_CHANNEL=$(echo "${PIP_UPLOAD_FOLDER}" | sed 's:/*$::')
BACKUP_BUCKET="s3://pytorch-backup"
pushd /root/workspace/final_pkgs
# Upload the package to the final location
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda install -yq anaconda-client
retry anaconda -t "${CONDA_PYTORCHBOT_TOKEN}" upload "$(ls)" -u "pytorch-${CONDA_UPLOAD_CHANNEL}" --label main --no-progress --force
# Fetch platform (eg. win-64, linux-64, etc.) from index file
# Because there's no actual conda command to read this
subdir=$(tar -xOf ./*.bz2 info/index.json | grep subdir | cut -d ':' -f2 | sed -e 's/[[:space:]]//' -e 's/"//g' -e 's/,//')
BACKUP_DIR="conda/${subdir}"
elif [[ "$PACKAGE_TYPE" == libtorch ]]; then
retry conda install -c conda-forge -yq awscli
s3_dir="s3://pytorch/libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
for pkg in $(ls); do
retry aws s3 cp "$pkg" "$s3_dir" --acl public-read
done
BACKUP_DIR="libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
else
retry conda install -c conda-forge -yq awscli
s3_dir="s3://pytorch/whl/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
retry aws s3 cp "$(ls)" "$s3_dir" --acl public-read
BACKUP_DIR="whl/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
fi
if [[ -n "${CIRCLE_TAG:-}" ]]; then
s3_dir="${BACKUP_BUCKET}/${CIRCLE_TAG}/${BACKUP_DIR}"
retry aws s3 cp . "$s3_dir"
fi


@@ -1,7 +1,11 @@
#!/usr/bin/env bash
set -eux -o pipefail
env
echo "BUILD_ENVIRONMENT:$BUILD_ENVIRONMENT"
export ANDROID_NDK_HOME=/opt/ndk
export ANDROID_NDK=/opt/ndk
export ANDROID_HOME=/opt/android/sdk
# Must be in sync with GRADLE_VERSION in docker image for android
@@ -10,6 +14,31 @@ export GRADLE_VERSION=4.10.3
export GRADLE_HOME=/opt/gradle/gradle-$GRADLE_VERSION
export GRADLE_PATH=$GRADLE_HOME/bin/gradle
# touch gradle cache files to prevent expiration
while IFS= read -r -d '' file
do
touch "$file" || true
done < <(find /var/lib/jenkins/.gradle -type f -print0)
export GRADLE_LOCAL_PROPERTIES=~/workspace/android/local.properties
rm -f $GRADLE_LOCAL_PROPERTIES
echo "sdk.dir=/opt/android/sdk" >> $GRADLE_LOCAL_PROPERTIES
echo "ndk.dir=/opt/ndk" >> $GRADLE_LOCAL_PROPERTIES
echo "cmake.dir=/usr/local" >> $GRADLE_LOCAL_PROPERTIES
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
# Run custom build script
if [[ "${BUILD_ENVIRONMENT}" == *-gradle-custom-build* ]]; then
# Install torch & torchvision - used to download & dump used ops from test model.
retry pip install torch torchvision --progress-bar off
exec "$(dirname "${BASH_SOURCE[0]}")/../../android/build_test_app_custom.sh" armeabi-v7a
fi
# Run default build
BUILD_ANDROID_INCLUDE_DIR_x86=~/workspace/build_android/install/include
BUILD_ANDROID_LIB_DIR_x86=~/workspace/build_android/install/lib
@@ -44,9 +73,6 @@ ln -s ${BUILD_ANDROID_INCLUDE_DIR_arm_v8a} ${JNI_INCLUDE_DIR}/arm64-v8a
ln -s ${BUILD_ANDROID_LIB_DIR_arm_v8a} ${JNI_LIBS_DIR}/arm64-v8a
fi
env
echo "BUILD_ENVIRONMENT:$BUILD_ENVIRONMENT"
GRADLE_PARAMS="-p android assembleRelease --debug --stacktrace"
if [[ "${BUILD_ENVIRONMENT}" == *-gradle-build-only-x86_32* ]]; then
GRADLE_PARAMS+=" -PABI_FILTERS=x86"
@@ -56,20 +82,6 @@ if [ -n "${GRADLE_OFFLINE:-}" ]; then
GRADLE_PARAMS+=" --offline"
fi
# touch gradle cache files to prevent expiration
while IFS= read -r -d '' file
do
touch "$file" || true
done < <(find /var/lib/jenkins/.gradle -type f -print0)
env
export GRADLE_LOCAL_PROPERTIES=~/workspace/android/local.properties
rm -f $GRADLE_LOCAL_PROPERTIES
echo "sdk.dir=/opt/android/sdk" >> $GRADLE_LOCAL_PROPERTIES
echo "ndk.dir=/opt/ndk" >> $GRADLE_LOCAL_PROPERTIES
echo "cmake.dir=/usr/local" >> $GRADLE_LOCAL_PROPERTIES
$GRADLE_PATH $GRADLE_PARAMS
find . -type f -name "*.a" -exec ls -lh {} \;
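The gradle-cache refresh that moved toward the top of this script pairs `find -print0` with `read -r -d ''` so file names containing spaces or newlines survive the round trip. A standalone sketch of the idiom (the directory is illustrative):

```bash
#!/bin/bash
# Touch every file under the cache dir, tolerating odd names and failures.
while IFS= read -r -d '' file; do
  touch "$file" || true
done < <(find "${HOME}/.gradle" -type f -print0)
```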


@@ -30,13 +30,7 @@ if [ "$version" == "master" ]; then
is_master_doc=true
fi
# Argument 3: (optional) If present, we will NOT do any pushing. Used for testing.
dry_run=false
if [ "$3" != "" ]; then
dry_run=true
fi
echo "install_path: $install_path version: $version dry_run: $dry_run"
echo "install_path: $install_path version: $version"
# ======================== Building PyTorch C++ API Docs ========================
@@ -53,31 +47,22 @@ sudo apt-get -y install doxygen
# Generate ATen files
pushd "${pt_checkout}"
pip install -r requirements.txt
time python aten/src/ATen/gen.py \
time python -m tools.codegen.gen \
-s aten/src/ATen \
-d build/aten/src/ATen \
aten/src/ATen/Declarations.cwrap \
aten/src/THCUNN/generic/THCUNN.h \
aten/src/ATen/nn.yaml \
aten/src/ATen/native/native_functions.yaml
-d build/aten/src/ATen
# Copy some required files
cp aten/src/ATen/common_with_cwrap.py tools/shared/cwrap_common.py
cp torch/_utils_internal.py tools/shared
# Generate PyTorch files
time python tools/setup_helpers/generate_code.py \
--declarations-path build/aten/src/ATen/Declarations.yaml \
--native-functions-path aten/src/ATen/native/native_functions.yaml \
--nn-path aten/src/
# Build the docs
pushd docs/cpp
pip install breathe==4.13.0 bs4 lxml six
pip install --no-cache-dir -e "git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme"
pip install exhale>=0.2.1
pip install sphinx==2.4.4
# Uncomment once it is fixed
# pip install -r requirements.txt
pip install -r requirements.txt
time make VERBOSE=1 html -j
popd
@@ -103,24 +88,8 @@ git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Automatic sync on $(date)" || true
git commit -m "Generate C++ docs from pytorch/pytorch@$CIRCLE_SHA1" || true
git status
if [ "$dry_run" = false ]; then
echo "Pushing to https://github.com/pytorch/cppdocs"
set +x
/usr/bin/expect <<DONE
spawn git push -u origin master
expect "Username*"
send "pytorchbot\n"
expect "Password*"
send "$::env(GITHUB_PYTORCHBOT_TOKEN)\n"
expect eof
DONE
set -x
else
echo "Skipping push due to dry_run"
fi
popd
# =================== The above code **should** be executed inside Docker container ===================


@@ -0,0 +1,8 @@
set "DRIVER_DOWNLOAD_LINK=https://s3.amazonaws.com/ossci-windows/452.39-data-center-tesla-desktop-win10-64bit-international.exe"
curl --retry 3 -kL %DRIVER_DOWNLOAD_LINK% --output 452.39-data-center-tesla-desktop-win10-64bit-international.exe
if errorlevel 1 exit /b 1
start /wait 452.39-data-center-tesla-desktop-win10-64bit-international.exe -s -noreboot
if errorlevel 1 exit /b 1
del 452.39-data-center-tesla-desktop-win10-64bit-international.exe || ver > NUL


@@ -7,6 +7,8 @@ sudo apt-get -y install expect-dev
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
source "$pt_checkout/.jenkins/pytorch/common_utils.sh"
echo "python_doc_push_script.sh: Invoked with $*"
set -ex
@@ -38,15 +40,30 @@ echo "error: python_doc_push_script.sh: branch (arg3) not specified"
exit 1
fi
# Argument 4: (optional) If present, we will NOT do any pushing. Used for testing.
dry_run=false
if [ "$4" != "" ]; then
dry_run=true
fi
echo "install_path: $install_path version: $version"
echo "install_path: $install_path version: $version dry_run: $dry_run"
git clone https://github.com/pytorch/pytorch.github.io -b $branch
build_docs () {
set +e
set -o pipefail
make $1 2>&1 | tee /tmp/docs_build.txt
code=$?
if [ $code -ne 0 ]; then
set +x
echo =========================
grep "WARNING:" /tmp/docs_build.txt
echo =========================
echo Docs build failed. If the failure is not clear, scan back in the log
echo for any WARNINGS or for the line "build finished with problems"
echo "(tried to echo the WARNINGS above the ==== line)"
echo =========================
fi
set -ex
return $code
}
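The new `build_docs` helper leans on `set -o pipefail` so that `$?` reports the `make` failure rather than the exit status of `tee`, which would otherwise mask it. A minimal sketch of the pattern (the target name is illustrative):

```bash
#!/bin/bash
set -o pipefail
make html 2>&1 | tee /tmp/docs_build.txt   # log to a file and to the console
code=$?                                    # make's status, not tee's
echo "make exited with ${code}"
```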
git clone https://github.com/pytorch/pytorch.github.io -b $branch --depth 1
pushd pytorch.github.io
export LC_ALL=C
@@ -54,26 +71,15 @@ export PATH=/opt/conda/bin:$PATH
rm -rf pytorch || true
# Install TensorBoard in python 3 so torch.utils.tensorboard classes render
pip install -q https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py3-none-any.whl
# Get all the documentation sources, put them in one place
pushd "$pt_checkout"
git clone https://github.com/pytorch/vision
pushd vision
conda install -q pillow
time python setup.py install
popd
pushd docs
rm -rf source/torchvision
cp -a ../vision/docs/source source/torchvision
# Build the docs
pip -q install -r requirements.txt || true
pip -q install -r requirements.txt
if [ "$is_master_doc" = true ]; then
# TODO: fix gh-38011 then enable this which changes warnings into errors
# export SPHINXOPTS="-WT --keep-going"
make html
build_docs html
[ $? -eq 0 ] || exit $?
make coverage
# Now we have the coverage report, we need to make sure it is empty.
# Count the number of lines in the file and turn that number into a variable
@@ -94,8 +100,9 @@ if [ "$is_master_doc" = true ]; then
exit 1
fi
else
# Don't fail the build on coverage problems
make html-stable
# skip coverage, format for stable or tags
build_docs html-stable
[ $? -eq 0 ] || exit $?
fi
# Move them into the docs repo
@@ -104,14 +111,6 @@ popd
git rm -rf "$install_path" || true
mv "$pt_checkout/docs/build/html" "$install_path"
# Add the version handler by search and replace.
# XXX: Consider moving this to the docs Makefile or site build
if [ "$is_master_doc" = true ]; then
find "$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>\1 \&#x25BC</a>@g"
else
find "$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>$version \&#x25BC</a>@g"
fi
# Prevent Google from indexing $install_path/_modules. This folder contains
# generated source files.
# NB: the following only works on gnu sed. The sed shipped with mac os is different.
@@ -123,24 +122,8 @@ git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "auto-generating sphinx docs" || true
git commit -m "Generate Python docs from pytorch/pytorch@$CIRCLE_SHA1" || true
git status
if [ "$dry_run" = false ]; then
echo "Pushing to pytorch.github.io:$branch"
set +x
/usr/bin/expect <<DONE
spawn git push origin $branch
expect "Username*"
send "pytorchbot\n"
expect "Password*"
send "$::env(GITHUB_PYTORCHBOT_TOKEN)\n"
expect eof
DONE
set -x
else
echo "Skipping push due to dry_run"
fi
popd
# =================== The above code **should** be executed inside Docker container ===================


@@ -1,12 +1,6 @@
#!/usr/bin/env bash
set -ex -o pipefail
# Set up NVIDIA docker repo
curl -s -L --retry 3 https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
echo "deb https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
# Remove unnecessary sources
sudo rm -f /etc/apt/sources.list.d/google-chrome.list
sudo rm -f /etc/apt/heroku.list
@@ -14,7 +8,7 @@ sudo rm -f /etc/apt/openjdk-r-ubuntu-ppa-xenial.list
sudo rm -f /etc/apt/partner.list
retry () {
$* || $* || $* || $* || $*
}
# Method adapted from here: https://askubuntu.com/questions/875213/apt-get-to-retry-downloading
@@ -22,70 +16,75 @@ retry () {
# This is better than retrying the whole apt-get command
echo "APT::Acquire::Retries \"3\";" | sudo tee /etc/apt/apt.conf.d/80-retries
sudo apt-get -y update
sudo apt-get -y remove linux-image-generic linux-headers-generic linux-generic docker-ce
# WARNING: Docker version is hardcoded here; you must update the
# version number below for docker-ce and nvidia-docker2 to get newer
# versions of Docker. We hardcode these numbers because we kept
# getting broken CI when Docker would update their docker version,
# and nvidia-docker2 would be out of date for a day until they
# released a newer version of their package.
#
# How to figure out what the correct versions of these packages are?
# My preferred method is to start a Docker instance of the correct
# Ubuntu version (e.g., docker run -it ubuntu:16.04) and then ask
# apt what the packages you need are. Note that the CircleCI image
# comes with Docker.
#
# Using 'retry' here as belt-and-suspenders even though we are
# presumably retrying at the single-package level via the
# apt.conf.d/80-retries technique.
retry sudo apt-get update -qq
retry sudo apt-get -y install \
linux-headers-$(uname -r) \
linux-image-generic \
moreutils \
docker-ce=5:18.09.4~3-0~ubuntu-xenial \
nvidia-container-runtime=2.0.0+docker18.09.4-1 \
nvidia-docker2=2.0.3+docker18.09.4-1 \
expect-dev
sudo pkill -SIGHUP dockerd
echo "== DOCKER VERSION =="
docker version
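To make the version-pinning comment above concrete, one way to discover pin-able versions is to ask apt inside a throwaway container of the matching Ubuntu release; this sketch assumes the docker-ce apt repository has already been configured inside that container:

```bash
# List the docker-ce versions apt can see on Ubuntu 16.04.
docker run --rm ubuntu:16.04 bash -c \
  "apt-get update -qq >/dev/null && apt-cache policy docker-ce"
```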
retry sudo pip -q install awscli==1.16.35
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
DRIVER_FN="NVIDIA-Linux-x86_64-440.59.run"
DRIVER_FN="NVIDIA-Linux-x86_64-460.39.run"
wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
nvidia-smi
# Taken directly from https://github.com/NVIDIA/nvidia-docker
# Add the package repositories
distribution=$(. /etc/os-release;echo "$ID$VERSION_ID")
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L "https://nvidia.github.io/nvidia-docker/${distribution}/nvidia-docker.list" | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update -qq
# Necessary to get the `--gpus` flag to function within docker
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
else
# Explicitly remove nvidia docker apt repositories if not building for cuda
sudo rm -rf /etc/apt/sources.list.d/nvidia-docker.list
fi
add_to_env_file() {
local content
content=$1
# BASH_ENV should be set by CircleCI
echo "${content}" >> "${BASH_ENV:-/tmp/env}"
}
add_to_env_file "IN_CI=1"
add_to_env_file "COMMIT_SOURCE=${CIRCLE_BRANCH:-}"
add_to_env_file "BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}"
add_to_env_file "CIRCLE_PULL_REQUEST=${CIRCLE_PULL_REQUEST}"
if [[ "${BUILD_ENVIRONMENT}" == *-build ]]; then
echo "declare -x IN_CIRCLECI=1" > /home/circleci/project/env
echo "declare -x COMMIT_SOURCE=${CIRCLE_BRANCH:-}" >> /home/circleci/project/env
echo "declare -x SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> /home/circleci/project/env
add_to_env_file "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2"
SCCACHE_MAX_JOBS=$(( $(nproc) - 1 ))
MEMORY_LIMIT_MAX_JOBS=8 # the "large" resource class on CircleCI has 32 CPU cores, if we use all of them we'll OOM
MAX_JOBS=$(( ${SCCACHE_MAX_JOBS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${SCCACHE_MAX_JOBS} ))
add_to_env_file "MAX_JOBS=${MAX_JOBS}"
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
echo "declare -x TORCH_CUDA_ARCH_LIST=5.2" >> /home/circleci/project/env
add_to_env_file "TORCH_CUDA_ARCH_LIST=5.2"
fi
export SCCACHE_MAX_JOBS=`expr $(nproc) - 1`
export MEMORY_LIMIT_MAX_JOBS=8 # the "large" resource class on CircleCI has 32 CPU cores, if we use all of them we'll OOM
export MAX_JOBS=$(( ${SCCACHE_MAX_JOBS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${SCCACHE_MAX_JOBS} ))
echo "declare -x MAX_JOBS=${MAX_JOBS}" >> /home/circleci/project/env
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
# This IAM user allows write access to S3 bucket for sccache & bazels3cache
set +x
echo "declare -x XLA_CLANG_CACHE_S3_BUCKET_NAME=${XLA_CLANG_CACHE_S3_BUCKET_NAME:-}" >> /home/circleci/project/env
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}" >> /home/circleci/project/env
add_to_env_file "XLA_CLANG_CACHE_S3_BUCKET_NAME=${XLA_CLANG_CACHE_S3_BUCKET_NAME:-}"
add_to_env_file "AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}"
add_to_env_file "AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}"
set -x
else
# This IAM user allows write access to S3 bucket for sccache
set +x
echo "declare -x XLA_CLANG_CACHE_S3_BUCKET_NAME=${XLA_CLANG_CACHE_S3_BUCKET_NAME:-}" >> /home/circleci/project/env
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}" >> /home/circleci/project/env
add_to_env_file "XLA_CLANG_CACHE_S3_BUCKET_NAME=${XLA_CLANG_CACHE_S3_BUCKET_NAME:-}"
add_to_env_file "AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}"
add_to_env_file "AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}"
set -x
fi
fi
@@ -94,5 +93,5 @@ fi
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_ECR_READ_WRITE_V4:-}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_WRITE_V4:-}
eval $(aws ecr get-login --region us-east-1 --no-include-email)
eval "$(aws ecr get-login --region us-east-1 --no-include-email)"
set -x


@@ -33,7 +33,7 @@ systemctl list-units --all | cat
sudo pkill apt-get || true
# For even better luck, purge unattended-upgrades
sudo apt-get purge -y unattended-upgrades
sudo apt-get purge -y unattended-upgrades || true
cat /etc/apt/sources.list


@@ -41,11 +41,13 @@ def build_message(size):
"build_num": os.environ.get("CIRCLE_BUILD_NUM"),
"sha1": os.environ.get("CIRCLE_SHA1"),
"branch": os.environ.get("CIRCLE_BRANCH"),
"workflow_id": os.environ.get("CIRCLE_WORKFLOW_ID"),
},
"int": {
"time": int(time.time()),
"size": size,
"commit_time": int(os.environ.get("COMMIT_TIME", "0")),
"run_duration": int(time.time() - os.path.getmtime(os.path.realpath(__file__))),
},
}
@@ -114,10 +116,12 @@ def report_android_sizes(file_dir):
"build_num": os.environ.get("CIRCLE_BUILD_NUM"),
"sha1": os.environ.get("CIRCLE_SHA1"),
"branch": os.environ.get("CIRCLE_BRANCH"),
"workflow_id": os.environ.get("CIRCLE_WORKFLOW_ID"),
},
"int": {
"time": int(time.time()),
"commit_time": int(os.environ.get("COMMIT_TIME", "0")),
"run_duration": int(time.time() - os.path.getmtime(os.path.realpath(__file__))),
"size": comp_size,
"raw_size": uncomp_size,
},


@@ -1,7 +1,7 @@
$VS_DOWNLOAD_LINK = "https://aka.ms/vs/15/release/vs_buildtools.exe"
$COLLECT_DOWNLOAD_LINK = "https://aka.ms/vscollect.exe"
$VS_INSTALL_ARGS = @("--nocache","--quiet","--wait", "--add Microsoft.VisualStudio.Workload.VCTools",
"--add Microsoft.VisualStudio.Component.VC.Tools.14.11",
"--add Microsoft.VisualStudio.Component.VC.Tools.14.13",
"--add Microsoft.Component.MSBuild",
"--add Microsoft.VisualStudio.Component.Roslyn.Compiler",
"--add Microsoft.VisualStudio.Component.TextTemplating",


@@ -0,0 +1,5 @@
$CMATH_DOWNLOAD_LINK = "https://raw.githubusercontent.com/microsoft/STL/12c684bba78f9b032050526abdebf14f58ca26a3/stl/inc/cmath"
$VC14_28_INSTALL_PATH="C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\include"
curl.exe --retry 3 -kL $CMATH_DOWNLOAD_LINK --output "$home\cmath"
Move-Item -Path "$home\cmath" -Destination "$VC14_28_INSTALL_PATH" -Force


@@ -1,30 +1,54 @@
#!/bin/bash
set -eux -o pipefail
curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/cuda_10.1.243_426.00_win10.exe
7z x cuda_10.1.243_426.00_win10.exe -ocuda_10.1.243_426.00_win10
cd cuda_10.1.243_426.00_win10
cuda_major_version=${CUDA_VERSION%.*}
if [[ "$cuda_major_version" == "10" ]]; then
cuda_installer_name="cuda_10.1.243_426.00_win10"
msbuild_project_dir="CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
cuda_install_packages="nvcc_10.1 cuobjdump_10.1 nvprune_10.1 cupti_10.1 cublas_10.1 cublas_dev_10.1 cudart_10.1 cufft_10.1 cufft_dev_10.1 curand_10.1 curand_dev_10.1 cusolver_10.1 cusolver_dev_10.1 cusparse_10.1 cusparse_dev_10.1 nvgraph_10.1 nvgraph_dev_10.1 npp_10.1 npp_dev_10.1 nvrtc_10.1 nvrtc_dev_10.1 nvml_dev_10.1"
elif [[ "$cuda_major_version" == "11" ]]; then
cuda_installer_name="cuda_11.1.0_456.43_win10"
msbuild_project_dir="visual_studio_integration/CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
cuda_install_packages="nvcc_11.1 cuobjdump_11.1 nvprune_11.1 nvprof_11.1 cupti_11.1 cublas_11.1 cublas_dev_11.1 cudart_11.1 cufft_11.1 cufft_dev_11.1 curand_11.1 curand_dev_11.1 cusolver_11.1 cusolver_dev_11.1 cusparse_11.1 cusparse_dev_11.1 npp_11.1 npp_dev_11.1 nvrtc_11.1 nvrtc_dev_11.1 nvml_dev_11.1"
else
echo "CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1
fi
if [[ "$cuda_major_version" == "11" && "${JOB_EXECUTOR}" == "windows-with-nvidia-gpu" ]]; then
cuda_install_packages="${cuda_install_packages} Display.Driver"
fi
cuda_installer_link="https://ossci-windows.s3.amazonaws.com/${cuda_installer_name}.exe"
curl --retry 3 -kLO $cuda_installer_link
7z x ${cuda_installer_name}.exe -o${cuda_installer_name}
cd ${cuda_installer_name}
mkdir cuda_install_logs
set +e
./setup.exe -s nvcc_10.1 cuobjdump_10.1 nvprune_10.1 cupti_10.1 cublas_10.1 cublas_dev_10.1 cudart_10.1 cufft_10.1 cufft_dev_10.1 curand_10.1 curand_dev_10.1 cusolver_10.1 cusolver_dev_10.1 cusparse_10.1 cusparse_dev_10.1 nvgraph_10.1 nvgraph_dev_10.1 npp_10.1 npp_dev_10.1 nvrtc_10.1 nvrtc_dev_10.1 nvml_dev_10.1 -loglevel:6 -log:"$(pwd -W)/cuda_install_logs"
./setup.exe -s ${cuda_install_packages} -loglevel:6 -log:"$(pwd -W)/cuda_install_logs"
set -e
if [[ "${VC_YEAR}" == "2017" ]]; then
cp -r CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions/* "C:/Program Files (x86)/Microsoft Visual Studio/2017/${VC_PRODUCT}/Common7/IDE/VC/VCTargets/BuildCustomizations/"
cp -r ${msbuild_project_dir}/* "C:/Program Files (x86)/Microsoft Visual Studio/2017/${VC_PRODUCT}/Common7/IDE/VC/VCTargets/BuildCustomizations/"
else
cp -r CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions/* "C:/Program Files (x86)/Microsoft Visual Studio/2019/${VC_PRODUCT}/MSBuild/Microsoft/VC/v160/BuildCustomizations/"
cp -r ${msbuild_project_dir}/* "C:/Program Files (x86)/Microsoft Visual Studio/2019/${VC_PRODUCT}/MSBuild/Microsoft/VC/v160/BuildCustomizations/"
fi
curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/NvToolsExt.7z
7z x NvToolsExt.7z -oNvToolsExt
mkdir -p "C:/Program Files/NVIDIA Corporation/NvToolsExt"
cp -r NvToolsExt/* "C:/Program Files/NVIDIA Corporation/NvToolsExt/"
export NVTOOLSEXT_PATH="C:\\Program Files\\NVIDIA Corporation\\NvToolsExt\\"
if ! ls "/c/Program Files/NVIDIA Corporation/NvToolsExt/bin/x64/nvToolsExt64_1.dll"
then
curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/NvToolsExt.7z
7z x NvToolsExt.7z -oNvToolsExt
mkdir -p "C:/Program Files/NVIDIA Corporation/NvToolsExt"
cp -r NvToolsExt/* "C:/Program Files/NVIDIA Corporation/NvToolsExt/"
export NVTOOLSEXT_PATH="C:\\Program Files\\NVIDIA Corporation\\NvToolsExt\\"
fi
if ! ls "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/bin/nvcc.exe"
if ! ls "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe"
then
echo "CUDA installation failed"
mkdir -p /c/w/build-results
@@ -33,5 +57,5 @@ then
fi
cd ..
rm -rf ./cuda_10.1.243_426.00_win10
rm -f ./cuda_10.1.243_426.00_win10.exe
rm -rf ./${cuda_installer_name}
rm -f ./${cuda_installer_name}.exe
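Both this installer and the cuDNN script below key off `cuda_major_version=${CUDA_VERSION%.*}`, where `%.*` deletes the shortest suffix matching `.*`. A quick illustration:

```bash
CUDA_VERSION=11.1
echo "${CUDA_VERSION%.*}"   # -> 11 (drops the trailing ".1")
CUDA_VERSION=10.1
echo "${CUDA_VERSION%.*}"   # -> 10
```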


@@ -0,0 +1,21 @@
#!/bin/bash
set -eux -o pipefail
cuda_major_version=${CUDA_VERSION%.*}
if [[ "$cuda_major_version" == "10" ]]; then
cudnn_installer_name="cudnn-${CUDA_VERSION}-windows10-x64-v7.6.4.38"
elif [[ "$cuda_major_version" == "11" ]]; then
cudnn_installer_name="cudnn-${CUDA_VERSION}-windows-x64-v8.0.5.39"
else
echo "CUDNN for CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1
fi
cudnn_installer_link="https://ossci-windows.s3.amazonaws.com/${cudnn_installer_name}.zip"
curl --retry 3 -O $cudnn_installer_link
7z x ${cudnn_installer_name}.zip -ocudnn
cp -r cudnn/cuda/* "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/"
rm -rf cudnn
rm -f ${cudnn_installer_name}.zip


@@ -1,45 +0,0 @@
#!/usr/bin/env python3
import cimodel.data.caffe2_build_definitions as caffe2_build_definitions
import cimodel.data.simple.util.docker_constants as pytorch_docker_constants
from yaml import load
try:
from yaml import CLoader as Loader
except ImportError:
from yaml import Loader
def load_config(filename=".circleci/config.yml"):
with open(filename, "r") as fh:
return load("".join(fh.readlines()), Loader)
def load_tags_for_projects(workflow_config):
return {
v["ecr_gc_job"]["project"]: v["ecr_gc_job"]["tags_to_keep"]
for v in workflow_config["workflows"]["ecr_gc"]["jobs"]
if isinstance(v, dict) and "ecr_gc_job" in v
}
def check_version(job, tags, expected_version):
valid_versions = tags[job].split(",")
if expected_version not in valid_versions:
raise RuntimeError(
"We configured {} to use Docker version {}; but this "
"version is not configured in job ecr_gc_job_for_{}. Non-deployed versions will be "
"garbage collected two weeks after they are created. DO NOT LAND "
"THIS TO MASTER without also updating ossci-job-dsl with this version."
"\n\nDeployed versions: {}".format(job, expected_version, job, tags[job])
)
def validate_docker_version():
tags = load_tags_for_projects(load_config())
check_version("pytorch", tags, pytorch_docker_constants.DOCKER_IMAGE_TAG)
check_version("caffe2", tags, caffe2_build_definitions.DOCKER_IMAGE_VERSION)
if __name__ == "__main__":
validate_docker_version()


@@ -59,7 +59,7 @@ binary_windows_params: &binary_windows_params
default: ""
executor:
type: string
default: "windows-cpu-with-nvidia-cuda"
default: "windows-xlarge-cpu-with-nvidia-cuda"
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
BUILD_FOR_SYSTEM: windows


@@ -1,27 +0,0 @@
caffe2_params: &caffe2_params
parameters:
build_environment:
type: string
default: ""
build_ios:
type: string
default: ""
docker_image:
type: string
default: ""
use_cuda_docker_runtime:
type: string
default: ""
build_only:
type: string
default: ""
resource_class:
type: string
default: "large"
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
BUILD_IOS: << parameters.build_ios >>
USE_CUDA_DOCKER_RUNTIME: << parameters.use_cuda_docker_runtime >>
DOCKER_IMAGE: << parameters.docker_image >>
BUILD_ONLY: << parameters.build_only >>
resource_class: << parameters.resource_class >>


@@ -36,17 +36,21 @@ pytorch_ios_params: &pytorch_ios_params
op_list:
type: string
default: ""
use_metal:
type: string
default: "0"
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
IOS_ARCH: << parameters.ios_arch >>
IOS_PLATFORM: << parameters.ios_platform >>
SELECTED_OP_LIST: << parameters.op_list >>
USE_PYTORCH_METAL: << parameters.use_metal >>
pytorch_windows_params: &pytorch_windows_params
parameters:
executor:
type: string
default: "windows-cpu-with-nvidia-cuda"
default: "windows-xlarge-cpu-with-nvidia-cuda"
build_environment:
type: string
default: ""
@@ -55,16 +59,16 @@ pytorch_windows_params: &pytorch_windows_params
default: ""
cuda_version:
type: string
default: "10"
default: "10.1"
python_version:
type: string
default: "3.6"
vc_version:
type: string
default: "14.11"
default: "14.16"
vc_year:
type: string
default: "2017"
default: "2019"
vc_product:
type: string
default: "BuildTools"


@@ -1,23 +1,26 @@
commands:
# Must be run after attaching workspace from previous steps
load_shared_env:
description: "Loads .circleci/shared/env_file into ${BASH_ENV}"
parameters:
# For some weird reason we decide to reattach our workspace to ~/workspace so
# in the vein of making it simple let's assume our shared env_file is here
root:
type: string
default: "~/workspace"
calculate_docker_image_tag:
description: "Calculates the docker image tag"
steps:
- run:
name: "Load .circleci/shared/env_file into ${BASH_ENV}"
name: "Calculate docker image hash"
command: |
if [[ -f "<< parameters.root >>/.circleci/shared/env_file" ]]; then
cat << parameters.root >>/.circleci/shared/env_file >> ${BASH_ENV}
else
echo "We didn't have a shared env file, that's weird"
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "DOCKER_TAG=${DOCKER_TAG}" >> "${BASH_ENV}"
designate_upload_channel:
description: "inserts the correct upload channel into ${BASH_ENV}"
steps:
- run:
name: adding UPLOAD_CHANNEL to BASH_ENV
command: |
our_upload_channel=nightly
# On tags upload to test instead
if [[ -n "${CIRCLE_TAG}" ]]; then
our_upload_channel=test
fi
echo "export UPLOAD_CHANNEL=${our_upload_channel}" >> ${BASH_ENV}
# This system setup script is meant to run before the CI-related scripts, e.g.,
# installing Git client, checking out code, setting up CI env, and
@@ -100,7 +103,7 @@ commands:
name: (Optional) Merge target branch
no_output_timeout: "10m"
command: |
if [ -n "$CIRCLE_PULL_REQUEST" ]; then
if [[ -n "$CIRCLE_PULL_REQUEST" && "$CIRCLE_BRANCH" != "nightly" ]]; then
PR_NUM=$(basename $CIRCLE_PULL_REQUEST)
CIRCLE_PR_BASE_BRANCH=$(curl -s https://api.github.com/repos/$CIRCLE_PROJECT_USERNAME/$CIRCLE_PROJECT_REPONAME/pulls/$PR_NUM | jq -r '.base.ref')
if [[ "${BUILD_ENVIRONMENT}" == *"xla"* || "${BUILD_ENVIRONMENT}" == *"gcc5"* ]] ; then
@@ -108,11 +111,11 @@ commands:
git config --global user.email "circleci.ossci@gmail.com"
git config --global user.name "CircleCI"
git config remote.origin.url https://github.com/pytorch/pytorch.git
git config --add remote.origin.fetch +refs/heads/master:refs/remotes/origin/master
git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/master:refs/remotes/origin/master --depth=100 --quiet
git config --add remote.origin.fetch +refs/heads/release/1.8:refs/remotes/origin/release/1.8
git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/release/1.8:refs/remotes/origin/release/1.8 --depth=100 --quiet
# PRs generated from ghstack has format CIRCLE_PR_BASE_BRANCH=gh/xxx/1234/base
if [[ "${CIRCLE_PR_BASE_BRANCH}" == "gh/"* ]]; then
CIRCLE_PR_BASE_BRANCH=master
CIRCLE_PR_BASE_BRANCH=release/1.8
fi
export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/$CIRCLE_PR_BASE_BRANCH`
echo "GIT_MERGE_TARGET: " ${GIT_MERGE_TARGET}
@@ -130,4 +133,42 @@ commands:
echo "This is not a pull request, skipping..."
fi
upload_binary_size_for_android_build:
description: "Upload binary size data for Android build"
parameters:
build_type:
type: string
default: ""
artifacts:
type: string
default: ""
steps:
- run:
name: "Binary Size - Install Dependencies"
no_output_timeout: "5m"
command: |
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
retry pip3 install requests
- run:
name: "Binary Size - Untar Artifacts"
no_output_timeout: "5m"
command: |
# The artifact file is created inside docker container, which contains the result binaries.
# Now unpackage it into the project folder. The subsequent script will scan project folder
# to locate result binaries and report their sizes.
# If artifact file is not provided it assumes that the project folder has been mounted in
# the docker during build and already contains the result binaries, so this step can be skipped.
export ARTIFACTS="<< parameters.artifacts >>"
if [ -n "${ARTIFACTS}" ]; then
tar xf "${ARTIFACTS}" -C ~/project
fi
- run:
name: "Binary Size - Upload << parameters.build_type >>"
no_output_timeout: "5m"
command: |
cd ~/project
export ANDROID_BUILD_TYPE="<< parameters.build_type >>"
export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
python3 .circleci/scripts/upload_binary_size_to_scuba.py android


@@ -11,6 +11,9 @@ parameters:
run_binary_tests:
type: boolean
default: false
run_build:
type: boolean
default: true
docker_config_defaults: &docker_config_defaults
user: jenkins
@@ -26,9 +29,14 @@ executors:
image: windows-server-2019-nvidia:stable
shell: bash.exe
windows-cpu-with-nvidia-cuda:
windows-xlarge-cpu-with-nvidia-cuda:
machine:
# we will change to CPU host when it's ready
resource_class: windows.xlarge
image: windows-server-2019-vs2019:stable
shell: bash.exe
windows-medium-cpu-with-nvidia-cuda:
machine:
resource_class: windows.medium
image: windows-server-2019-vs2019:stable
shell: bash.exe


@@ -1,60 +1,42 @@
binary_linux_build:
<<: *binary_linux_build_params
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- calculate_docker_image_tag
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Install unbuffer and ts
command: |
set -eux -o pipefail
source /env
OS_NAME=`awk -F= '/^NAME/{print $2}' /etc/os-release`
if [[ "$OS_NAME" == *"CentOS Linux"* ]]; then
retry yum -q -y install epel-release
retry yum -q -y install expect moreutils
elif [[ "$OS_NAME" == *"Ubuntu"* ]]; then
retry apt-get update
retry apt-get -y install expect moreutils
retry conda install -y -c eumetsat expect
retry conda install -y cmake
fi
- run:
name: Update compiler to devtoolset7
command: |
set -eux -o pipefail
source /env
if [[ "$DESIRED_DEVTOOLSET" == 'devtoolset7' ]]; then
source "/builder/update_compiler.sh"
# Env variables are not persisted into the next step
echo "export PATH=$PATH" >> /env
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> /env
else
echo "Not updating compiler"
fi
- run:
name: Build
no_output_timeout: "1h"
command: |
source "/pytorch/.circleci/scripts/binary_linux_build.sh"
# Preserve build log
if [ -f /pytorch/build/.ninja_log ]; then
cp /pytorch/build/.ninja_log /final_pkgs
fi
- run:
name: Output binary sizes
no_output_timeout: "1m"
command: |
ls -lah /final_pkgs
- run:
name: save binary size
no_output_timeout: "5m"
command: |
source /env
cd /pytorch && export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
pip3 install requests && \
python3 -mpip install requests && \
SCRIBE_GRAPHQL_ACCESS_TOKEN=${SCRIBE_GRAPHQL_ACCESS_TOKEN} \
python3 /pytorch/.circleci/scripts/upload_binary_size_to_scuba.py || exit 0
- persist_to_workspace:
root: /
paths: final_pkgs
- store_artifacts:
path: /final_pkgs
# This should really just be another step of the binary_linux_build job above.
# This isn't possible right now b/c the build job uses the docker executor
# (otherwise they'd be really really slow) but this one uses the machine
@@ -63,11 +45,10 @@
binary_linux_test:
<<: *binary_linux_test_upload_params
machine:
image: ubuntu-1604:201903-01
image: ubuntu-1604:202007-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
# TODO: We shouldn't attach the workspace multiple times
- attach_workspace:
at: /home/circleci/project
- setup_linux_system_environment
@@ -83,25 +64,41 @@
- run:
<<: *binary_run_in_docker
binary_linux_upload:
<<: *binary_linux_test_upload_params
machine:
image: ubuntu-1604:201903-01
binary_upload:
parameters:
package_type:
type: string
description: "What type of package we are uploading (eg. wheel, libtorch, conda)"
default: "wheel"
upload_subfolder:
type: string
description: "What subfolder to put our package into (eg. cpu, cudaX.Y, etc.)"
default: "cpu"
docker:
- image: continuumio/miniconda3
environment:
- DRY_RUN: disabled
- PACKAGE_TYPE: "<< parameters.package_type >>"
- UPLOAD_SUBFOLDER: "<< parameters.upload_subfolder >>"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- setup_linux_system_environment
- setup_ci_environment
- attach_workspace:
at: /home/circleci/project
- run:
<<: *binary_populate_env
- run:
<<: *binary_install_miniconda
- run:
name: Upload
no_output_timeout: "1h"
command: .circleci/scripts/binary_linux_upload.sh
- attach_workspace:
at: /tmp/workspace
- checkout
- designate_upload_channel
- run:
name: Install dependencies
no_output_timeout: "1h"
command: |
conda install -yq anaconda-client
pip install -q awscli
- run:
name: Do upload
no_output_timeout: "1h"
command: |
AWS_ACCESS_KEY_ID="${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}" \
AWS_SECRET_ACCESS_KEY="${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}" \
ANACONDA_API_TOKEN="${CONDA_PYTORCHBOT_TOKEN}" \
.circleci/scripts/binary_upload.sh
# Nightly build smoke tests defaults
# These are the second-round smoke tests. These make sure that the binaries are
@@ -111,9 +108,10 @@
smoke_linux_test:
<<: *binary_linux_test_upload_params
machine:
image: ubuntu-1604:201903-01
image: ubuntu-1604:202007-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- setup_ci_environment
- run:
@@ -137,7 +135,7 @@
smoke_mac_test:
<<: *binary_linux_test_upload_params
macos:
xcode: "9.4.1"
xcode: "12.0"
steps:
- checkout
- run:
@@ -162,7 +160,7 @@
binary_mac_build:
<<: *binary_mac_params
macos:
xcode: "9.4.1"
xcode: "12.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
@@ -176,7 +174,7 @@
- run:
name: Build
no_output_timeout: "1h"
no_output_timeout: "90m"
command: |
# Do not set -u here; there is some problem with CircleCI
# variable expansion with PROMPT_COMMAND
@@ -200,10 +198,13 @@
root: /Users/distiller/project
paths: final_pkgs
binary_mac_upload: &binary_mac_upload
- store_artifacts:
path: /Users/distiller/project/final_pkgs
binary_macos_arm64_build:
<<: *binary_mac_params
macos:
xcode: "9.4.1"
xcode: "12.3.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
@@ -214,20 +215,31 @@
- brew_update
- run:
<<: *binary_install_miniconda
- attach_workspace: # TODO - we can `cp` from ~/workspace
at: /Users/distiller/project
- run:
name: Upload
no_output_timeout: "10m"
name: Build
no_output_timeout: "90m"
command: |
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_upload.sh"
# Do not set -u here; there is some problem with CircleCI
# variable expansion with PROMPT_COMMAND
set -ex -o pipefail
export CROSS_COMPILE_ARM64=1
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_build.sh"
cat "$script"
source "$script"
- persist_to_workspace:
root: /Users/distiller/project
paths: final_pkgs
- store_artifacts:
path: /Users/distiller/project/final_pkgs
binary_ios_build:
<<: *pytorch_ios_params
macos:
xcode: "11.2.1"
xcode: "12.0"
steps:
- attach_workspace:
at: ~/workspace
@@ -254,7 +266,7 @@
binary_ios_upload:
<<: *pytorch_ios_params
macos:
xcode: "11.2.1"
xcode: "12.0"
steps:
- attach_workspace:
at: ~/workspace
@@ -276,11 +288,16 @@
default: ""
executor:
type: string
default: "windows-cpu-with-nvidia-cuda"
default: "windows-xlarge-cpu-with-nvidia-cuda"
executor: <<parameters.executor>>
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- run:
name: _HACK_ Install CUDA compatible cmath
no_output_timeout: 1m
command: |
powershell .circleci/scripts/vs_install_cmath.ps1
- run:
<<: *binary_checkout
- run:
@@ -305,7 +322,7 @@
default: ""
executor:
type: string
default: "windows-cpu-with-nvidia-cuda"
default: "windows-medium-cpu-with-nvidia-cuda"
executor: <<parameters.executor>>
steps:
- checkout
@@ -324,28 +341,6 @@
cat "$script"
source "$script"
binary_windows_upload:
<<: *binary_windows_params
docker:
- image: continuumio/miniconda
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- attach_workspace:
at: /root/workspace
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Upload
no_output_timeout: "10m"
command: |
set -eux -o pipefail
script="/pytorch/.circleci/scripts/binary_windows_upload.sh"
cat "$script"
source "$script"
smoke_windows_test:
<<: *binary_windows_params
parameters:
@ -354,7 +349,7 @@
default: ""
executor:
type: string
default: "windows-cpu-with-nvidia-cuda"
default: "windows-medium-cpu-with-nvidia-cuda"
executor: <<parameters.executor>>
steps:
- checkout
@ -372,3 +367,32 @@
cat "$script"
source "$script"
anaconda_prune:
parameters:
packages:
type: string
description: "What packages are we pruning? (quoted, space-separated string. eg. 'pytorch', 'torchvision torchaudio', etc.)"
default: "pytorch"
channel:
type: string
description: "What channel are we pruning? (eq. pytorch-nightly)"
default: "pytorch-nightly"
docker:
- image: continuumio/miniconda3
environment:
- PACKAGES: "<< parameters.packages >>"
- CHANNEL: "<< parameters.channel >>"
steps:
- checkout
- run:
name: Install dependencies
no_output_timeout: "1h"
command: |
conda install -yq anaconda-client
- run:
name: Prune packages
no_output_timeout: "1h"
command: |
ANACONDA_API_TOKEN="${CONDA_PYTORCHBOT_TOKEN}" \
scripts/release/anaconda-prune/run.sh
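The prune script itself (scripts/release/anaconda-prune/run.sh) is not part of this diff. Purely as a hedged sketch, assuming anaconda-client's `anaconda remove` subcommand and the PACKAGES/CHANNEL variables exported by the job above, the pruning step might look something like this:

```
#!/usr/bin/env bash
# Hypothetical sketch -- the real run.sh is not shown in this diff.
# Assumes anaconda-client is installed and that ANACONDA_API_TOKEN,
# PACKAGES and CHANNEL are set by the job environment above.
set -eux
for pkg in ${PACKAGES}; do
  # Hypothetical helper: select the nightly versions old enough to drop.
  for version in $(list_versions_to_prune "${CHANNEL}" "${pkg}"); do
    # anaconda-client's "remove" deletes a package spec from a channel;
    # --force skips the interactive confirmation prompt.
    anaconda remove --force "${CHANNEL}/${pkg}/${version}"
  done
done
```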


@ -8,7 +8,8 @@
# then install the one with the most recent version.
update_s3_htmls: &update_s3_htmls
machine:
image: ubuntu-1604:201903-01
image: ubuntu-1604:202007-01
resource_class: medium
steps:
- checkout
- setup_linux_system_environment


@ -1,198 +0,0 @@
caffe2_linux_build:
<<: *caffe2_params
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- setup_linux_system_environment
- setup_ci_environment
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
cat >/home/circleci/project/ci_build_script.sh \<<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
export BUILD_ENVIRONMENT="$BUILD_ENVIRONMENT"
# Reinitialize submodules
git submodule sync && git submodule update -q --init --recursive
# conda must be added to the path for Anaconda builds (this location must be
# the same as that in install_anaconda.sh used to build the docker image)
if [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
export PATH=/opt/conda/bin:$PATH
sudo chown -R jenkins:jenkins '/opt/conda'
fi
# Build
./.jenkins/caffe2/build.sh
# Show sccache stats if it is running
if pgrep sccache > /dev/null; then
sccache --show-stats
fi
# =================== The above code will be executed inside Docker container ===================
EOL
chmod +x /home/circleci/project/ci_build_script.sh
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}
time docker pull ${DOCKER_IMAGE} >/dev/null
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./ci_build_script.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-cmake-${CIRCLE_SHA1}
else
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
fi
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
time docker push ${COMMIT_DOCKER_IMAGE}
fi
caffe2_linux_test:
<<: *caffe2_params
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- setup_linux_system_environment
- setup_ci_environment
- run:
name: Test
no_output_timeout: "1h"
command: |
set -e
# TODO: merge this into Caffe2 test.sh
cat >/home/circleci/project/ci_test_script.sh \<<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
export BUILD_ENVIRONMENT="$BUILD_ENVIRONMENT"
# libdc1394 (dependency of OpenCV) expects /dev/raw1394 to exist...
sudo ln /dev/null /dev/raw1394
# conda must be added to the path for Anaconda builds (this location must be
# the same as that in install_anaconda.sh used to build the docker image)
if [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
export PATH=/opt/conda/bin:$PATH
fi
# Upgrade SSL module to avoid old SSL warnings
pip -q install --user --upgrade pyOpenSSL ndg-httpsclient pyasn1
pip -q install --user -b /tmp/pip_install_onnx "file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx"
# Build
./.jenkins/caffe2/test.sh
# Remove benign core dumps.
# These are tests for signal handling (including SIGABRT).
rm -f ./crash/core.fatal_signal_as.*
rm -f ./crash/core.logging_test.*
# =================== The above code will be executed inside Docker container ===================
EOL
chmod +x /home/circleci/project/ci_test_script.sh
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-cmake-${CIRCLE_SHA1}
else
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
fi
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
docker cp /home/circleci/project/. "$id:/var/lib/jenkins/workspace"
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./ci_test_script.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
caffe2_macos_build:
<<: *caffe2_params
macos:
xcode: "9.4.1"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- run_brew_for_macos_build
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
brew install cmake
# Reinitialize submodules
git submodule sync && git submodule update -q --init --recursive
# Reinitialize path (see man page for path_helper(8))
eval `/usr/libexec/path_helper -s`
export PATH=/usr/local/opt/python/libexec/bin:/usr/local/bin:$PATH
# Install Anaconda if we need to
if [ -n "${CAFFE2_USE_ANACONDA}" ]; then
rm -rf ${TMPDIR}/anaconda
curl --retry 3 -o ${TMPDIR}/conda.sh https://repo.anaconda.com/miniconda/Miniconda${ANACONDA_VERSION}-latest-MacOSX-x86_64.sh
chmod +x ${TMPDIR}/conda.sh
/bin/bash ${TMPDIR}/conda.sh -b -p ${TMPDIR}/anaconda
rm -f ${TMPDIR}/conda.sh
export PATH="${TMPDIR}/anaconda/bin:${PATH}"
source ${TMPDIR}/anaconda/bin/activate
fi
pip -q install numpy
# Install sccache
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
set -x
export SCCACHE_BIN=${PWD}/sccache_bin
mkdir -p ${SCCACHE_BIN}
if which sccache > /dev/null; then
printf "#!/bin/sh\nexec sccache $(which clang++) \$*" > "${SCCACHE_BIN}/clang++"
chmod a+x "${SCCACHE_BIN}/clang++"
printf "#!/bin/sh\nexec sccache $(which clang) \$*" > "${SCCACHE_BIN}/clang"
chmod a+x "${SCCACHE_BIN}/clang"
export PATH="${SCCACHE_BIN}:$PATH"
fi
# Build
if [ "${BUILD_IOS:-0}" -eq 1 ]; then
unbuffer scripts/build_ios.sh 2>&1 | ts
elif [ -n "${CAFFE2_USE_ANACONDA}" ]; then
# All conda build logic should be in scripts/build_anaconda.sh
unbuffer scripts/build_anaconda.sh 2>&1 | ts
else
unbuffer scripts/build_local.sh 2>&1 | ts
fi
# Show sccache stats if it is running
if which sccache > /dev/null; then
sccache --show-stats
fi


@ -4,7 +4,7 @@
type: string
default: ""
machine:
image: ubuntu-1604:201903-01
image: ubuntu-1604:202007-01
resource_class: large
environment:
IMAGE_NAME: << parameters.image_name >>
@ -13,20 +13,7 @@
DOCKER_BUILDKIT: 1
steps:
- checkout
- run:
name: Calculate docker tag
command: |
set -x
mkdir .circleci/shared
# git keeps a hash of all sub trees
echo "export DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)" >> .circleci/shared/env_file
# Saves our calculated docker tag to our workspace for later use
- persist_to_workspace:
root: .
paths:
- .circleci/shared/
- load_shared_env:
root: .
- calculate_docker_image_tag
- run:
name: Check if image should be built
command: |
@ -35,7 +22,6 @@
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
eval $(aws ecr get-login --no-include-email --region us-east-1)
set -x
PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker")
# Check if image already exists, if it does then skip building it
if docker manifest inspect "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/${IMAGE_NAME}:${DOCKER_TAG}"; then
circleci-agent step halt
@ -43,8 +29,15 @@
# explicitly exit the step here ourselves before it causes too much trouble
exit 0
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.circleci/docker` at their merge base, i.e. nightly
if ! git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker"; then
echo "Directory '.circleci/docker' not found in tree << pipeline.git.base_revision >>, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ ${PREVIOUS_DOCKER_TAG} = ${DOCKER_TAG} ]]; then
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
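The tagging scheme this check relies on is just git's own tree hashing; a minimal illustration of the two commands involved, with origin/master standing in here for `<< pipeline.git.base_revision >>`:

```
# The tag is the tree hash of the Docker build context, so it only changes
# when something under .circleci/docker changes:
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
# The same directory as it existed at the merge base with the target branch;
# if the two hashes match, a previously pushed image should already exist.
PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD origin/master):.circleci/docker")
```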
@ -60,7 +53,7 @@
cd .circleci/docker && ./build_docker.sh
docker_for_ecr_gc_build_job:
machine:
image: ubuntu-1604:201903-01
image: ubuntu-1604:202007-01
steps:
- checkout
- run:
@ -113,23 +106,3 @@
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
set -x
/usr/bin/gc.py --filter-prefix ${PROJECT} --ignore-tags "${IMAGE_TAG},${GENERATED_IMAGE_TAG}"
docker_hub_index_job:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/gc/ecr
aws_auth:
aws_access_key_id: ${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
aws_secret_access_key: ${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
steps:
- run:
name: garbage collecting for ecr images
no_output_timeout: "1h"
command: |
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
export DOCKER_HUB_USERNAME=${CIRCLECI_DOCKER_HUB_USERNAME}
export DOCKER_HUB_PASSWORD=${CIRCLECI_DOCKER_HUB_PASSWORD}
set -x
/usr/bin/docker_hub.py


@ -1,13 +1,39 @@
pytorch_python_doc_push:
pytorch_doc_push:
resource_class: medium
machine:
image: ubuntu-1604:202007-01
parameters:
branch:
type: string
default: "master"
steps:
- attach_workspace:
at: /tmp/workspace
- run:
name: Generate netrc
command: |
# set credentials for https pushing
cat > ~/.netrc \<<DONE
machine github.com
login pytorchbot
password ${GITHUB_PYTORCHBOT_TOKEN}
DONE
- run:
name: Docs push
command: |
pushd /tmp/workspace
git push -u origin "<< parameters.branch >>"
pytorch_python_doc_build:
environment:
BUILD_ENVIRONMENT: pytorch-python-doc-push
# TODO: stop hardcoding this
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4:209062ef-ab58-422a-b295-36c4eed6e906"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4"
resource_class: large
machine:
image: ubuntu-1604:201903-01
image: ubuntu-1604:202007-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- setup_ci_environment
- run:
@ -15,49 +41,44 @@
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
tag=${CIRCLE_TAG:1:5}
target=${tag:-master}
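# e.g. CIRCLE_TAG=v1.8.1 yields tag=1.8.1 and target=1.8.1; on untagged builds tag is empty, so target falls back to master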
echo "building for ${target}"
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
# master branch docs push
if [[ "${CIRCLE_BRANCH}" == "master" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/python_doc_push_script.sh docs/master master site") | docker exec -u jenkins -i "$id" bash) 2>&1'
# stable release docs push. Due to some circleci limitations, we keep
# an eternal PR open for merging v1.2.0 -> master for this job.
# XXX: The following code is only run on the v1.2.0 branch, which might
# not be exactly the same as what you see here.
elif [[ "${CIRCLE_BRANCH}" == "v1.2.0" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/python_doc_push_script.sh docs/stable 1.2.0 site dry_run") | docker exec -u jenkins -i "$id" bash) 2>&1'
# For open PRs: Do a dry_run of the docs build, don't push build
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/python_doc_push_script.sh docs/master master site dry_run") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && '"export CIRCLE_SHA1='$CIRCLE_SHA1'"' && . ./.circleci/scripts/python_doc_push_script.sh docs/'$target' '$target' site") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir -p ~/workspace/build_artifacts
docker cp $id:/var/lib/jenkins/workspace/pytorch.github.io/docs/master ~/workspace/build_artifacts
docker cp $id:/var/lib/jenkins/workspace/pytorch.github.io /tmp/workspace
# Save the docs build so we can debug any problems
export DEBUG_COMMIT_DOCKER_IMAGE=${COMMIT_DOCKER_IMAGE}-debug
docker commit "$id" ${DEBUG_COMMIT_DOCKER_IMAGE}
time docker push ${DEBUG_COMMIT_DOCKER_IMAGE}
- persist_to_workspace:
root: /tmp/workspace
paths:
- .
- store_artifacts:
path: ~/workspace/build_artifacts/master
destination: docs
pytorch_cpp_doc_push:
pytorch_cpp_doc_build:
environment:
BUILD_ENVIRONMENT: pytorch-cpp-doc-push
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4:209062ef-ab58-422a-b295-36c4eed6e906"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4"
resource_class: large
machine:
image: ubuntu-1604:201903-01
image: ubuntu-1604:202007-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- setup_ci_environment
- run:
@ -65,39 +86,36 @@
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
tag=${CIRCLE_TAG:1:5}
target=${tag:-master}
echo "building for ${target}"
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
# master branch docs push
if [[ "${CIRCLE_BRANCH}" == "master" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/cpp_doc_push_script.sh docs/master master") | docker exec -u jenkins -i "$id" bash) 2>&1'
# stable release docs push. Due to some circleci limitations, we keep
# an eternal PR open (#16502) for merging v1.0.1 -> master for this job.
# XXX: The following code is only run on the v1.0.1 branch, which might
# not be exactly the same as what you see here.
elif [[ "${CIRCLE_BRANCH}" == "v1.0.1" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/cpp_doc_push_script.sh docs/stable 1.0.1") | docker exec -u jenkins -i "$id" bash) 2>&1'
# For open PRs: Do a dry_run of the docs build, don't push build
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/cpp_doc_push_script.sh docs/master master dry_run") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && '"export CIRCLE_SHA1='$CIRCLE_SHA1'"' && . ./.circleci/scripts/cpp_doc_push_script.sh docs/"$target" master") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir -p ~/workspace/build_artifacts
docker cp $id:/var/lib/jenkins/workspace/cppdocs/ /tmp/workspace
# Save the docs build so we can debug any problems
export DEBUG_COMMIT_DOCKER_IMAGE=${COMMIT_DOCKER_IMAGE}-debug
docker commit "$id" ${DEBUG_COMMIT_DOCKER_IMAGE}
time docker push ${DEBUG_COMMIT_DOCKER_IMAGE}
- persist_to_workspace:
root: /tmp/workspace
paths:
- .
pytorch_macos_10_13_py3_build:
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-build
macos:
xcode: "9.4.1"
xcode: "12.0"
steps:
- checkout
- run_brew_for_macos_build
@ -106,7 +124,7 @@
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
export IN_CI=1
# Install sccache
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
@ -131,7 +149,7 @@
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-test
macos:
xcode: "9.4.1"
xcode: "12.0"
steps:
- checkout
- attach_workspace:
@ -142,7 +160,7 @@
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
export IN_CI=1
chmod a+x .jenkins/pytorch/macos-test.sh
unbuffer .jenkins/pytorch/macos-test.sh 2>&1 | ts
@ -152,13 +170,14 @@
pytorch_android_gradle_build:
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:209062ef-ab58-422a-b295-36c4eed6e906"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
PYTHON_VERSION: "3.6"
resource_class: large
machine:
image: ubuntu-1604:201903-01
image: ubuntu-1604:202007-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- setup_ci_environment
- run:
@ -166,7 +185,7 @@
no_output_timeout: "1h"
command: |
set -eux
docker_image_commit=${DOCKER_IMAGE}-${CIRCLE_SHA1}
docker_image_commit=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_libtorch_android_x86_32=${docker_image_commit}-android-x86_32
docker_image_libtorch_android_x86_64=${docker_image_commit}-android-x86_64
@ -181,16 +200,16 @@
# x86_32
time docker pull ${docker_image_libtorch_android_x86_32} >/dev/null
export id_x86_32=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32})
export id_x86_32=$(docker run --env-file "${BASH_ENV}" -e GRADLE_OFFLINE=1 --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_x86_32" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_x86_32" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# arm-v7a
time docker pull ${docker_image_libtorch_android_arm_v7a} >/dev/null
export id_arm_v7a=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_arm_v7a})
export id_arm_v7a=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_arm_v7a})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_arm_v7a" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_arm_v7a" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir -p ~/workspace/build_android_install_arm_v7a
@ -198,9 +217,9 @@
# x86_64
time docker pull ${docker_image_libtorch_android_x86_64} >/dev/null
export id_x86_64=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_64})
export id_x86_64=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_64})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_x86_64" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_x86_64" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir -p ~/workspace/build_android_install_x86_64
@ -208,9 +227,9 @@
# arm-v8a
time docker pull ${docker_image_libtorch_android_arm_v8a} >/dev/null
export id_arm_v8a=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_arm_v8a})
export id_arm_v8a=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_arm_v8a})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_arm_v8a" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_arm_v8a" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir -p ~/workspace/build_android_install_arm_v8a
@ -221,7 +240,7 @@
docker cp ~/workspace/build_android_install_arm_v8a $id_x86_32:/var/lib/jenkins/workspace/build_android_install_arm_v8a
# run gradle buildRelease
export COMMAND='((echo "source ./workspace/env" && echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GRADLE_OFFLINE=1" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id_x86_32" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id_x86_32" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir -p ~/workspace/build_android_artifacts
@ -230,26 +249,9 @@
output_image=$docker_image_libtorch_android_x86_32-gradle
docker commit "$id_x86_32" ${output_image}
time docker push ${output_image}
- run:
name: save binary size
no_output_timeout: "5m"
command: |
docker_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}-android-x86_32-gradle
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image})
echo "docker-id: $id"
cat \<< EOL | docker exec -u jenkins -i "$id" bash
# ============================== Begin Docker ==============================
cd workspace
source ./env
export ANDROID_BUILD_TYPE="prebuild"
export COMMIT_TIME=\$(git log --max-count=1 --format=%ct || echo 0)
export CIRCLE_BUILD_NUM="${CIRCLE_BUILD_NUM}"
export CIRCLE_SHA1="${CIRCLE_SHA1}"
export CIRCLE_BRANCH="${CIRCLE_BRANCH}"
export SCRIBE_GRAPHQL_ACCESS_TOKEN="${SCRIBE_GRAPHQL_ACCESS_TOKEN}"
python .circleci/scripts/upload_binary_size_to_scuba.py android
# ============================== End Docker ==============================
EOL
- upload_binary_size_for_android_build:
build_type: prebuilt
artifacts: /home/circleci/workspace/build_android_artifacts/artifacts.tgz
- store_artifacts:
path: ~/workspace/build_android_artifacts/artifacts.tgz
destination: artifacts.tgz
@ -257,22 +259,22 @@
pytorch_android_publish_snapshot:
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-publish-snapshot
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:209062ef-ab58-422a-b295-36c4eed6e906"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
PYTHON_VERSION: "3.6"
resource_class: large
machine:
image: ubuntu-1604:201903-01
image: ubuntu-1604:202007-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- checkout
- setup_ci_environment
- run:
name: pytorch android gradle build
no_output_timeout: "1h"
command: |
set -eux
docker_image_commit=${DOCKER_IMAGE}-${CIRCLE_SHA1}
docker_image_commit=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_libtorch_android_x86_32_gradle=${docker_image_commit}-android-x86_32-gradle
@ -281,9 +283,9 @@
# x86_32
time docker pull ${docker_image_libtorch_android_x86_32_gradle} >/dev/null
export id_x86_32=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32_gradle})
export id_x86_32=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32_gradle})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace" && echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export SONATYPE_NEXUS_USERNAME=${SONATYPE_NEXUS_USERNAME}" && echo "export SONATYPE_NEXUS_PASSWORD=${SONATYPE_NEXUS_PASSWORD}" && echo "export ANDROID_SIGN_KEY=${ANDROID_SIGN_KEY}" && echo "export ANDROID_SIGN_PASS=${ANDROID_SIGN_PASS}" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/publish_android_snapshot.sh") | docker exec -u jenkins -i "$id_x86_32" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace" && echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export SONATYPE_NEXUS_USERNAME=${SONATYPE_NEXUS_USERNAME}" && echo "export SONATYPE_NEXUS_PASSWORD=${SONATYPE_NEXUS_PASSWORD}" && echo "export ANDROID_SIGN_KEY=${ANDROID_SIGN_KEY}" && echo "export ANDROID_SIGN_PASS=${ANDROID_SIGN_PASS}" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/publish_android_snapshot.sh") | docker exec -u jenkins -i "$id_x86_32" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
output_image=${docker_image_libtorch_android_x86_32_gradle}-publish-snapshot
@ -293,21 +295,14 @@
pytorch_android_gradle_build-x86_32:
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-only-x86_32
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:209062ef-ab58-422a-b295-36c4eed6e906"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
PYTHON_VERSION: "3.6"
resource_class: large
machine:
image: ubuntu-1604:201903-01
image: ubuntu-1604:202007-01
steps:
- checkout
- run:
name: filter out non-PR runs
no_output_timeout: "5m"
command: |
echo "CIRCLE_PULL_REQUEST: ${CIRCLE_PULL_REQUEST:-}"
if [ -z "${CIRCLE_PULL_REQUEST:-}" ]; then
circleci step halt
fi
- calculate_docker_image_tag
- setup_linux_system_environment
- checkout
- setup_ci_environment
@ -316,14 +311,14 @@
no_output_timeout: "1h"
command: |
set -e
docker_image_libtorch_android_x86_32=${DOCKER_IMAGE}-${CIRCLE_SHA1}-android-x86_32
docker_image_libtorch_android_x86_32=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}-android-x86_32
echo "docker_image_libtorch_android_x86_32: "${docker_image_libtorch_android_x86_32}
# x86
time docker pull ${docker_image_libtorch_android_x86_32} >/dev/null
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32})
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32})
export COMMAND='((echo "source ./workspace/env" && echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GRADLE_OFFLINE=1" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GRADLE_OFFLINE=1" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir -p ~/workspace/build_android_x86_32_artifacts
@ -332,34 +327,57 @@
output_image=${docker_image_libtorch_android_x86_32}-gradle
docker commit "$id" ${output_image}
time docker push ${output_image}
- run:
name: save binary size
no_output_timeout: "5m"
command: |
docker_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}-android-x86_32-gradle
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image})
echo "docker-id: $id"
cat \<< EOL | docker exec -u jenkins -i "$id" bash
# ============================== Begin Docker ==============================
cd workspace
source ./env
export ANDROID_BUILD_TYPE="prebuild-single"
export COMMIT_TIME=\$(git log --max-count=1 --format=%ct || echo 0)
export CIRCLE_BUILD_NUM="${CIRCLE_BUILD_NUM}"
export CIRCLE_SHA1="${CIRCLE_SHA1}"
export CIRCLE_BRANCH="${CIRCLE_BRANCH}"
export SCRIBE_GRAPHQL_ACCESS_TOKEN="${SCRIBE_GRAPHQL_ACCESS_TOKEN}"
python .circleci/scripts/upload_binary_size_to_scuba.py android
# ============================== End Docker ==============================
EOL
- upload_binary_size_for_android_build:
build_type: prebuilt-single
artifacts: /home/circleci/workspace/build_android_x86_32_artifacts/artifacts.tgz
- store_artifacts:
path: ~/workspace/build_android_x86_32_artifacts/artifacts.tgz
destination: artifacts.tgz
pytorch_android_gradle_custom_build_single:
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
PYTHON_VERSION: "3.6"
resource_class: large
machine:
image: ubuntu-1604:202007-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- checkout
- calculate_docker_image_tag
- setup_ci_environment
- run:
name: pytorch android gradle custom build single architecture (for PR)
no_output_timeout: "1h"
command: |
set -e
# Unlike other gradle jobs, it's not worth building libtorch in a separate CI job and sharing it via docker, because:
# 1) Not shareable: it's custom selective build, which is different from default libtorch mobile build;
# 2) Not parallelizable by architecture: it only builds libtorch for one architecture;
echo "DOCKER_IMAGE: ${DOCKER_IMAGE}:${DOCKER_TAG}"
time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null
git submodule sync && git submodule update -q --init --recursive
VOLUME_MOUNTS="-v /home/circleci/project/:/var/lib/jenkins/workspace"
export id=$(docker run --env-file "${BASH_ENV}" ${VOLUME_MOUNTS} --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG})
export COMMAND='((echo "export GRADLE_OFFLINE=1" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Skip docker push as this job is purely for size analysis purpose.
# Result binaries are already in `/home/circleci/project/` as it's mounted instead of copied.
- upload_binary_size_for_android_build:
build_type: custom-build-single
pytorch_ios_build:
<<: *pytorch_ios_params
macos:
xcode: "11.2.1"
xcode: "12.0"
steps:
- checkout
- run_brew_for_ios_build
@ -378,7 +396,7 @@
rm cert.txt
bundle exec fastlane install_cert
# install the provisioning profile
PROFILE=TestApp_CI.mobileprovision
PROFILE=PyTorch_CI_2021.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
@ -390,7 +408,7 @@
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
export IN_CI=1
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
export TCLLIBPATH="/usr/local/lib"
@ -407,7 +425,7 @@
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
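# retry: up to five attempts total, sleeping 1s/2s/4s/8s between successive tries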
retry conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing requests --yes
retry conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests typing_extensions --yes
# sync submodules
cd ${PROJ_ROOT}
@ -421,6 +439,7 @@
chmod a+x ${PROJ_ROOT}/scripts/build_ios.sh
echo "IOS_ARCH: ${IOS_ARCH}"
echo "IOS_PLATFORM: ${IOS_PLATFORM}"
echo "USE_PYTORCH_METAL": "${USE_METAL}"
#check the custom build flag
echo "SELECTED_OP_LIST: ${SELECTED_OP_LIST}"
@ -429,6 +448,9 @@
fi
export IOS_ARCH=${IOS_ARCH}
export IOS_PLATFORM=${IOS_PLATFORM}
if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then
export USE_PYTORCH_METAL=${USE_METAL}
fi
unbuffer ${PROJ_ROOT}/scripts/build_ios.sh 2>&1 | ts
- run:
name: Run Build Test
@ -436,7 +458,7 @@
command: |
set -e
PROJ_ROOT=/Users/distiller/project
PROFILE=TestApp_CI
PROFILE=PyTorch_CI_2021
# run the ruby build script
if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
@ -475,9 +497,10 @@
pytorch_linux_bazel_build:
<<: *pytorch_params
machine:
image: ubuntu-1604:201903-01
image: ubuntu-1604:202007-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- setup_ci_environment
- run:
@ -486,9 +509,9 @@
command: |
set -e
# Pull Docker image and run build
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}
time docker pull ${DOCKER_IMAGE} >/dev/null
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}:${DOCKER_TAG}
time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG})
echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
@ -496,14 +519,14 @@
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/build.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/build.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
# Augment our output image name with bazel to avoid collisions
output_image=${DOCKER_IMAGE}-bazel-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=$output_image
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
time docker push ${COMMIT_DOCKER_IMAGE}
@ -512,9 +535,10 @@
pytorch_linux_bazel_test:
<<: *pytorch_params
machine:
image: ubuntu-1604:201903-01
image: ubuntu-1604:202007-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- setup_ci_environment
- run:
@ -522,16 +546,16 @@
no_output_timeout: "90m"
command: |
set -e
output_image=${DOCKER_IMAGE}-bazel-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=$output_image
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --gpus all -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
retrieve_test_reports() {
@ -541,9 +565,9 @@
trap "retrieve_test_reports" ERR
if [[ ${BUILD_ENVIRONMENT} == *"multigpu"* ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export CIRCLE_PULL_REQUEST=${CIRCLE_PULL_REQUEST}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -555,13 +579,13 @@
pytorch_doc_test:
environment:
BUILD_ENVIRONMENT: pytorch-doc-test
# TODO: stop hardcoding this
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4:209062ef-ab58-422a-b295-36c4eed6e906"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4"
resource_class: medium
machine:
image: ubuntu-1604:201903-01
image: ubuntu-1604:202007-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- setup_ci_environment
- run:
@ -569,9 +593,9 @@
no_output_timeout: "30m"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.jenkins/pytorch/docs-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && . ./.jenkins/pytorch/docs-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts


@ -2,12 +2,12 @@ jobs:
pytorch_linux_build:
<<: *pytorch_params
machine:
image: ubuntu-1604:201903-01
image: ubuntu-1604:202007-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- checkout
- optional_merge_target_branch
- setup_ci_environment
- run:
@ -15,33 +15,42 @@ jobs:
no_output_timeout: "1h"
command: |
set -e
if [[ "${DOCKER_IMAGE}" == *rocm3.9* ]]; then
export DOCKER_TAG="f3d89a32912f62815e4feaeed47e564e887dffd6"
fi
if [[ ${BUILD_ENVIRONMENT} == *"pure_torch"* ]]; then
echo 'BUILD_CAFFE2=OFF' >> "${BASH_ENV}"
fi
if [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then
echo 'ATEN_THREADING=TBB' >> "${BASH_ENV}"
echo 'USE_TBB=1' >> "${BASH_ENV}"
elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
echo 'ATEN_THREADING=NATIVE' >> "${BASH_ENV}"
fi
echo "Parallel backend flags: "${PARALLEL_FLAGS}
# Pull Docker image and run build
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}
time docker pull ${DOCKER_IMAGE} >/dev/null
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}:${DOCKER_TAG}
time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG})
git submodule sync && git submodule update -q --init --recursive
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
if [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then
export PARALLEL_FLAGS="export ATEN_THREADING=TBB USE_TBB=1 "
elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
export PARALLEL_FLAGS="export ATEN_THREADING=NATIVE "
fi
echo "Parallel backend flags: "${PARALLEL_FLAGS}
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo '"$PARALLEL_FLAGS"' && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/build.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Copy dist folder back
docker cp $id:/var/lib/jenkins/workspace/dist /home/circleci/project/. || echo "Dist folder not found"
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
# Note [Special build images]
# The xla build uses the same docker image as
# pytorch-linux-trusty-py3.6-gcc5.4-build. In the push step, we have to
# pytorch_linux_bionic_py3_6_clang9_build. In the push step, we have to
# distinguish between them so the test can pick up the correct image.
output_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then
@ -60,20 +69,25 @@ jobs:
export COMMIT_DOCKER_IMAGE=$output_image-android-x86_32
elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-vulkan-x86_32"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-android-vulkan-x86_32
elif [[ ${BUILD_ENVIRONMENT} == *"vulkan-linux"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-vulkan
else
export COMMIT_DOCKER_IMAGE=$output_image
fi
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
time docker push ${COMMIT_DOCKER_IMAGE}
fi
- store_artifacts:
path: /home/circleci/project/dist
pytorch_linux_test:
<<: *pytorch_params
machine:
image: ubuntu-1604:201903-01
image: ubuntu-1604:202007-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- setup_ci_environment
- run:
@ -81,8 +95,12 @@ jobs:
no_output_timeout: "90m"
command: |
set -e
export PYTHONUNBUFFERED=1
if [[ "${DOCKER_IMAGE}" == *rocm3.9* ]]; then
export DOCKER_TAG="f3d89a32912f62815e4feaeed47e564e887dffd6"
fi
# See Note [Special build images]
output_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then
@ -91,30 +109,34 @@ jobs:
export COMMIT_DOCKER_IMAGE=$output_image-paralleltbb
elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-parallelnative
elif [[ ${BUILD_ENVIRONMENT} == *"vulkan-linux"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-vulkan
else
export COMMIT_DOCKER_IMAGE=$output_image
fi
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
if [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then
export PARALLEL_FLAGS="export ATEN_THREADING=TBB USE_TBB=1 "
echo 'ATEN_THREADING=TBB' >> "${BASH_ENV}"
echo 'USE_TBB=1' >> "${BASH_ENV}"
elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
export PARALLEL_FLAGS="export ATEN_THREADING=NATIVE "
echo 'ATEN_THREADING=NATIVE' >> "${BASH_ENV}"
fi
echo "Parallel backend flags: "${PARALLEL_FLAGS}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
# TODO: Make this less painful
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --gpus all --shm-size=2g -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
elif [[ ${BUILD_ENVIRONMENT} == *"rocm"* ]]; then
hostname
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size=8g --ipc=host --device /dev/kfd --device /dev/dri --group-add video -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size=1g --ipc=host -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
echo "id=${id}" >> "${BASH_ENV}"
# Pass environment variables to the next step
# See https://circleci.com/docs/2.0/env-vars/#using-parameters-and-bash-environment
echo "export PARALLEL_FLAGS=\"${PARALLEL_FLAGS}\"" >> $BASH_ENV
echo "export id=$id" >> $BASH_ENV
- run:
name: Check for no AVX instruction by default
no_output_timeout: "20m"
@ -131,8 +153,8 @@ jobs:
}
if is_vanilla_build; then
echo "apt-get update && apt-get install -y qemu-user" | docker exec -u root -i "$id" bash
echo "cd workspace/build; qemu-x86_64 -cpu Broadwell -E ATEN_CPU_CAPABILITY=default ./bin/basic --gtest_filter=BasicTest.BasicTestCPU" | docker exec -u jenkins -i "$id" bash
echo "apt-get update && apt-get install -y qemu-user gdb" | docker exec -u root -i "$id" bash
echo "cd workspace/build; qemu-x86_64 -g 2345 -cpu Broadwell -E ATEN_CPU_CAPABILITY=default ./bin/basic --gtest_filter=BasicTest.BasicTestCPU & gdb ./bin/basic -ex 'set pagination off' -ex 'target remote :2345' -ex 'continue' -ex 'bt' -ex='set confirm off' -ex 'quit \$_isvoid(\$_exitcode)'" | docker exec -u jenkins -i "$id" bash
else
echo "Skipping for ${BUILD_ENVIRONMENT}"
fi
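For readability, here is the same check unwrapped from the docker exec plumbing (a sketch; the build directory, test binary, and ATEN_CPU_CAPABILITY setting come from the job's build step above):

```
# qemu-user emulates a fixed CPU model and, with -g 2345, waits for a gdb
# remote connection before executing; -E sets an env var for the guest.
qemu-x86_64 -g 2345 -cpu Broadwell -E ATEN_CPU_CAPABILITY=default \
    ./bin/basic --gtest_filter=BasicTest.BasicTestCPU &
# gdb attaches over the remote protocol, lets the test run to completion,
# and prints a backtrace if it stops (e.g. on an illegal instruction).
gdb ./bin/basic -ex 'set pagination off' -ex 'target remote :2345' \
    -ex 'continue' -ex 'bt' -ex 'set confirm off' -ex 'quit'
```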
@ -142,21 +164,61 @@ jobs:
command: |
set -e
cat >docker_commands.sh \<<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
export SCRIBE_GRAPHQL_ACCESS_TOKEN="${SCRIBE_GRAPHQL_ACCESS_TOKEN}"
${PARALLEL_FLAGS}
cd workspace
EOL
if [[ ${BUILD_ENVIRONMENT} == *"multigpu"* ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "${PARALLEL_FLAGS}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ".jenkins/pytorch/multigpu-test.sh" >> docker_commands.sh
elif [[ ${BUILD_ENVIRONMENT} == *onnx* ]]; then
echo "pip install click mock tabulate networkx==2.0" >> docker_commands.sh
echo "pip -q install --user \"file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx\"" >> docker_commands.sh
echo ".jenkins/caffe2/test.sh" >> docker_commands.sh
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export CIRCLE_PULL_REQUEST=${CIRCLE_PULL_REQUEST}" && echo "${PARALLEL_FLAGS}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ".jenkins/pytorch/test.sh" >> docker_commands.sh
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
echo "(cat docker_commands.sh | docker exec -u jenkins -i "$id" bash) 2>&1" > command.sh
unbuffer bash command.sh | ts
- run:
name: Report results
no_output_timeout: "5m"
command: |
set -e
docker stats --all --no-stream
echo "cd workspace; python test/print_test_stats.py test" | docker exec -u jenkins -i "$id" bash
cat >docker_commands.sh \<<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}
export SCRIBE_GRAPHQL_ACCESS_TOKEN="${SCRIBE_GRAPHQL_ACCESS_TOKEN}"
export CIRCLE_TAG="${CIRCLE_TAG:-}"
export CIRCLE_SHA1="$CIRCLE_SHA1"
export CIRCLE_PR_NUMBER="${CIRCLE_PR_NUMBER:-}"
export CIRCLE_BRANCH="$CIRCLE_BRANCH"
export CIRCLE_JOB="$CIRCLE_JOB"
export CIRCLE_WORKFLOW_ID="$CIRCLE_WORKFLOW_ID"
cd workspace
python test/print_test_stats.py --upload-to-s3 test
EOL
echo "(cat docker_commands.sh | docker exec -u jenkins -i "$id" bash) 2>&1" > command.sh
unbuffer bash command.sh | ts
echo "Retrieving test reports"
docker cp $id:/var/lib/jenkins/workspace/test/test-reports ./ || echo 'No test reports found!'
if [[ ${BUILD_ENVIRONMENT} == *"coverage"* ]]; then
echo "Retrieving C++ coverage report"
docker cp $id:/var/lib/jenkins/workspace/build/coverage.info ./test
fi
if [[ ${BUILD_ENVIRONMENT} == *"coverage"* || ${BUILD_ENVIRONMENT} == *"onnx"* ]]; then
echo "Retrieving Python coverage report"
docker cp $id:/var/lib/jenkins/workspace/test/.coverage ./test
docker cp $id:/var/lib/jenkins/workspace/test/coverage.xml ./test
python3 -mpip install codecov
python3 -mcodecov
fi
when: always
- store_test_results:
path: test-reports
@ -166,7 +228,7 @@ jobs:
parameters:
executor:
type: string
default: "windows-cpu-with-nvidia-cuda"
default: "windows-xlarge-cpu-with-nvidia-cuda"
build_environment:
type: string
default: ""
@ -175,16 +237,16 @@ jobs:
default: ""
cuda_version:
type: string
default: "10"
default: "10.1"
python_version:
type: string
default: "3.6"
vc_version:
type: string
default: "14.11"
default: "14.16"
vc_year:
type: string
default: "2017"
default: "2019"
vc_product:
type: string
default: "BuildTools"
@ -195,11 +257,10 @@ jobs:
steps:
- checkout
- run:
name: Install VS2017
name: _HACK_ Install CUDA compatible cmath
no_output_timeout: 1m
command: |
if [[ "${VC_YEAR}" == "2017" ]]; then
powershell .circleci/scripts/vs_install.ps1
fi
powershell .circleci/scripts/vs_install_cmath.ps1
- run:
name: Install Cuda
no_output_timeout: 30m
@ -211,10 +272,7 @@ jobs:
name: Install Cudnn
command : |
if [[ "${USE_CUDA}" == "1" ]]; then
cd c:/
curl --retry 3 -O https://ossci-windows.s3.amazonaws.com/cudnn-10.1-windows10-x64-v7.6.4.38.zip
7z x cudnn-10.1-windows10-x64-v7.6.4.38.zip -ocudnn
cp -r cudnn/cuda/* "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/"
.circleci/scripts/windows_cudnn_install.sh
fi
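The inline cuDNN download above is being folded into .circleci/scripts/windows_cudnn_install.sh, which is not shown in this diff. Judging purely from the commands it replaces, a sketch of its body would be roughly the following (the real script may well parameterize the cuDNN version by CUDA_VERSION):

```
#!/bin/bash
# Sketch reconstructed from the inline commands replaced above; not the
# actual windows_cudnn_install.sh from this PR.
set -eux
cd c:/
curl --retry 3 -O https://ossci-windows.s3.amazonaws.com/cudnn-10.1-windows10-x64-v7.6.4.38.zip
7z x cudnn-10.1-windows10-x64-v7.6.4.38.zip -ocudnn
cp -r cudnn/cuda/* "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/"
```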
- run:
name: Build
@ -237,7 +295,7 @@ jobs:
parameters:
executor:
type: string
default: "windows-cpu-with-nvidia-cuda"
default: "windows-medium-cpu-with-nvidia-cuda"
build_environment:
type: string
default: ""
@ -246,16 +304,16 @@ jobs:
default: ""
cuda_version:
type: string
default: "10"
default: "10.1"
python_version:
type: string
default: "3.6"
vc_version:
type: string
default: "14.11"
default: "14.16"
vc_year:
type: string
default: "2017"
default: "2019"
vc_product:
type: string
default: "BuildTools"
@ -267,34 +325,27 @@ jobs:
- checkout
- attach_workspace:
at: c:/users/circleci/workspace
- run:
name: Install VS2017
command: |
if [[ "${VC_YEAR}" == "2017" ]]; then
powershell .circleci/scripts/vs_install.ps1
fi
- run:
name: Install Cuda
no_output_timeout: 30m
command: |
if [[ "${CUDA_VERSION}" != "cpu" && "${JOB_EXECUTOR}" != "windows-with-nvidia-gpu" ]]; then
.circleci/scripts/windows_cuda_install.sh
if [[ "${CUDA_VERSION}" != "cpu" ]]; then
if [[ "${CUDA_VERSION}" != "10" || "${JOB_EXECUTOR}" != "windows-with-nvidia-gpu" ]]; then
.circleci/scripts/windows_cuda_install.sh
fi
fi
- run:
name: Install Cudnn
command : |
if [[ "${CUDA_VERSION}" != "cpu" && "${JOB_EXECUTOR}" != "windows-with-nvidia-gpu" ]]; then
cd c:/
curl --retry 3 -O https://ossci-windows.s3.amazonaws.com/cudnn-10.1-windows10-x64-v7.6.4.38.zip
7z x cudnn-10.1-windows10-x64-v7.6.4.38.zip -ocudnn
cp -r cudnn/cuda/* "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/"
if [[ "${CUDA_VERSION}" != "cpu" ]]; then
.circleci/scripts/windows_cudnn_install.sh
fi
- run:
name: Test
no_output_timeout: "30m"
command: |
set -e
export IN_CIRCLECI=1
export IN_CI=1
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_WIN_BUILD_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_WIN_BUILD_V1}


@ -11,7 +11,7 @@
- ecr_gc_job:
name: ecr_gc_job_for_pytorch
project: pytorch
tags_to_keep: "271,262,256,278,282,291,300,323,327,347,389,401,402,403,405,a8006f9a-272d-4478-b137-d121c6f05c83,6e7b11da-a919-49e5-b2ba-da66e3d4bb0a,f990c76a-a798-42bb-852f-5be5006f8026,e43973a9-9d5a-4138-9181-a08a0fc55e2f,8fcf46ef-4a34-480b-a8ee-b0a30a4d3e59,9a3986fa-7ce7-4a36-a001-3c9bef9892e2,1bc00f11-e0f3-4e5c-859f-15937dd938cd,209062ef-ab58-422a-b295-36c4eed6e906,be76e8fd-44e2-484d-b090-07e0cc3a56f0"
tags_to_keep: "271,262,256,278,282,291,300,323,327,347,389,401,402,403,405,a8006f9a-272d-4478-b137-d121c6f05c83,6e7b11da-a919-49e5-b2ba-da66e3d4bb0a,f990c76a-a798-42bb-852f-5be5006f8026,e43973a9-9d5a-4138-9181-a08a0fc55e2f,8fcf46ef-4a34-480b-a8ee-b0a30a4d3e59,9a3986fa-7ce7-4a36-a001-3c9bef9892e2,1bc00f11-e0f3-4e5c-859f-15937dd938cd,209062ef-ab58-422a-b295-36c4eed6e906,be76e8fd-44e2-484d-b090-07e0cc3a56f0,fff7795428560442086f7b2bb6004b65245dc11a,ab1632df-fa59-40e6-8c23-98e004f61148"
requires:
- docker_for_ecr_gc_build_job
- ecr_gc_job:
@ -32,4 +32,3 @@
tags_to_keep: "34"
requires:
- docker_for_ecr_gc_build_job
- docker_hub_index_job

File diff suppressed because it is too large.


@ -1,6 +1,7 @@
---
# NOTE there must be no spaces before the '-', so put the comma last.
Checks: '-*,
InheritParentConfig: true
Checks: '
bugprone-*,
-bugprone-forward-declaration-namespace,
-bugprone-macro-parentheses,
@ -17,9 +18,11 @@ cppcoreguidelines-*,
-cppcoreguidelines-pro-type-union-access,
-cppcoreguidelines-pro-type-vararg,
-cppcoreguidelines-special-member-functions,
-facebook-hte-RelativeInclude,
hicpp-exception-baseclass,
hicpp-avoid-goto,
modernize-*,
-modernize-concat-nested-namespaces,
-modernize-return-braced-init-list,
-modernize-use-auto,
-modernize-use-default-member-init,
@ -27,7 +30,7 @@ modernize-*,
-modernize-use-trailing-return-type,
performance-*,
-performance-noexcept-move-constructor,
'
'
HeaderFilterRegex: 'torch/csrc/.*'
AnalyzeTemporaryDtors: false
CheckOptions:

.flake8

@@ -12,5 +12,22 @@ ignore =
    B007,B008,
    # these ignores are from flake8-comprehensions; please fix!
    C400,C401,C402,C403,C404,C405,C407,C411,C413,C414,C415
-per-file-ignores = __init__.py: F401
-exclude = docs/src,venv,third_party,caffe2,scripts,docs/caffe2,torch/lib/include,torch/lib/tmp_install,build,torch/include,*.pyi,.git,build,build_test_custom_build,build_code_analyzer
+per-file-ignores = __init__.py: F401 torch/utils/cpp_extension.py: B950
+exclude =
+    docs/src,
+    docs/cpp/src,
+    venv,
+    third_party,
+    caffe2,
+    scripts,
+    docs/caffe2,
+    torch/lib/include,
+    torch/lib/tmp_install,
+    build,
+    torch/include,
+    *.pyi,
+    .git,
+    build,
+    build_test_custom_build,
+    build_code_analyzer,
+    test/generated_type_hints_smoketest.py


@@ -0,0 +1 @@
+Fixes #{issue number}


@@ -9,3 +9,5 @@ labels_to_circle_params:
        - release/.*
      tags:
        - v[0-9]+(\.[0-9]+)*-rc[0-9]+
+     set_to_false:
+       - run_build


@@ -0,0 +1,86 @@
#!/usr/bin/env python3

"""Generates a matrix to be utilized through github actions

Will output a condensed version of the matrix if on a pull request that only
includes the latest version of python we support built on three different
architectures:
    * CPU
    * Latest CUDA
    * Latest ROCM
"""

import json
import os
import itertools

CUDA_ARCHES = [
    "10.1",
    "10.2",
    "11.0"
]

ROCM_ARCHES = [
    "3.10",
    "4.0"
]

FULL_ARCHES = [
    "cpu",
    *CUDA_ARCHES,
    *ROCM_ARCHES
]

CONTAINER_IMAGES = {
    **{
        # TODO: Re-do manylinux CUDA image tagging scheme to be similar to
        #       ROCM so we don't have to do this replacement
        gpu_arch: f"pytorch/manylinux-cuda{gpu_arch.replace('.', '')}"
        for gpu_arch in CUDA_ARCHES
    },
    **{
        gpu_arch: f"pytorch/manylinux-rocm:{gpu_arch}"
        for gpu_arch in ROCM_ARCHES
    },
    "cpu": "pytorch/manylinux-cpu"
}

FULL_PYTHON_VERSIONS = [
    "3.6",
    "3.7",
    "3.8",
    "3.9",
]


def is_pull_request():
    return os.environ.get("GITHUB_HEAD_REF")


def generate_matrix():
    python_versions = FULL_PYTHON_VERSIONS
    arches = FULL_ARCHES
    if is_pull_request():
        python_versions = [python_versions[-1]]
        arches = ["cpu", CUDA_ARCHES[-1], ROCM_ARCHES[-1]]
    matrix = []
    for item in itertools.product(python_versions, arches):
        python_version, arch_version = item
        # Not my favorite code here
        gpu_arch_type = "cuda"
        if "rocm" in CONTAINER_IMAGES[arch_version]:
            gpu_arch_type = "rocm"
        elif "cpu" in CONTAINER_IMAGES[arch_version]:
            gpu_arch_type = "cpu"
        matrix.append({
            "python_version": python_version,
            "gpu_arch_type": gpu_arch_type,
            "gpu_arch_version": arch_version,
            "container_image": CONTAINER_IMAGES[arch_version]
        })
    return json.dumps({"include": matrix})


def main():
    print(generate_matrix())


if __name__ == "__main__":
    main()
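
For reference, on a pull request (`GITHUB_HEAD_REF` set) the script condenses the matrix to the latest Python version against three images. A sketch of the expected JSON, reconstructed from the logic above rather than captured from CI:

```python
import json

# Pull-request path of generate_matrix() above:
# python_versions == ["3.9"], arches == ["cpu", "11.0", "4.0"].
expected = {"include": [
    {"python_version": "3.9", "gpu_arch_type": "cpu",
     "gpu_arch_version": "cpu", "container_image": "pytorch/manylinux-cpu"},
    {"python_version": "3.9", "gpu_arch_type": "cuda",
     "gpu_arch_version": "11.0", "container_image": "pytorch/manylinux-cuda110"},
    {"python_version": "3.9", "gpu_arch_type": "rocm",
     "gpu_arch_version": "4.0", "container_image": "pytorch/manylinux-rocm:4.0"},
]}
print(json.dumps(expected, indent=2))
```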

.github/scripts/generate_pytorch_version.py

@@ -0,0 +1,113 @@
#!/usr/bin/env python3

import argparse
import os
import subprocess
import re

from datetime import datetime
from distutils.util import strtobool
from pathlib import Path

LEADING_V_PATTERN = re.compile("^v")
TRAILING_RC_PATTERN = re.compile("-rc[0-9]*$")
LEGACY_BASE_VERSION_SUFFIX_PATTERN = re.compile("a0$")


class NoGitTagException(Exception):
    pass


def get_pytorch_root():
    return Path(subprocess.check_output(
        ['git', 'rev-parse', '--show-toplevel']
    ).decode('ascii').strip())


def get_tag():
    root = get_pytorch_root()
    # We're on a tag
    am_on_tag = (
        subprocess.run(
            ['git', 'describe', '--tags', '--exact'],
            cwd=root,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL
        ).returncode == 0
    )
    tag = ""
    if am_on_tag:
        dirty_tag = subprocess.check_output(
            ['git', 'describe'],
            cwd=root
        ).decode('ascii').strip()
        # Strip leading v that we typically do when we tag branches
        # ie: v1.7.1 -> 1.7.1
        tag = re.sub(LEADING_V_PATTERN, "", dirty_tag)
        # Strip trailing rc pattern
        # ie: 1.7.1-rc1 -> 1.7.1
        tag = re.sub(TRAILING_RC_PATTERN, "", tag)
    return tag


def get_base_version():
    root = get_pytorch_root()
    dirty_version = open(root / 'version.txt', 'r').read().strip()
    # Strips trailing a0 from version.txt, not too sure why it's there in the
    # first place
    return re.sub(LEGACY_BASE_VERSION_SUFFIX_PATTERN, "", dirty_version)


class PytorchVersion:
    def __init__(self, gpu_arch_type, gpu_arch_version, no_build_suffix):
        self.gpu_arch_type = gpu_arch_type
        self.gpu_arch_version = gpu_arch_version
        self.no_build_suffix = no_build_suffix

    def get_post_build_suffix(self):
        if self.gpu_arch_type == "cuda":
            return f"+cu{self.gpu_arch_version.replace('.', '')}"
        return f"+{self.gpu_arch_type}{self.gpu_arch_version}"

    def get_release_version(self):
        if not get_tag():
            raise NoGitTagException(
                "Not on a git tag, are you sure you want a release version?"
            )
        return f"{get_tag()}{self.get_post_build_suffix()}"

    def get_nightly_version(self):
        date_str = datetime.today().strftime('%Y%m%d')
        build_suffix = self.get_post_build_suffix()
        return f"{get_base_version()}.dev{date_str}{build_suffix}"


def main():
    parser = argparse.ArgumentParser(
        description="Generate pytorch version for binary builds"
    )
    parser.add_argument(
        "--no-build-suffix",
        type=strtobool,
        help="Whether or not to add a build suffix typically (+cpu)",
        default=os.environ.get("NO_BUILD_SUFFIX", False)
    )
    parser.add_argument(
        "--gpu-arch-type",
        type=str,
        help="GPU arch you are building for, typically (cpu, cuda, rocm)",
        default=os.environ.get("GPU_ARCH_TYPE", "cpu")
    )
    parser.add_argument(
        "--gpu-arch-version",
        type=str,
        help="GPU arch version, typically (10.2, 4.0), leave blank for CPU",
        default=os.environ.get("GPU_ARCH_VERSION", "")
    )
    args = parser.parse_args()
    version_obj = PytorchVersion(
        args.gpu_arch_type,
        args.gpu_arch_version,
        args.no_build_suffix
    )
    try:
        print(version_obj.get_release_version())
    except NoGitTagException:
        print(version_obj.get_nightly_version())


if __name__ == "__main__":
    main()
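
A rough sketch of the script's two code paths under assumed inputs (a checkout whose version.txt reads `1.8.0a0`, no exact tag, CUDA 10.2, run on 2021-03-23; none of these values come from the diff itself):

```python
# Hypothetical session; real output depends on the checkout's tags and version.txt.
v = PytorchVersion(gpu_arch_type="cuda", gpu_arch_version="10.2",
                   no_build_suffix=False)

# Off-tag builds fall through to the nightly scheme: base "1.8.0a0" is
# stripped to "1.8.0", then .dev<date> and the arch suffix are appended.
print(v.get_nightly_version())   # -> 1.8.0.dev20210323+cu102

# On an exact tag such as v1.8.1-rc1, get_release_version() strips the
# leading "v" and the trailing "-rc1", yielding 1.8.1+cu102.
```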


@@ -0,0 +1,86 @@
name: Build Linux Wheels

on:
  # TODO: These are only runnable from workflow_dispatch, we need to eventually add
  #       a cron
  # TODO: Add an on_release trigger to build on tags
  workflow_dispatch:

jobs:
  generate-build-matrix:
    if: ${{ github.repository_owner == 'pytorch' }}
    runs-on: ubuntu-18.04
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    container:
      image: python:3.9
    steps:
      - name: Clone pytorch/pytorch
        uses: actions/checkout@v2
      - name: Generating build matrix
        id: set-matrix
        run: |
          # outputting for debugging purposes
          python .github/scripts/generate_binary_build_matrix.py
          MATRIX=$(python .github/scripts/generate_binary_build_matrix.py)
          echo "::set-output name=matrix::${MATRIX}"
  build-wheel:
    if: ${{ github.repository_owner == 'pytorch' }}
    needs: generate-build-matrix
    runs-on: linux.2xlarge
    strategy:
      matrix:
        ${{ fromJson(needs.generate-build-matrix.outputs.matrix) }}
    container:
      image: ${{ matrix.container_image }}
    env:
      DESIRED_PYTHON: ${{ matrix.python_version }}
      # TODO: This is a legacy variable that we eventually want to get rid of in
      #       favor of GPU_ARCH_VERSION
      DESIRED_CUDA: ${{ matrix.gpu_arch_version }}
      GPU_ARCH_VERSION: ${{ matrix.GPU_ARCH_VERSION }}
      GPU_ARCH_TYPE: ${{ matrix.gpu_arch_type }}
      PYTORCH_BUILD_NUMBER: 1
      SKIP_ALL_TESTS: 1
    steps:
      - name: Clone pytorch/pytorch
        uses: actions/checkout@v2
        with:
          path: pytorch
          submodules: recursive
      - name: Clone pytorch/builder
        uses: actions/checkout@v2
        with:
          repository: pytorch/builder
          path: builder
      - name: Generate version string
        working-directory: pytorch/
        run: |
          version=$(.github/scripts/generate_pytorch_version.py)
          echo "Generated version: ${version}"
          echo "PYTORCH_BUILD_VERSION=${version}" >> $GITHUB_ENV
      # TODO: Remove this once we remove the need for the directories to be
      #       in specific locations
      - name: Symlink repositories to root directory (for legacy scripts purposes)
        run: |
          ln -s $(pwd)/pytorch /pytorch
          ln -s $(pwd)/builder /builder
      # TODO: Bundle the correct build script in the base container image so
      #       that we don't have to do this type of specification
      - name: Build PyTorch binary (CUDA specific)
        if: ${{ matrix.gpu_arch_type == 'cuda' }}
        run: |
          /builder/manywheel/build.sh
      - name: Build PyTorch binary (ROCM specific)
        if: ${{ matrix.gpu_arch_type == 'rocm' }}
        run: |
          /builder/manywheel/build_rocm.sh
      - name: Build PyTorch binary (CPU specific)
        if: ${{ matrix.gpu_arch_type == 'cpu' }}
        run: |
          /builder/manywheel/build_cpu.sh
      - uses: actions/upload-artifact@v2
        with:
          name: pytorch-wheel-py${{ matrix.python_version }}-${{matrix.gpu_arch_type}}-${{ matrix.gpu_arch_version }}
          path: /remote/**/*.whl
      # TODO: Add a step here for uploading binaries
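
The two jobs are glued together by that JSON string: `set-output` publishes it from `generate-build-matrix`, and `fromJson` expands it into `strategy.matrix`. Conceptually, the fan-out is equivalent to this Python sketch (an illustration of the mechanism, not GitHub's implementation):

```python
import json

# What generate-build-matrix published via ::set-output (one entry shown).
matrix_output = """{"include": [
    {"python_version": "3.9", "gpu_arch_type": "cpu",
     "gpu_arch_version": "cpu", "container_image": "pytorch/manylinux-cpu"}]}"""

# Each "include" entry becomes one build-wheel job, with the fields
# exposed to that job as ${{ matrix.* }}.
for combo in json.loads(matrix_output)["include"]:
    print("build-wheel: py{python_version} {gpu_arch_type}/{gpu_arch_version} "
          "in {container_image}".format(**combo))
```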


@@ -5,7 +5,7 @@ on:
jobs:
  clang-format:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-18.04
    steps:
      - name: Setup Python
        uses: actions/setup-python@v1
@@ -35,7 +35,7 @@
          HEAD_SHA=${{ github.event.pull_request.head.sha }}
          MERGE_BASE=$(git merge-base $BASE_SHA $HEAD_SHA)
-          # only run clang-format on whitelisted files
+          # only run clang-format on allowlisted files
          echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
          echo "| clang-format failures found! Run: "
          echo "|    tools/clang_format_ci.sh ${MERGE_BASE} "

.github/workflows/jit_triage.yml

@@ -0,0 +1,78 @@
name: jit-triage

on:
  issues:
    types: [labeled]

jobs:
  welcome:
    runs-on: ubuntu-18.04
    steps:
      - uses: actions/github-script@v2
        with:
          github-token: ${{secrets.GITHUB_TOKEN}}
          script: |
            // Arguments available:
            // - github: A pre-authenticated octokit/rest.js client
            // - context: An object containing the context of the workflow run
            // - core: A reference to the @actions/core package
            // - io: A reference to the @actions/io package

            // Check if issue has a JIT label.
            const kJitLabel = "oncall: jit";

            issue = await github.issues.get({
              owner: context.issue.owner,
              repo: context.issue.repo,
              issue_number: context.issue.number,
            })
            const hasJitLabel = issue.data.labels.filter(label => label.name == kJitLabel).length > 0;
            if (!hasJitLabel) {
              core.debug("Issue " + issue.data.title + " does not have JIT label");
              return;
            }

            // Get project column ID.
            const kProjectName = "JIT Triage";
            const kColumnName = "Need triage";

            // Query all projects in the repository.
            // TODO: Support pagination once there are > 30 projects.
            const projects = await github.projects.listForRepo({
              owner: context.issue.owner,
              repo: context.issue.repo,
            });

            // Filter out unwanted projects and get the ID for the JIT Triage project.
            const filteredProjects = projects.data.filter(project => project.name == kProjectName);
            if (filteredProjects.length != 1) {
              core.setFailed("Unable to find a project named " + kProjectName);
              return;
            }
            const projectId = filteredProjects[0].id;

            // First, query all columns in the project.
            // TODO: Support pagination once there are > 30 columns.
            const columns = await github.projects.listColumns({
              project_id: projectId,
            });

            // Filter out unwanted projects and get the ID for the Need triage column.
            const filteredColumns = columns.data.filter(column => column.name == kColumnName);
            if (filteredColumns.length != 1) {
              core.setFailed("Unable to find a column named " + kColumnName);
              return;
            }
            const columnId = filteredColumns[0].id;

            // Create a project card for this new issue.
            await github.projects.createCard({
              column_id: columnId,
              content_id: issue.data.id,
              content_type: "Issue",
            })
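
Outside of Actions, the same triage flow maps onto GitHub's classic Projects REST API. A minimal Python sketch, assuming `requests`, a token in `GITHUB_TOKEN`, and the preview media type the classic Projects endpoints required at the time; the function and its arguments are illustrative, not part of this diff:

```python
import os
import requests

API = "https://api.github.com"
session = requests.Session()
session.headers.update({
    "Authorization": f"token {os.environ['GITHUB_TOKEN']}",
    # Classic Projects endpoints sat behind this preview media type.
    "Accept": "application/vnd.github.inertia-preview+json",
})

def file_issue_card(owner, repo, issue_number, project_name, column_name):
    # Mirror the workflow: fetch the issue, find the project and column by
    # name, then create a card pointing at the issue.
    issue = session.get(f"{API}/repos/{owner}/{repo}/issues/{issue_number}").json()
    projects = session.get(f"{API}/repos/{owner}/{repo}/projects").json()
    project = next(p for p in projects if p["name"] == project_name)
    columns = session.get(f"{API}/projects/{project['id']}/columns").json()
    column = next(c for c in columns if c["name"] == column_name)
    # Same payload the workflow passes to github.projects.createCard.
    session.post(f"{API}/projects/columns/{column['id']}/cards",
                 json={"content_id": issue["id"], "content_type": "Issue"})
```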


@@ -8,7 +8,7 @@ on:
jobs:
  quick-checks:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-18.04
    steps:
      - name: Setup Python
        uses: actions/setup-python@v1
@@ -17,17 +17,28 @@ jobs:
          architecture: x64
      - name: Checkout PyTorch
        uses: actions/checkout@v1
+      - name: Checkout PR tip
+        run: |
+          set -eux
+          if [[ "${{ github.event_name }}" == "pull_request" ]]; then
+            # We are on a PR, so actions/checkout leaves us on a merge commit.
+            # Check out the actual tip of the branch.
+            git checkout ${{ github.event.pull_request.head.sha }}
+          fi
+          echo ::set-output name=commit_sha::$(git rev-parse HEAD)
+        id: get_pr_tip
      - name: Ensure consistent CircleCI YAML config
        run: |
          pip install -r requirements.txt
          cd .circleci && ./ensure-consistency.py
-      - name: Ensure Docker version is correctly deployed
-        run: |
-          pip install pyyaml
-          .circleci/validate-docker-version.py
      - name: Shellcheck Jenkins scripts
+        # https://github.com/koalaman/shellcheck#installing-a-pre-compiled-binary
        run: |
-          sudo apt-get install -y shellcheck
+          scversion="stable"
+          wget -qO- "https://github.com/koalaman/shellcheck/releases/download/${scversion?}/shellcheck-${scversion?}.linux.x86_64.tar.xz" | tar -xJv
+          sudo cp "shellcheck-${scversion}/shellcheck" /usr/bin/
+          rm -r "shellcheck-${scversion}"
+          shellcheck --version
          .jenkins/run-shellcheck.sh
      - name: Ensure no tabs
        run: |
@@ -35,16 +46,23 @@
      - name: Ensure canonical include
        run: |
          (! git grep -I -l $'#include "' -- ./c10 ./aten ./torch/csrc ':(exclude)aten/src/ATen/native/quantized/cpu/qnnpack/**' || (echo "The above files have include with quotes; please convert them to #include <xxxx>"; false))
+      # note that this next step depends on a clean checkout;
+      # if you run it locally then it will likely complain
+      # about all the generated files in torch/test
      - name: Ensure C++ source files are not executable
        run: |
-          (! find . \( -path ./third_party -o -path ./.git -o -path ./torch/bin -o -path ./build \) -prune -o -type f -executable -regextype posix-egrep -not -regex '.+(\.(bash|sh|py|so)|git-pre-commit|git-clang-format)$' -print | grep . || (echo 'The above files have executable permission; please remove their executable permission by using `chmod -x`'; false))
+          (! find . \( -path ./third_party -o -path ./.git -o -path ./torch/bin -o -path ./build \) -prune -o -type f -executable -regextype posix-egrep -not -regex '.+(\.(bash|sh|py|so)|git-pre-commit|git-clang-format|gradlew)$' -print | grep . || (echo 'The above files have executable permission; please remove their executable permission by using `chmod -x`'; false))
      - name: C++ docs check
        run: |
          sudo apt-get install -y doxygen && pip install -r requirements.txt
          cd docs/cpp/source && ./check-doxygen.sh
+      - name: CUDA kernel launch check
+        run: |
+          set -eux
+          python torch/testing/check_kernel_launches.py |& tee ${GITHUB_WORKSPACE}/cuda_kernel_launch_checks.txt
  flake8-py3:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-18.04
    steps:
      - name: Setup Python
        uses: actions/setup-python@v1
@@ -66,23 +84,25 @@
      - name: Run flake8
        run: |
          set -eux
-          pip install flake8==3.8.2 flake8-mypy flake8-bugbear flake8-comprehensions flake8-executable flake8-pyi==20.5.0 mccabe pycodestyle==2.6.0 pyflakes==2.2.0
+          pip install -r requirements-flake8.txt
          flake8 --version
-          flake8 --exit-zero > ${GITHUB_WORKSPACE}/flake8-output.txt
-          cat ${GITHUB_WORKSPACE}/flake8-output.txt
+          flake8 | tee ${GITHUB_WORKSPACE}/flake8-output.txt
      - name: Add annotations
        uses: pytorch/add-annotations-github-action@master
        with:
          check_name: 'flake8-py3'
          linter_output_path: 'flake8-output.txt'
          commit_sha: ${{ steps.get_pr_tip.outputs.commit_sha }}
-          regex: '^(?<filename>.*?):(?<lineNumber>\d+):(?<columnNumber>\d+): (?<errorCode>\w\d+) (?<errorDesc>.*)'
+          regex: '^(?<filename>.*?):(?<lineNumber>\d+):(?<columnNumber>\d+): (?<errorCode>\w+\d+) (?<errorDesc>.*)'
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      - name: Catch any other warnings
+        run: |
+          [ ! -s flake8-output.txt ]
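
The `\w\d+` to `\w+\d+` tweak in the annotation regex above is what lets multi-letter error codes through. A quick check with a hypothetical flake8 line (pattern transliterated to Python's `(?P<...>)` group syntax, since the annotation action itself is JavaScript):

```python
import re

line = "torch/utils/cpp_extension.py:12:1: EXE001 Shebang is present but file is not executable"

old = re.compile(r'^(?P<filename>.*?):(?P<lineNumber>\d+):(?P<columnNumber>\d+): (?P<errorCode>\w\d+) (?P<errorDesc>.*)')
new = re.compile(r'^(?P<filename>.*?):(?P<lineNumber>\d+):(?P<columnNumber>\d+): (?P<errorCode>\w+\d+) (?P<errorDesc>.*)')

# A single word character before the digits cannot match codes like EXE001,
# so the old pattern silently dropped flake8-executable findings.
print(old.match(line))                     # None
print(new.match(line).group("errorCode"))  # EXE001
```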
  clang-tidy:
    if: github.event_name == 'pull_request'
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-18.04
    steps:
      - name: Setup Python
        uses: actions/setup-python@v1
@@ -112,12 +132,12 @@ jobs:
          sudo apt-get update
          sudo apt-get --no-install-recommends -y install cuda-toolkit-10-2
          # Install dependencies
-          pip install pyyaml
+          pip install pyyaml typing_extensions
          wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
-          sudo apt-add-repository "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-8 main"
+          sudo apt-add-repository "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-11 main"
          sudo apt-get update
-          sudo apt-get install -y clang-tidy-8
-          sudo update-alternatives --install /usr/bin/clang-tidy clang-tidy /usr/bin/clang-tidy-8 1000
+          sudo apt-get install -y clang-tidy-11
+          sudo update-alternatives --install /usr/bin/clang-tidy clang-tidy /usr/bin/clang-tidy-11 1000
      - name: Run clang-tidy
        run: |
          set -eux
@@ -135,33 +155,45 @@
            time python setup.py --cmake-only build
            # Generate ATen files.
-            time python aten/src/ATen/gen.py \
+            time python -m tools.codegen.gen \
              -s aten/src/ATen \
-              -d build/aten/src/ATen \
-              aten/src/ATen/Declarations.cwrap \
-              aten/src/THCUNN/generic/THCUNN.h \
-              aten/src/ATen/nn.yaml \
-              aten/src/ATen/native/native_functions.yaml
+              -d build/aten/src/ATen
            # Generate PyTorch files.
            time python tools/setup_helpers/generate_code.py \
              --declarations-path build/aten/src/ATen/Declarations.yaml \
+              --native-functions-path aten/src/ATen/native/native_functions.yaml \
              --nn-path aten/src
          fi
          # Run Clang-Tidy
          # The negative filters below are to exclude files that include onnx_pb.h or
          # caffe2_pb.h, otherwise we'd have to build protos as part of this CI job.
+          # FunctionsManual.cpp is excluded to keep this diff clean. It will be fixed
+          # in a follow up PR.
+          # /torch/csrc/generic/*.cpp is excluded because those files aren't actually built.
+          # deploy/interpreter files are excluded due to using macros and other techniques
+          # that are not easily converted to accepted c++
          python tools/clang_tidy.py \
            --verbose \
            --paths torch/csrc/ \
            --diff "$MERGE_BASE" \
            -g"-torch/csrc/jit/passes/onnx/helper.cpp" \
            -g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
            -g"-torch/csrc/jit/serialization/onnx.cpp" \
            -g"-torch/csrc/jit/serialization/export.cpp" \
            -g"-torch/csrc/jit/serialization/import.cpp" \
            -g"-torch/csrc/jit/serialization/import_legacy.cpp" \
            -g"-torch/csrc/onnx/init.cpp" \
            -g"-torch/csrc/cuda/nccl.*" \
            -g"-torch/csrc/cuda/python_nccl.cpp" \
+            -g"-torch/csrc/autograd/FunctionsManual.cpp" \
+            -g"-torch/csrc/generic/*.cpp" \
+            -g"-torch/csrc/jit/codegen/cuda/runtime/*" \
+            -g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
+            -g"-torch/csrc/deploy/interpreter/interpreter.h" \
+            -g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
+            -g"-torch/csrc/deploy/interpreter/test_main.cpp" \
            "$@" > ${GITHUB_WORKSPACE}/clang-tidy-output.txt
          cat ${GITHUB_WORKSPACE}/clang-tidy-output.txt
@@ -176,7 +208,7 @@ jobs:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  cmakelint:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-18.04
    steps:
      - name: Setup Python
        uses: actions/setup-python@v1


@@ -0,0 +1,78 @@
name: quantization-triage

on:
  issues:
    types: [labeled]

jobs:
  welcome:
    runs-on: ubuntu-18.04
    steps:
      - uses: actions/github-script@v2
        with:
          github-token: ${{secrets.GITHUB_TOKEN}}
          script: |
            // Arguments available:
            // - github: A pre-authenticated octokit/rest.js client
            // - context: An object containing the context of the workflow run
            // - core: A reference to the @actions/core package
            // - io: A reference to the @actions/io package

            // Check if issue has a Quantization label.
            const kQuantizationLabel = "oncall: quantization";

            issue = await github.issues.get({
              owner: context.issue.owner,
              repo: context.issue.repo,
              issue_number: context.issue.number,
            })
            const hasQuantizationLabel = issue.data.labels.filter(label => label.name == kQuantizationLabel).length > 0;
            if (!hasQuantizationLabel) {
              core.debug("Issue " + issue.data.title + " does not have Quantization label");
              return;
            }

            // Get project column ID.
            const kProjectName = "Quantization Triage";
            const kColumnName = "Need Triage";

            // Query all projects in the repository.
            // TODO: Support pagination once there are > 30 projects.
            const projects = await github.projects.listForRepo({
              owner: context.issue.owner,
              repo: context.issue.repo,
            });

            // Filter out unwanted projects and get the ID for the Quantization Triage project.
            const filteredProjects = projects.data.filter(project => project.name == kProjectName);
            if (filteredProjects.length != 1) {
              core.setFailed("Unable to find a project named " + kProjectName);
              return;
            }
            const projectId = filteredProjects[0].id;

            // First, query all columns in the project.
            // TODO: Support pagination once there are > 30 columns.
            const columns = await github.projects.listColumns({
              project_id: projectId,
            });

            // Filter out unwanted projects and get the ID for the Need triage column.
            const filteredColumns = columns.data.filter(column => column.name == kColumnName);
            if (filteredColumns.length != 1) {
              core.setFailed("Unable to find a column named " + kColumnName);
              return;
            }
            const columnId = filteredColumns[0].id;

            // Create a project card for this new issue.
            await github.projects.createCard({
              column_id: columnId,
              content_id: issue.data.id,
              content_type: "Issue",
            })


@@ -0,0 +1,36 @@
name: 'Close stale pull requests'

on:
  schedule:
    # TODO: Reduce frequency once we work through the backlog of pull requests
    - cron: '0 * * * *'
  workflow_dispatch:

jobs:
  stale:
    if: ${{ github.repository_owner == 'pytorch' }}
    runs-on: ubuntu-18.04
    steps:
      - uses: actions/stale@v3
        with:
          stale-pr-message: >
            Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as `Stale`. <br>
            Feel free to remove the `Stale` label if you feel this was a mistake. <br>
            `Stale` pull requests will automatically be closed 30 days after being marked `Stale` <br>
          exempt-pr-labels: "no-stale,open source,high priority"
          days-before-stale: 60
          days-before-close: 90
  stale-open-source:
    if: ${{ github.repository_owner == 'pytorch' }}
    runs-on: ubuntu-18.04
    steps:
      - uses: actions/stale@v3
        with:
          stale-pr-message: >
            Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as `Stale`. <br>
            Feel free to remove the `Stale` label if you feel this was a mistake. <br>
            If you are unable to remove the `Stale` label please contact a maintainer in order to do so. <br>
            `Stale` pull requests will automatically be closed 30 days after being marked `Stale` <br>
          exempt-pr-labels: "no-stale,high priority"
          only-labels: "open source"
          days-before-stale: 150
          days-before-close: 180

Some files were not shown because too many files have changed in this diff.